From: Robert Myers on 8 Mar 2010 20:50

On Mar 8, 3:16 pm, Larry <lstewa...(a)gmail.com> wrote:
>
> Actually I think R. Myers' facts are wrong here. I downloaded the
> HPCC Challenge Results data from the UTK website, and added a column
> computing the ratio of global FFT performance to global HPL
> performance. Then I filtered away all systems not achieving at least
> 100 GFlops FFT performance.
>
> For systems achieving over a teraflop of FFT performance, BG/L with
> 32K cores is beaten <only> by the SX-9 in this figure of merit. For
> systems achieving over 100 GF, it is beaten by the SX-9, by the best
> Nehalem/Infiniband systems, by the Cray XT3, and by the largest
> SiCortex machine.
>
> BG/L is hard to program, and weird in many ways, but inability to do
> the FFT is a bum rap.
>
> Incidentally, only the SX-9 gets over 10% of HPL this way; the rest
> of the pack is in the 3-6% range. Global FFT is hard.
>
**sigh**

It's not a matter of global FFT being hard. It is that these machines,
which claim to be scalable, don't actually scale for the FFT because of
severely crippled global bandwidth. That the machines the DoE buys get
lousy actual/peak performance is something people have been complaining
about for years. As far as the FFT is concerned, we are making
essentially no progress. It is the result of budgetary and
architectural choices, and it is not an unsolvable problem.

> There is a very large market for FFT cycles, particularly in doing 2D
> and 3D FFTs for seismic processing. The data sets for these
> calculations get so large that there is no way to do substantial
> numbers of them in parallel, because the machines do not have enough
> memory to hold, say, a terabyte of data per run. If you try to do it
> any other way, you've succeeded in turning a cluster communications
> problem into a (worse) I/O problem.
>
If you're going to criticize me, at least read what I write. I am not
talking about problems where I/O would be the limiting factor. I'd be
surprised if seismic processing were done on anything but rack
clusters. Completely different world. Your example is completely
irrelevant to the problem that concerns me, as is your conclusion about
I/O.

Robert.
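[Editor's sketch: Larry's figure of merit above can be reproduced roughly as follows. The systems and performance numbers here are invented for illustration — the real HPCC results file from UTK has its own schema and values — but the filter-then-ratio procedure is the one he describes.]

```python
# Sketch of the HPCC figure-of-merit calculation described above:
# take (global FFT, global HPL) results, drop systems below a 100 GFlops
# FFT floor, and rank by the ratio FFT/HPL.  All figures are invented.

SAMPLE_RESULTS = [
    # (system, global FFT GFlops, global HPL GFlops) -- illustrative only
    ("NEC SX-9",       1500.0, 12000.0),
    ("BG/L 32K cores", 2300.0, 70000.0),
    ("Small cluster",    40.0,  1000.0),   # filtered out: FFT < 100 GFlops
]

def fft_hpl_ratios(results, fft_floor=100.0):
    """Return (system, FFT/HPL) for systems with FFT >= fft_floor GFlops,
    sorted best-ratio first."""
    kept = [(name, fft / hpl) for name, fft, hpl in results if fft >= fft_floor]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

for name, ratio in fft_hpl_ratios(SAMPLE_RESULTS):
    print(f"{name}: {ratio:.1%} of HPL")
```

With these made-up numbers the SX-9 lands above 10% of HPL and BG/L in the low single digits, which is the shape of the spread Larry reports.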
From: Robert Myers on 8 Mar 2010 23:52

On Mar 7, 10:48 pm, Del Cecchi <delcec...(a)gmail.com> wrote:
> If there was some miracle architecture that would be much more
> effective than BG for the questions of interest that doesn't require
> unobtainium to fabricate, I think someone would have at least
> proposed it.
>
> Coherent allcache? KSR1?
>
> No cache, multithreaded? Tera MTA.
>
> Hottest microprocessor around in moderate numbers using
> state-of-the-art technology? Power7 boxes.
>
> AMD processors in a cluster with custom interconnect? (What did Cray
> call it? I am drawing a blank.)
>
I suppose this is an insulting response, but here goes.

You go to the drive-up window at McDonald's. There is a menu. You
pick. How smart do you have to be to do that?

Design a meal I've never thought of, that will be irresistibly
appealing and inexpensive. How many yahoos can do that? Who works at
IBM and at the national labs?

Yes, I intend to be provocative. I have to pay [name deleted] so he
can drop names and be a menu-chooser? Get out of town, all of you.

This is comp.arch, not the hotline for EEs who think the world owes
them.

Robert.
From: Terje Mathisen "terje.mathisen at tmsw.no" on 9 Mar 2010 02:46

Robert Myers wrote:
> If you're going to criticize me, at least read what I write. I am not
> talking about problems where I/O would be the limiting factor. I'd be
> surprised if seismic processing were done on anything but rack
> clusters. Completely different world.

Be surprised:

Seismic processing (at least in our installation) is split into two
parts, balanced to take approximately the same amount of wall clock
time, and so that the total cost of a full run is optimized.

The first part runs on an SMP Itanium box with lots of memory and very
good bandwidth; it is used to do the initial gridding of the problem
space, while the second half uses a pretty standard Linux cluster with
2K dual-core CPUs.

Afaik the smaller SMP machine did account for a significant part of the
setup.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Robert Myers on 9 Mar 2010 08:39

On Mar 9, 2:46 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> Robert Myers wrote:
> > If you're going to criticize me, at least read what I write. I am
> > not talking about problems where I/O would be the limiting factor.
> > I'd be surprised if seismic processing were done on anything but
> > rack clusters. Completely different world.
>
> Be surprised:
>
> Seismic processing (at least in our installation) is split into two
> parts, balanced to take approximately the same amount of wall clock
> time, and so that the total cost of a full run is optimized.
>
> The first part runs on an SMP Itanium box with lots of memory and
> very good bandwidth; it is used to do the initial gridding of the
> problem space, while the second half uses a pretty standard Linux
> cluster with 2K dual-core CPUs.
>
> Afaik the smaller SMP machine did account for a significant part of
> the setup.
>
I don't know enough about seismic processing as it is currently done to
understand the logic of the setup you describe. My guess about seismic
processing was based on comments by others here who are in the
business, and on the assumption that speed is essential, so you don't
have much time to fiddle around with computer architecture: best just
to cobble together the latest and greatest and get moving so you can
find the good prospects first.

If you want to talk about how to sell lots of chips and boxes, you have
one problem and one set of considerations and constraints. That's the
position that IBM and Intel would mostly like to be in. They don't
care all that much whether you are doing seismic processing or computer
animation. They want the revenue, as do the people who are buying
their hardware.

National security spending in the U.S., and especially spending by
NASA, the NSF, and the bomb labs, *claims* to have other goals. Every
press release about a new machine claims to be advancing the state of
the art. In most cases, those claims are little different from
marketspeak coming from IBM or Intel. They have done whatever they
could with the money they had, and their goal is to advertise having
one of the baddest, fastest machines on earth.

A machine that achieves 5% efficiency doing a bog-standard problem has
little right to claim to be super in any respect. That's my mantra.

Those who are in commerce will continue to try to maximize standard
measures of business performance. It is not their job to advance basic
science. Various departments of the Federal government repeatedly
claim that they are advancing basic science, but, so far as
computational science is concerned, those claims are largely
fraudulent.

Now, to be sure, IBM and Intel would like to be able to gild the lily
by emphasizing their position on the Top 500 list. If they could be
key to a fundamental scientific breakthrough, they'd love to take
credit. But as far as actually investing in making computers more
powerful and useful for fundamental science, no one (except perhaps
DARPA) is in the business.

Robert.
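[Editor's sketch: the bandwidth complaint can be made concrete with back-of-the-envelope arithmetic. Every machine figure below is invented for illustration. The point it models: a 3D FFT's flop count grows only as N^3 log N, while each global transpose moves essentially the whole N^3 data set across the machine's bisection, so on a machine with weak global bandwidth the transposes, not the arithmetic, set the run time.]

```python
# Toy model (invented figures) of why a global 3D FFT is bandwidth-limited:
# compare the compute-bound time against the time to push the transpose
# traffic through the machine's bisection.
import math

N = 4096                   # grid points per dimension (so N^3 = 2^36 points)
PEAK_GFLOPS = 10_000.0     # aggregate peak, GFlop/s -- illustrative
BISECTION_GB_S = 100.0     # global bisection bandwidth, GB/s -- illustrative

points = N ** 3
bytes_total = 16 * points                 # complex double = 16 bytes/point
flops = 5 * points * math.log2(points)    # standard FFT operation count

compute_s = flops / (PEAK_GFLOPS * 1e9)
# Roughly half the data crosses the bisection per global transpose;
# assume two transposes for a slab (1-D) decomposition.
comm_s = 2 * (bytes_total / 2) / (BISECTION_GB_S * 1e9)

print(f"compute-bound: {compute_s:.2f} s, bandwidth-bound: {comm_s:.2f} s")
```

At these made-up numbers the transposes take roughly an order of magnitude longer than the arithmetic, which is the shape of the single-digit percent-of-HPL figures quoted upthread.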
From: Del Cecchi on 9 Mar 2010 22:13
Robert Myers wrote:
> On Mar 7, 10:48 pm, Del Cecchi <delcec...(a)gmail.com> wrote:
>
>> If there was some miracle architecture that would be much more
>> effective than BG for the questions of interest that doesn't require
>> unobtainium to fabricate, I think someone would have at least
>> proposed it.
>>
>> Coherent allcache? KSR1?
>>
>> No cache, multithreaded? Tera MTA.
>>
>> Hottest microprocessor around in moderate numbers using
>> state-of-the-art technology? Power7 boxes.
>>
>> AMD processors in a cluster with custom interconnect? (What did Cray
>> call it? I am drawing a blank.)
>>
> I suppose this is an insulting response, but here goes.
>
> You go to the drive-up window at McDonald's. There is a menu. You
> pick. How smart do you have to be to do that?
>
> Design a meal I've never thought of, that will be irresistibly
> appealing and inexpensive. How many yahoos can do that? Who works
> at IBM and at the national labs?
>
> Yes, I intend to be provocative. I have to pay [name deleted] so he
> can drop names and be a menu-chooser? Get out of town, all of you.
>
> This is comp.arch, not the hotline for EEs who think the world owes
> them.
>
> Robert.
>
I am an EE. I don't think the world owes me. I was disabused of that
notion long ago.

If you want a meal that hasn't been done before, at least for very many
folks, there are restaurants and chefs that will do the job. El Bulli,
for example. You just have to pay the money.

Now, given the constraints of the technology that exists today, what
should the folks with the money and capability build? It won't be
inexpensive, since it is new and the time of skilled folks must be paid
for.