From: "Andy "Krazy" Glew" on 10 Mar 2010 01:59

Robert Myers wrote:
> A machine that achieves 5% efficiency doing a bog-standard problem has
> little right to claim to be super in any respect. That's my mantra.

I've been lurking and listening and learning, but, hold on: this is silly.

Utilization has nothing to do with "super-ness". Cost effectiveness is what matters.

When you say 5% efficient, I assume that you are talking about it as a fraction of peak flops.

But if the flops are cheap compared to the bandwidth, then it may very well make sense to add lots of flops. You might want to add more bandwidth, but if you can add a 1%-utilized flop and eke a little bit more out of the expensive interconnect...

What you really need to do is show that there are better architectures that deliver the bandwidth you want as well as getting better utilization.

OK, now I'll step back and listen some more.
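[Andy's distinction between utilization and cost effectiveness can be made concrete with a toy calculation. This is a sketch with invented numbers, not figures from the thread: utilization is just sustained/peak flops, and a machine with much lower utilization can still deliver each sustained flop more cheaply.]

```python
# Toy comparison (all numbers invented for illustration): utilization
# alone doesn't tell you which machine delivers flops more cheaply.

def utilization(sustained_gflops, peak_gflops):
    """Fraction of peak actually achieved on the workload."""
    return sustained_gflops / peak_gflops

def cost_per_sustained_gflop(node_cost, sustained_gflops):
    """Dollars paid per GFLOP/s actually delivered."""
    return node_cost / sustained_gflops

# Machine A: modest peak, high utilization.
a_util = utilization(sustained_gflops=40.0, peak_gflops=50.0)     # 80%
a_cost = cost_per_sustained_gflop(4000.0, 40.0)                   # $100

# Machine B: huge cheap peak, an "embarrassing" 5% utilization.
b_util = utilization(sustained_gflops=50.0, peak_gflops=1000.0)   # 5%
b_cost = cost_per_sustained_gflop(2000.0, 50.0)                   # $40

print(f"A: {a_util:.0%} utilized, ${a_cost:.0f}/sustained GFLOP")
print(f"B: {b_util:.0%} utilized, ${b_cost:.0f}/sustained GFLOP")
# B is far less "efficient" yet delivers each sustained GFLOP more cheaply.
```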
From: "Andy "Krazy" Glew" on 10 Mar 2010 02:00

nedbrek wrote:
> Hello all,
>
> "Del Cecchi" <delcecchinospamofthenorth(a)gmail.com> wrote in message
> news:7viusjFjaqU1(a)mid.individual.net...
>> Andy "Krazy" Glew wrote:
>>> If you are a computer architect, it's Intel in Oregon, Silicon Valley,
>>> Austin. Where else?
>> Perhaps IBM in Rochester MN or maybe even Mayo Clinic, Rochester. The
>> clinic does a lot of special stuff for medical equipment and Dr Barry
>> Gilbert had a group that did high speed stuff for Darpa.
>
> I sometimes fantasize about going to Intel Israel.

I think that no longer having the prospect of visiting Haifa is the only thing I really miss about Intel.
From: Terje Mathisen <"terje.mathisen at tmsw.no"> on 10 Mar 2010 02:16

Andy "Krazy" Glew wrote:
> Robert Myers wrote:
>> A machine that achieves 5% efficiency doing a bog-standard problem has
>> little right to claim to be super in any respect. That's my mantra.
>
> I've been lurking and listening and learning, but, hold on: this is silly.
>
> Utilization has nothing to do with "super-ness". Cost effectiveness is
> what matters.

Exactly; cf. my post about splitting seismic processing into two parts, on two different architectures.

> When you say 5% efficient, I assume that you are talking about it as a
> fraction of peak flops.

Probably.

> But if the flops are cheap compared to the bandwidth, then it may very
> well make sense to add lots of flops. You might want to add more
> bandwidth, but if you can add a 1%-utilized flop and eke a little bit
> more out of the expensive interconnect...

This is crucial: Intel (Larrabee) and others have shown that you can put an awful lot of flops on a single die while still staying within a reasonable power budget.

If having a teraflop available in a $200 chip makes it possible to get 10% better use out of the $2000 (per board) interconnect, then that's a good bargain.

> What you really need to do is show that there are better architectures
> that deliver the bandwidth you want as well as getting better utilization.
>
> OK, now I'll step back and listen some more.

As will I. :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
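[Terje's $200-chip-versus-$2000-interconnect trade reduces to a simple break-even test. A sketch, using his hypothetical figures: adding cheap flops improves throughput-per-dollar exactly when the fractional throughput gain exceeds the fractional cost increase, since (1+g)/(base+addon) > 1/base simplifies to g > addon/base.]

```python
# Back-of-envelope check of the "$200 chip vs $2000 interconnect" trade
# (hypothetical figures from the thread). Adding cheap flops pays off
# whenever the fractional gain in delivered work exceeds the fractional
# increase in board cost.

def worth_adding(addon_cost, base_cost, throughput_gain):
    """True if throughput-per-dollar improves after the add-on.

    throughput_gain: e.g. 0.10 means 10% more work per unit time.
    """
    cost_increase = addon_cost / base_cost
    return throughput_gain > cost_increase

base = 2000.0   # per-board interconnect cost
chip = 200.0    # teraflop accelerator

print(worth_adding(chip, base, 0.10))   # 10% gain vs 10% cost: break-even
print(worth_adding(chip, base, 0.15))   # 15% gain: a genuine bargain
```

With these exact numbers a 10% improvement is only break-even; the trade becomes a clear win once the gain in interconnect utilization outruns the 10% cost bump.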
From: Robert Myers on 10 Mar 2010 11:51

On Mar 10, 2:16 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> Robert Myers wrote:
>>> A machine that achieves 5% efficiency doing a bog-standard problem has
>>> little right to claim to be super in any respect. That's my mantra.
>>
>> I've been lurking and listening and learning, but, hold on: this is silly.
>>
>> Utilization has nothing to do with "super-ness". Cost effectiveness is
>> what matters.
>
> Exactly; cf. my post about splitting seismic processing into two parts,
> on two different architectures.

A commercial customer can use cost-effectiveness as the sole criterion, but my argument, which applies specifically to the national security establishment of the US, is that the cost-effectiveness argument is leading us down a blind alley so far as basic science is concerned. Historically, basic science has been funded by sovereigns or others with no immediate commercial goal.

I focus on the FFT because there I am certain of my footing. In a broader sense that I can't defend with such specificity: if all your computer designs force localized computation, or favor it heavily, you will skew the kind of science that you can and will do.

>> When you say 5% efficient, I assume that you are talking about it as a
>> fraction of peak flops.
>
> Probably.

Some people use fraction of peak flops, but, to make things look less bad, fraction of Linpack flops is becoming more popular.

>> But if the flops are cheap compared to the bandwidth, then it may very
>> well make sense to add lots of flops. You might want to add more
>> bandwidth, but if you can add a 1%-utilized flop and eke a little bit
>> more out of the expensive interconnect...
>
> This is crucial: Intel (Larrabee) and others have shown that you can put
> an awful lot of flops on a single die while still staying within a
> reasonable power budget.
>
> If having a teraflop available in a $200 chip makes it possible to get
> 10% better use out of the $2000 (per board) interconnect, then that's a
> good bargain.

Maybe it is and maybe it isn't. If you've skewed the design of the entire architecture (which includes not just the nodes and the interconnect, but power budgets and supporting infrastructure, cooling, floor space, etc.) to favor beating the living daylights out of small local datasets, the burden is on you to show how free it really is.

My argument is that the scalability of current machines is a P.T. Barnum hoax with very little scientific payoff. Those huge buildings and staffs are expensive. Keeping a gigantic cluster running is itself hugely expensive, even with gigabit Ethernet as an interconnect. I'm not saying that we should stop building such machines altogether, but I am (and have been for a long time) waving a very big red flag. People may intuitively think that if you just pile up enough flops in one place you can do any kind of math you want, but that intuition is dangerous. When I start down this road, I will inevitably start to sputter, because I feel so outnumbered.

No such thing as a free lunch. If the interesting physics is in 512 nodes, then that's all you should bother with. If you think you can do very large nonlinear problems and get the physics right with localized computation, my position is that you are fooling only yourself.

>> What you really need to do is show that there are better architectures
>> that deliver the bandwidth you want as well as getting better utilization.
>>
>> OK, now I'll step back and listen some more.
>
> As will I. :-)

There's a chunk of this history I'm missing because I was not paying any attention at all at the time. People *appear* to have given up after the KSR failure, and one of the problems there (aside from accounting "irregularities") was with scalability being limited by the very interconnect that was supposed to allow the machine to scale. If the limit on interconnects is fundamental, I'd sure like to understand why.

Robert.
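[Robert's choice of the FFT as his test case rests on a concrete property: every output bin of a discrete Fourier transform depends on every input sample, so a distributed FFT forces all-to-all communication no matter how the data is partitioned across nodes. A minimal sketch of that dependence, using a naive O(n^2) DFT rather than a fast transform:]

```python
# Why the FFT stresses global bandwidth: each DFT output bin is a sum
# over ALL input samples, so perturbing one "local" sample changes
# every bin. On a cluster that data dependence becomes all-to-all
# traffic across the interconnect.

import cmath

def dft(x):
    """Naive discrete Fourier transform, O(n^2), stdlib only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

n = 8
x = [0.0] * n
base = dft(x)

# Perturb a single sample...
x[3] = 1.0
bumped = dft(x)

# ...and count how many of the n output bins change.
changed = sum(abs(b - a) > 1e-12 for a, b in zip(base, bumped))
print(f"{changed} of {n} bins affected")  # all 8 bins change
```

Stencil-style physics codes touch only neighboring points per step, which is why they thrive on machines with weak global bandwidth; the DFT's fully dense dependence pattern is exactly what such machines penalize.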
From: Robert Myers on 10 Mar 2010 12:02
On Mar 9, 10:13 pm, Del Cecchi <delcec...(a)gmail.com> wrote:
> I am an EE. I don't think the world owes me. I have been disabused of
> that notion long ago. If you want a meal that hasn't been done before,
> at least for very many folks, there are restaurants and chefs that will
> do the job. El Bulli, for example. You just have to pay the money.

You're not an EE who feels entitled, and I'm not someone whose experience is so limited as never to have seen just how many smart people there are in the world, how hard they work, or what *really* smart really amounts to.

My read on computer architecture is that it is obsessed with latency and has been for a long time. Latency across the network bridge is what killed KSR scalability. You can, and we have, argued over just how effectively the memory latency problem has been addressed. My complaint about Blue Gene is that the latency for such a large cluster is amazing, but the global bandwidth is pathetic. You've already hinted that the limitation may be fundamental, at least for that design: the boxes were already stuffed full of wires.

> Now, given the constraints of the technology that exists today, what
> should the folks with the money and capability build? It won't be
> inexpensive, since it is new and the time of skilled folks must be paid for.

There is no doubt that it will be expensive. My argument is that, at this point, there is little reason to build more of the same, other perhaps than to gain energy efficiency.

Robert.