From: Robert Myers on 18 Jul 2010 18:02

I have lamented, at length, the proliferation of flops at the expense of bytes-per-flop in what are now currently styled as supercomputers.

This subject came up recently on the Fedora User's Mailing List when someone claimed that GPU's are just what the doctor ordered to make high-end computation pervasively available. Even I have fallen into that trap, in this forum, and I was quickly corrected. In the most general circumstance, GPU's seem practically to have been invented to expose bandwidth starvation.

At least one person on the Fedora list got it and says that he has encountered similar issues in his own work (what is in short supply is not flops, but bytes per flop). He also seems to understand that the problem is fundamental and cannot be made to go away with an endless proliferation of press releases, photographs of "supercomputers," and an endless procession of often meaningless color plots.

Since the issue is only tangentially related to the list, he suggested a private mailing list to pursue the issue further without annoying others with a topic that most are manifestly not interested in.

The subject is really a mix of micro and macro computer architecture, the physical limitations of hardware, the realities of what is ever likely to be funded, and the grubby details of computational mathematics.

Since I have talked most about the subject here and gotten the most valuable feedback here, I thought to solicit advice as to what kind of forum would seem most plausible/attractive to pursue such a subject. I could probably host a mailing list myself, but would that be the way to go about it and would anyone else be interested?

Email me privately if you don't care to respond publicly.

Thanks.

Robert.
From: Edward Feustel on 19 Jul 2010 05:54

On Sun, 18 Jul 2010 18:02:49 -0400, Robert Myers <rbmyersusa(a)gmail.com> wrote:

>I have lamented, at length, the proliferation of flops at the expense of
>bytes-per-flop in what are now currently styled as supercomputers.
>
---
>Since I have talked most about the subject here and gotten the most
>valuable feedback here, I thought to solicit advice as to what kind of
>forum would seem most plausible/attractive to pursue such a subject. I
>could probably host a mailing list myself, but would that be the way to
>go about it and would anyone else be interested?
>
>Email me privately if you don't care to respond publicly.
>
>Thanks.
>
>Robert.

This is an important subject. I would suggest that everything be archived in a searchable environment. Keywords and tightly focused discussion would be helpful (if possible). Please let me know if you decide to do a wiki or e-mail list.

Ed Feustel
Dartmouth College
From: jacko on 19 Jul 2010 11:02 I might click a look see.
From: MitchAlsup on 19 Jul 2010 11:36

It seems to me that having less than 8 bytes of memory bandwidth per flop leads to an endless series of cache exercises.**

It also seems to me that nobody is going to be able to put the required 100 GB/s/processor pin interface on the part.* Nor does it seem that it would have the latency needed to strip mine main memory continuously were the required BW made available.

Thus, we are in essence screwed.

* current bandwidths
a) 3 GHz processors with 2 FP pipes running 128-bit double DP flops (a la SSE). This gives 12 GFlop/s/processor
b) 12 GFlop/s/processor demands 100 GByte/s/processor
c) DDR3 can achieve 17 GBytes/s/channel
d) high end PC processors can afford 2 memory channels
e) therefore we are screwed:
e.1) The memory system can supply only 1/3rd of what a single processor wants
e.2) There are 4 and growing numbers of processors
e.3) therefore the memory system can support less than 1/12 as much BW as required.

Mitch

** The ideal memBW/Flop is 3 memory operations per flop, and back in the Cray-1 to XMP transition much of the vectorization gain occurred from the added memBW and the better chaining.
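
[Editor's sketch] The arithmetic in points a) through e.3) above can be reproduced in a few lines. This is only a back-of-envelope check using the post's own round numbers; the constant and function names are made up for illustration, and nothing here is a measurement. Note that 12 GFlop/s at 8 bytes per flop is 96 GB/s, which the post rounds to 100 GB/s, so the computed fractions come out close to, not exactly at, the quoted 1/3 and 1/12.

```python
# Back-of-envelope check of the bandwidth figures quoted in the post.
# Every input is one of the post's round numbers, not a measurement.

CLOCK_HZ = 3e9           # 3 GHz processor clock
FP_PIPES = 2             # two floating-point pipes
DP_PER_OP = 2            # a 128-bit SSE operation holds two 64-bit doubles
BYTES_PER_FLOP = 8       # the post's threshold of 8 bytes of bandwidth per flop
DDR3_CHANNEL_BPS = 17e9  # about 17 GB/s per DDR3 channel
CHANNELS = 2             # memory channels on a high-end PC processor
PROCESSORS = 4           # "4 and growing"


def peak_flops(clock_hz, pipes, dp_per_op):
    """Peak double-precision flop/s for one processor."""
    return clock_hz * pipes * dp_per_op


def demanded_bandwidth(flops, bytes_per_flop):
    """Bytes/s needed to feed `flops` at the given bytes-per-flop ratio."""
    return flops * bytes_per_flop


flops = peak_flops(CLOCK_HZ, FP_PIPES, DP_PER_OP)
demand = demanded_bandwidth(flops, BYTES_PER_FLOP)
supply = DDR3_CHANNEL_BPS * CHANNELS

# Fraction of the demand the memory system can meet.
fraction_one = supply / demand                 # one processor running
fraction_all = supply / (demand * PROCESSORS)  # all processors running

print(f"peak flops per processor: {flops / 1e9:.0f} GFlop/s")
print(f"demanded bandwidth:       {demand / 1e9:.0f} GB/s")
print(f"available bandwidth:      {supply / 1e9:.0f} GB/s")
print(f"fraction met, 1 processor: {fraction_one:.3f}")
print(f"fraction met, {PROCESSORS} processors: {fraction_all:.3f}")
```

Changing `PROCESSORS` or `CHANNELS` shows how quickly the gap widens as core counts grow faster than memory channels.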
From: Thomas Womack on 19 Jul 2010 11:54
In article <QSK0o.10246$Zp1.7167(a)newsfe15.iad>, Robert Myers <rbmyersusa(a)gmail.com> wrote:

>I have lamented, at length, the proliferation of flops at the expense of
>bytes-per-flop in what are now currently styled as supercomputers.
>
>This subject came up recently on the Fedora User's Mailing List when
>someone claimed that GPU's are just what the doctor ordered to make
>high-end computation pervasively available. Even I have fallen into
>that trap, in this forum, and I was quickly corrected. In the most
>general circumstance, GPU's seem practically to have been invented to
>expose bandwidth starvation.

Yes, they've got a very low peak bandwidth:peak flops ratio; but the peak bandwidth is reasonably high in absolute terms - the GeForce 480 peak bandwidth is about that of a Cray T916. (The chip has about 2000 balls on the bottom, 384 of which are memory I/O running at 4GHz.)

I don't think it makes sense to complain about low bw:flops ratios; you could always make the ratio higher by removing ALUs, getting you a machine which is less capable at the many jobs that can be made to need flops but not bytes.

Tom
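
[Editor's sketch] Both points in the post above can be put into numbers. The sketch assumes that "running at 4GHz" means 4 Gbit/s per memory pin, which is an interpretation rather than something the post states. The two flop rates are purely hypothetical placeholders, chosen only to show that removing ALUs raises the bandwidth:flops ratio while the absolute bandwidth stays the same.

```python
import math

# Figures taken from the post.
MEMORY_PINS = 384    # memory I/O balls on the package
PIN_RATE_BPS = 4e9   # ASSUMPTION: "4GHz" read as 4 Gbit/s per pin

# Peak memory bandwidth in bytes per second.
peak_bandwidth = MEMORY_PINS * PIN_RATE_BPS / 8


def bytes_per_flop(bandwidth_bps, flops):
    """Peak bandwidth:flops ratio in bytes per flop."""
    return bandwidth_bps / flops


# HYPOTHETICAL flop rates, for illustration only (not GeForce 480 specs).
full_chip_flops = 1.0e12
half_alus_flops = 0.5e12  # same chip with half the ALUs removed

ratio_full = bytes_per_flop(peak_bandwidth, full_chip_flops)
ratio_half = bytes_per_flop(peak_bandwidth, half_alus_flops)

print(f"peak bandwidth: {peak_bandwidth / 1e9:.0f} GB/s")
print(f"bytes/flop, full chip: {ratio_full:.3f}")
print(f"bytes/flop, half ALUs: {ratio_half:.3f}")

# The ratio doubles, but no extra bytes reach the chip.
assert math.isclose(ratio_half, 2 * ratio_full)
```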