From: Robert Myers on 19 Jul 2010 12:31

On Jul 19, 11:54 am, Thomas Womack <twom...(a)chiark.greenend.org.uk> wrote:
> In article <QSK0o.10246$Zp1.7...(a)newsfe15.iad>,
> Robert Myers <rbmyers...(a)gmail.com> wrote:
>
> >I have lamented, at length, the proliferation of flops at the expense of
> >bytes-per-flop in what are now currently styled as supercomputers.
>
> >This subject came up recently on the Fedora User's Mailing List when
> >someone claimed that GPUs are just what the doctor ordered to make
> >high-end computation pervasively available. Even I have fallen into
> >that trap, in this forum, and I was quickly corrected. In the most
> >general circumstance, GPUs seem practically to have been invented to
> >expose bandwidth starvation.
>
> Yes, they've got a very low peak bandwidth:peak flops ratio; but the
> peak bandwidth is reasonably high in absolute terms - the GeForce 480's
> peak bandwidth is about that of a Cray T916.
>
> (the chip has about 2000 balls on the bottom, 384 of which are memory
> I/O running at 4GHz)
>
> I don't think it makes sense to complain about low bw:flops ratios;
> you could always make the ratio higher by removing ALUs, getting you a
> machine which is less capable at the many jobs that can be made to
> need flops but not bytes.

It doesn't make much sense, as I have repeatedly been reminded, simply to
complain, if that's all you ever do.

If nothing else, I've been on a one-man crusade to stop the
misrepresentation of current "supercomputers." The designs are *not*
scalable, except with respect to a set of problems that are embarrassingly
parallel in the global sense, or so close to embarrassingly parallel that
the wimpy global bandwidth that's available is not a serious handicap.

If you can't examine the most interesting questions about the interaction
between the largest and smallest scales the machine can represent without
making indefensible mathematical leaps, then why bother building the
machine at all? Because there are a bunch of almost embarrassingly
parallel problems that you *can* do?

I don't think we're ever going to agree on this. Your ongoing annoyance
has been noted. I'd like to explore what can and what cannot be done, so
that everyone understands the consequences of the decisions being made
about computational frontiers that, the way we are going now, will never
be explored.

Maybe we've reached a brick wall. If so, I'm mostly the only one talking
about it, and I'd like to broaden the discussion without annoying people
who don't want to hear it.

Robert.
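P.S. For concreteness, here is a back-of-envelope sketch, in Python, of the
ratio Thomas is describing. The pin count and per-pin rate are taken from
his post; the peak-flops figure is a rough assumption for illustration, not
a vendor number.

    # Back-of-envelope machine balance (bytes per flop) for a
    # GTX-480-class GPU. The peak-flops figure below is an assumption
    # for illustration, not a measured or vendor-quoted number.

    mem_pins = 384               # memory I/O pins, per the post above
    bits_per_pin_per_s = 4e9     # ~4 Gb/s per pin, per the post above
    peak_bw = mem_pins * bits_per_pin_per_s / 8   # bytes/second

    peak_flops = 1.3e12          # ~1.3 single-precision Tflop/s (assumed)

    print("peak bandwidth: %.0f GB/s" % (peak_bw / 1e9))
    print("machine balance: %.2f bytes/flop" % (peak_bw / peak_flops))

    # Roughly 192 GB/s and ~0.15 bytes/flop: high bandwidth in absolute
    # terms, but a very low bytes-per-flop ratio, which is the complaint.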
From: David L. Craig on 19 Jul 2010 13:42

I am new to comp.arch and so am unclear of the pertinent history of this
discussion, so please bear with me and don't take any offense at anything
I say, as that is quite unintended.

Is the floating point bandwidth issue only being applied to one
architecture, e.g., x86? If so, why? Is this not a problem with other
designs? Also, why single out floating point bandwidth? For instance,
what about the maximum number of parallel RAM accesses architectures can
support, which has major impacts on balancing the cores' use with the
I/O's use?

If everyone thinks a different group is called for, that's fine with me.
I just want to understand the reasons this type of discussion doesn't fit
here.
From: Robert Myers on 19 Jul 2010 14:59

David L. Craig wrote:
> I am new to comp.arch and so am unclear of the pertinent history of this
> discussion, so please bear with me and don't take any offense at
> anything I say, as that is quite unintended.
>
> Is the floating point bandwidth issue only being applied to one
> architecture, e.g., x86? If so, why? Is this not a problem with other
> designs?

Some of my harshest criticism has been aimed at computers built around
the Power architecture, one of which briefly owned the top spot on the
Top 500 list. The problem is not peculiar to any ISA.

> Also, why single out floating point bandwidth? For instance, what about
> the maximum number of parallel RAM accesses architectures can support,
> which has major impacts on balancing the cores' use with the I/O's use?

I have no desire to limit the figures of merit that deserve
consideration. I just want to provide some corrective to the "Wow! A
gazillion flops!" talk that comes without even an asterisk. Right now,
people present, brag about, and plan for just one figure of merit:
Linpack flops. That makes sense to some, I gather, but it makes no sense
to me.

Computation is more or less a solved problem. Most of the challenges left
have to do with moving data around, with latency and not bandwidth having
gotten the lion's share of attention (for good reason). I believe that
moving data around will ultimately be the limiting factor with regard to
reducing power consumption.

> If everyone thinks a different group is called for, that's fine with
> me. I just want to understand the reasons this type of discussion
> doesn't fit here.

The safest answer that I can think of to this question is that it is
really an interdepartmental problem. The computer architects here have
been relatively tolerant of my excursions of thought as to why the
computers currently being built don't really cut it, but a proper
discussion of all the pros and cons would take the discussion, and
perhaps the list, far beyond any normal definition of computer
architecture.

Even leaving aside justifying why expensive bandwidth is not optional,
there is little precedent here for in-depth explorations of blue-sky
proposals. A fair fraction of the blue-sky propositions brought here
can't be taken seriously, and my sense of this group is that it wants to
keep the thinking mostly inside the box, not for want of imagination, but
to avoid science fiction and rambling, uninformed discussion.

Robert.
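P.S. To make the "flops alone is not the figure of merit" point concrete,
here is a minimal roofline-style sketch in Python. All of the machine and
kernel numbers are illustrative assumptions, not measurements of any
particular system.

    # Attainable throughput is capped by
    #   min(peak_flops, arithmetic_intensity * peak_bandwidth).
    # Machine and kernel numbers below are illustrative assumptions.

    def attainable_gflops(peak_gflops, peak_gb_per_s, flops_per_byte):
        """Upper bound on sustained Gflop/s for a kernel with the given
        arithmetic intensity (flops per byte moved to or from memory)."""
        return min(peak_gflops, flops_per_byte * peak_gb_per_s)

    peak_gflops = 1000.0     # advertised peak, Gflop/s (assumed)
    peak_gb_per_s = 150.0    # peak memory bandwidth, GB/s (assumed)

    # A stride-1, triad-like kernel: about 2 flops per 24 bytes moved.
    triad_intensity = 2.0 / 24.0

    print(attainable_gflops(peak_gflops, peak_gb_per_s, triad_intensity))
    # -> 12.5 Gflop/s, about 1% of the advertised peak: the kernel is
    #    bandwidth-bound, so the "gazillion flops" never materialize.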
From: jgd on 19 Jul 2010 15:29

In article <QSK0o.10246$Zp1.7167(a)newsfe15.iad>, rbmyersusa(a)gmail.com
(Robert Myers) wrote:

> Since I have talked most about the subject here and gotten the most
> valuable feedback here, I thought to solicit advice as to what kind
> of forum would seem most plausible/attractive to pursue such a
> subject.

A mailing list seems the most plausible to me. When the subject doesn't
have a well-defined structure (as yet), a wiki or web BBS tends to get in
the way of communication.

-- 
John Dallman, jgd(a)cix.co.uk, HTML mail is treated as probable spam.
From: nik Simpson on 19 Jul 2010 15:44
On 7/19/2010 10:36 AM, MitchAlsup wrote:
>
> d) high end PC processors can afford 2 memory channels

Not quite as screwed as that: the top-end Xeon & Opteron parts have 4
DDR3 memory channels, but still screwed. For the 2-socket space, it's 3
DDR3 memory channels for typical server processors.

Of course, the move to on-chip memory controllers means that the scope
for additional memory channels is pretty much "zero", but that's the
price you pay for commodity parts. They are designed to meet the needs of
the majority of customers, and it's hard to justify the costs of
additional memory channels at the processor and board-layout levels just
to satisfy the needs of bandwidth-crazy HPC apps ;-)

-- 
Nik Simpson
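As a rough Python sketch of the arithmetic behind "still screwed": the
channel count comes from the post above, while the DDR3 transfer rate and
the per-socket flop rate are illustrative assumptions only.

    # Per-socket DDR3 bandwidth from the number of memory channels.
    # Transfer rate and peak flop rate are illustrative assumptions.

    channels = 3                  # typical 2-socket server part, per the post
    transfers_per_s = 1.333e9     # DDR3-1333 (assumed)
    bytes_per_transfer = 8        # 64-bit channel

    socket_bw = channels * transfers_per_s * bytes_per_transfer  # bytes/s
    socket_peak_flops = 100e9     # ~100 double-precision Gflop/s (assumed)

    print("socket bandwidth: %.0f GB/s" % (socket_bw / 1e9))
    print("balance: %.2f bytes/flop" % (socket_bw / socket_peak_flops))

    # ~32 GB/s and ~0.32 bytes/flop: more channels would help, but commodity
    # packaging and board costs keep the ratio well below a byte per flop.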