From: Robert Myers on 13 Mar 2010 22:50

I'm slowly doing my catch-up homework on what the national wisdom is on bisection bandwidth. Not too surprisingly, there are plenty of people out there who know that it's already a big problem, and that it is only going to get bigger, as there is no Moore's Law for bandwidth.

In the meantime, I'd like to offer a succinct explanation as to why this issue is fundamental and won't go away if only enough flops and/or press releases and/or PowerPoint presentations about flops appear.

States do not interact in linear time-invariant systems. There is almost always a transformation available that will put the problem into a representation where the computation is actually embarrassingly parallel. This is a fundamental characteristic of such systems and is not driven by bureaucratic requirements. Actually applying such a transformation to expose the embarrassingly parallel nature of the problem may not be convenient, as it is generally a global matrix operation on a state vector, but the possibility always exists.

No such transformation exists for general nonlinear systems. Various approximations to such transformations may exist that amount to linearization around some particular system state, but, once the system has changed in interesting ways, the transformation will cease even to be approximately valid. In general, nonlinear systems mix states in unpredictable ways, such that localization of the computation is not possible.

If you are using finite differences, which are generally a convolution of finite support over a state, you can deceive yourself into thinking that you have successfully localized the problem, but the process of approximating a differential operator by such a localized convolution will *itself* mix states in ways that are unphysical and unrelated to the actual mathematics of the problem.

I suspect that this key deception is important to allowing continuing use of "supercomputers" that, on the face of it, are unsuitable for nonlinear systems, because general nonlinear systems globally mix states. The local nature of the computation is an artifact of the discretization scheme that bears no necessary relationship to the physics or mathematics of the actual problem. In some systems, the resulting errors may be acceptable. For general strongly nonlinear systems, there is no a priori way that I know of to establish that the errors are acceptable. One common property of "stable" differencing schemes is that they artificially smooth solutions, so that even testing by changing grid resolution may not reveal a problem.

If there is a mathematician in the house and I have made an error, I'm sure I will be informed of it.

Adequate global bandwidth is not merely desirable, it is a requirement for simulating nonlinear systems.

Robert.
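A minimal sketch of the point above, using a 1-D diffusion/Burgers pair as a stand-in for "linear time-invariant" versus "nonlinear" (the grid size, viscosity, and time step are arbitrary illustrative choices, not anything from the post): the linear operator decouples completely in Fourier space, while the nonlinear term couples every mode to every other mode.

import numpy as np

N = 64                                        # grid points (illustrative)
L = 2 * np.pi
x = np.linspace(0.0, L, N, endpoint=False)
k = np.fft.fftfreq(N, d=L / N) * 2 * np.pi    # angular wavenumbers
nu, dt = 0.01, 1e-3                           # viscosity, time step (arbitrary)

u = np.sin(x) + 0.5 * np.sin(3 * x)           # arbitrary initial state

# Linear diffusion u_t = nu * u_xx: in Fourier space each mode evolves
# independently, u_hat_k(t) = u_hat_k(0) * exp(-nu * k^2 * t).
# No mode ever needs data from any other mode -- embarrassingly parallel.
u_hat = np.fft.fft(u)
u_hat_linear = u_hat * np.exp(-nu * k**2 * dt)

# Nonlinear term of Burgers' equation, u * u_x: the product in physical space
# is a convolution over *all* wavenumbers in Fourier space, so every mode's
# update depends on every other mode -- the global mixing of states described
# above.
u_x = np.real(np.fft.ifft(1j * k * u_hat))
nonlinear_hat = np.fft.fft(u * u_x)           # mixes all modes

print(np.abs(u_hat_linear[:4]))
print(np.abs(nonlinear_hat[:4]))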
From: Terje Mathisen <terje.mathisen at tmsw.no> on 14 Mar 2010 05:43

Robert Myers wrote:
> I'm slowly doing my catch-up homework on what the national wisdom is on
> bisection bandwidth. Not too surprisingly, there are plenty of people
> out there who know that it's already a big problem, and that it is
> only going to get bigger, as there is no Moore's Law for bandwidth.

Huh?

Sure there is; it is driven by the same size shrinks that regular RAM and CPU chips have enjoyed.

I guess the real problem is that you'd like the total bandwidth to scale not just with the link frequencies but even faster, so that it also keeps up with the increasing total number of ports/nodes in the system, without overloading the central mesh?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Robert Myers on 14 Mar 2010 13:15

On Mar 14, 5:43 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> Robert Myers wrote:
> > I'm slowly doing my catch-up homework on what the national wisdom is on
> > bisection bandwidth. Not too surprisingly, there are plenty of people
> > out there who know that it's already a big problem, and that it is
> > only going to get bigger, as there is no Moore's Law for bandwidth.
>
> Huh?
>
> Sure there is; it is driven by the same size shrinks that regular RAM and
> CPU chips have enjoyed.
>
> I guess the real problem is that you'd like the total bandwidth to scale
> not just with the link frequencies but even faster, so that it also keeps
> up with the increasing total number of ports/nodes in the system, without
> overloading the central mesh?

At the chip (or maybe chip-carrier) level, there are interesting things you can do because of decreased feature sizes, as we have recently discussed. It's conceivable that such trickery will allow a computer with better global scaling of bandwidth, but it is not, so far as I know, an automatic result, unlike Moore's Law, which has allowed cramming more and more flops into a smaller and smaller space, leaving global bandwidth as the unspoken, unsolved, and perhaps even unsolvable problem.

Robert.
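A back-of-the-envelope sketch of why global bandwidth does not scale automatically with node count; the topology (a k-ary 3-D torus) and the 100 Gb/s link rate are my own assumptions for illustration, not anything proposed in the thread. Flops grow with the node count k^3, but the bisection only grows with the cross-sectional area 2*k^2, so per-node bisection bandwidth falls as the machine grows.

def torus_bisection_per_node(k, link_gbps=100.0):
    """Per-node bisection bandwidth of a k-ary 3-D torus (hypothetical link rate)."""
    nodes = k ** 3                        # total nodes
    bisection_links = 2 * k ** 2          # links crossing the midplane (wraparound doubles the cut)
    bisection_gbps = bisection_links * link_gbps
    return nodes, bisection_gbps / nodes  # Gb/s of bisection bandwidth per node

for k in (4, 8, 16, 32):
    n, per_node = torus_bisection_per_node(k)
    print(f"{n:6d} nodes: {per_node:6.1f} Gb/s of bisection bandwidth per node")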
From: MitchAlsup on 14 Mar 2010 14:23

On Mar 14, 12:15 pm, Robert Myers <rbmyers...(a)gmail.com> wrote:
> On Mar 14, 5:43 am, Terje Mathisen <"terje.mathisen at tmsw.no">
> > I guess the real problem is that you'd like the total bandwidth to scale
> > not just with the link frequencies but even faster, so that it also keeps
> > up with the increasing total number of ports/nodes in the system, without
> > overloading the central mesh?
>
> At the chip (or maybe chip carrier) level, there are interesting
> things you can do because of decreased feature sizes, as we have
> recently discussed.

One achieves maximal "routable" bandwidth at the "frame" scale. With today's board technologies, this "frame" scale occurs around 1 cubic meter.

Consider a 1/2-meter-square motherboard with "several" CPU nodes and 16 bidirectional (about) byte-wide ports running at 6-10 GT/s. Now consider a backplane that simply couples this 1/2-square-meter motherboard to another 1/2-square-meter DRAM-carrying board, also with 16 bidirectional (almost) byte-wide ports running at the same frequencies. Except, this time, the DRAM boards are perpendicular to the CPU boards. With this arrangement, we have 16 CPU-containing motherboards fully connected to 16 DRAM-containing motherboards by 256 (almost) byte-wide connections running at 6-10 GT/s, in 1 cubic meter, about the actual size of a refrigerator. {Incidentally, this kind of system would have about 4 TB/s of bandwidth to about 4 TB of actual memory.}

Once you get larger than this, all of the wires actually have to exist as wires (between "frames"), not just traces of copper on a board or through a connector, and one becomes wire-bound connecting frames.

Mitch
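A quick arithmetic check of the ~4 TB/s figure, reading Mitch's "(almost) byte wide" ports as exactly one byte per transfer and counting both directions of each bidirectional link (those two readings are my assumptions):

links = 16 * 16               # every CPU board connected to every DRAM board
bytes_per_transfer = 1        # "(almost) byte wide" ports, taken as 1 byte
for gts in (6e9, 8e9, 10e9):  # 6-10 GT/s per port
    one_way = links * bytes_per_transfer * gts   # bytes/s in one direction
    both_ways = 2 * one_way                      # bidirectional links
    print(f"{gts/1e9:4.0f} GT/s: {one_way/1e12:.1f} TB/s each way, "
          f"{both_ways/1e12:.1f} TB/s total")

At 8 GT/s this lands at roughly 2 TB/s each way, about 4 TB/s total, consistent with the figure in the post.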
From: Anton Ertl on 14 Mar 2010 14:25
MitchAlsup <MitchAlsup(a)aol.com> writes:
>Consider a 1/2-meter-square motherboard with "several" CPU nodes and 16
>bidirectional (about) byte-wide ports running at 6-10 GT/s. Now
>consider a backplane that simply couples this 1/2-square-meter
>motherboard to another 1/2-square-meter DRAM-carrying board, also with
>16 bidirectional (almost) byte-wide ports running at the same
>frequencies. Except, this time, the DRAM boards are perpendicular to
>the CPU boards. With this arrangement, we have 16 CPU-containing
>motherboards fully connected to 16 DRAM-containing motherboards by
>256 (almost) byte-wide connections running at 6-10 GT/s, in 1 cubic
>meter, about the actual size of a refrigerator.

I compute 1/2 m x 1/2 m x 1/2 m = 1/8 m^3. Where have I misunderstood you?

But that size is the size of a small freezer around here (typical width 55-60 cm, depth about the same, and the height of the small ones is around the same, with the normal-sized ones at about 1 m height).

Hmm, couldn't you have DRAM boards on both sides of the mainboard (if you find a way to mount the mainboard in the middle and make it strong enough)? Then you can have a computer like a normal-size fridge :-).

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html