From: nedbrek on 21 Jul 2010 08:05

Hello all,

"Robert Myers" <rbmyersusa(a)gmail.com> wrote in message
news:37734c23-5748-4f4b-9013-1d1c60cb3d94(a)d8g2000yqf.googlegroups.com...
> On Jul 20, 1:49 pm, "David L. Craig" <dlc....(a)gmail.com> wrote:
>> If we're talking about custom, never-mind-the-cost
>> designs, then that's the stuff that should make this
>> a really fun group.
>
> Moving the discussion to some place slightly less visible than
> comp.arch might not produce more productive flights of fancy, but I,
> for one, am interested in what is physically possible and not just
> what can be built with the consent of Sen. Mikulski--a lady I have
> always admired, to be sure, from her earliest days in politics, just
> not the person I'd cite as intellectual backup for technical
> decisions.

If we are only limited by physics, a lot is possible...

Can you summarize the problem space here?
1) Amount of data - fixed (SPEC), or grows with performance (TPC)?
2) Style of access - you mentioned this some: regular (not random), but not
   really suited to sequential (or cache-line) structures. Is it a sparse
   array? Linked lists? What percentage is pointers vs. FMAC inputs?
3) How branchy is it?

I think that should be enough to get some juices going...
Ned
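To make the distinction in Ned's question (2) concrete, here is a minimal C
sketch (not from the thread; the names and sizes are illustrative only)
contrasting a streaming, FMAC-style kernel with a pointer-chasing one:

/* Hypothetical sketch: streaming FMAC-style access vs. pointer chasing.
 * Build with e.g.:  cc -O2 access_styles.c -o access_styles            */
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

/* Regular, cache-line-friendly access: each element feeds a multiply-add.
 * Bandwidth-bound and easily prefetched.                                */
static double stream_fma(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Irregular access: every step is a dependent load, so latency dominates. */
struct node { struct node *next; double val; };

static double chase(const struct node *p)
{
    double acc = 0.0;
    for (; p != NULL; p = p->next)
        acc += p->val;          /* one serialized dereference per element */
    return acc;
}

int main(void)
{
    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b);
    struct node *nodes = malloc(N * sizeof *nodes);
    if (!a || !b || !nodes) return 1;

    /* Note: the nodes here are laid out contiguously; a real test would
     * shuffle the chain so the traversal is not sequential in memory.    */
    for (size_t i = 0; i < N; i++) {
        a[i] = b[i] = 1.0;
        nodes[i].val = 1.0;
        nodes[i].next = (i + 1 < N) ? &nodes[i + 1] : NULL;
    }

    printf("stream: %f  chase: %f\n", stream_fma(a, b, N), chase(&nodes[0]));
    free(a); free(b); free(nodes);
    return 0;
}

In the streaming kernel the hardware prefetcher can hide memory latency;
in the chase, each load depends on the previous one, which is why
pointer-heavy access patterns tend to be latency-bound rather than
bandwidth-bound.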
From: David L. Craig on 21 Jul 2010 10:58

On Jul 20, 7:11 pm, Robert Myers <rbmyers...(a)gmail.com> wrote:
> Maybe quantum entanglement is the answer to moving data around.

Sigh... I wonder how many decades we are from that being standard in COTS
hardware (assuming the global underpinnings of R&D hold up that long).
Probably more than I've got (unless medical R&D also grows by leaps and
bounds and society deems me worthy of being kept around). I like the idea of
simultaneous backup 180 degrees around the planet and on the Moon, that's
for sure.
From: Jeremy Linton on 21 Jul 2010 11:42

On 7/21/2010 5:26 AM, nmm1(a)cam.ac.uk wrote:
> In article <8ant0rFf0gU1(a)mid.individual.net>,
> Andrew Reilly <areilly---(a)bigpond.net.au> wrote:
>> On Tue, 20 Jul 2010 11:49:03 -0700, Robert Myers wrote:
>>
>>> (90%+ efficiency for Linpack, 10% for anything even slightly more
>>> interesting).
>>
>> Have you, or anyone else here, ever read any studies of the sensitivities
>> of the latter performance figure to differences in interconnect bandwidth/
>> expense? I.e., does plugging another fat IB tree into every node in
>> parallel, doubling cross section bandwidth, raise the second figure to
>> 20%?
>
> A little, and I have done a bit of testing. It does help, sometimes
> considerably, but the latency is at least as important as the bandwidth.

With regard to latency, I've wondered for a while why no one has built a
large InfiniBand(-like?) switch with a large, closely attached memory. It
probably won't help the MPI guys, but those beasts are only used for the
HPC market anyway. Why not modify them to shave a hop off and admit that
some segment of the HPC market could use it? Is the HPC market so cost
sensitive that it cannot afford a slight improvement, at a disproportionate
cost, for one component in the system?
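The latency Nick refers to is usually characterized with a ping-pong
microbenchmark between two nodes. A minimal MPI sketch along those lines
(not from the thread; iteration count and message size are arbitrary) might
look like:

/* Hypothetical ping-pong latency sketch.  Build and run with e.g.:
 *   mpicc pingpong.c -o pingpong && mpirun -np 2 ./pingpong            */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;
    char byte = 0;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            /* send one byte and wait for it to come back */
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency ~ %.2f us\n",
               (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}

Every extra switch hop between the two ranks shows up directly in this
number, which is the improvement Jeremy is asking about.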
From: Robert Myers on 21 Jul 2010 12:27

Andrew Reilly wrote:
> On Tue, 20 Jul 2010 11:49:03 -0700, Robert Myers wrote:
>
>> (90%+ efficiency for Linpack, 10% for anything even slightly more
>> interesting).
>
> Have you, or anyone else here, ever read any studies of the sensitivities
> of the latter performance figure to differences in interconnect bandwidth/
> expense? I.e., does plugging another fat IB tree into every node in
> parallel, doubling cross section bandwidth, raise the second figure to
> 20%?

I have read such studies, yes, and I've even posted some of what I've found
here on comp.arch, where there has been past discussion of just those kinds
of questions.

That's an argument for why this material shouldn't be limited to being
scattered through comp.arch. I have a hard time finding even my own posts
with Google Groups search.

A place has generously been offered to host what will probably be a mailing
list and a wiki. I'll be glad to keep pursuing the conversation here to
generate as wide an interest as possible, but, since I've already worn some
people's patience thin by repeating myself, I'd rather focus on finding a
relatively quiet gathering place for those who are really interested. I have
neither interest in nor intention of moderating a group or limiting the
membership, so whatever is done should be available to whoever is
interested.

Whatever I do will be clearly announced here.

Robert.
From: nmm1 on 21 Jul 2010 12:28
In article <i274h0$hqs$1(a)speranza.aioe.org>,
Jeremy Linton <reply-to-list(a)nospam.org> wrote:
>>>
>>>> (90%+ efficiency for Linpack, 10% for anything even slightly more
>>>> interesting).
>>>
>>> Have you, or anyone else here, ever read any studies of the sensitivities
>>> of the latter performance figure to differences in interconnect bandwidth/
>>> expense? I.e., does plugging another fat IB tree into every node in
>>> parallel, doubling cross section bandwidth, raise the second figure to
>>> 20%?
>>
>> A little, and I have done a bit of testing. It does help, sometimes
>> considerably, but the latency is at least as important as the bandwidth.
>
> With regard to latency, I've wondered for a while why no one has built a
> large InfiniBand(-like?) switch with a large, closely attached memory. It
> probably won't help the MPI guys, but those beasts are only used for the
> HPC market anyway. Why not modify them to shave a hop off and admit that
> some segment of the HPC market could use it? Is the HPC market so cost
> sensitive that it cannot afford a slight improvement, at a disproportionate
> cost, for one component in the system?

It's been done, but network-attached memory isn't really viable, as local
memory is so cheap and so much faster. It sounds as if you think it would
reduce latency, but of what? I.e., what would you use it for?

Regards,
Nick Maclaren.