From: nmm1 on 4 Jan 2010 04:43

In article <7qd7bmFfo9U1(a)mid.individual.net>,
Del Cecchi <delcecchi(a)gmail.com> wrote:
>
>You could use the provided hardware scatter-gather if you were astute
>enough to use InfiniBand interconnect. :-)

And program right down on the bare metal! When I last investigated, there was a distinct shortage of support for InfiniBand's fancy features, including that one. As I read it, almost everyone used OpenIB for commodity clusters, and support for that feature was merely a glint in the eye of the developers. And had been for some years.

The trouble with InfiniBand is that it was designed by committee, and is complicated beyond all reason. Even more than SCSI, much of it is likely never to be implemented or used.

Regards,
Nick Maclaren.
From: Thomas Womack on 4 Jan 2010 04:56

In article <b7dd84e0-0cbc-41c6-8a85-b51197c2d960(a)e27g2000yqd.googlegroups.com>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>On Jan 3, 11:25 pm, wclod...(a)lost-alamos.pet (William Clodius) wrote:
>
>> I think Nick is saying that to improve locality it is essential to
>> transpose the dimensions of the array as you cycle through each
>> dimension in a multi-dimensional array in a multi-dimensional FFT.
>
>But it sure wasn't necessary on a Cray-1--a side benefit of not having
>cache and having easily worked-around limitations on an arbitrary
>stride in memory. But, of course, none of this matters any more,
>because no one has to bother with multi-dimensional FFT's. Or, at
>least, they'd better not.

I can't read this as other than obtuse; yes, you don't need these tricks if main memory is the same speed as your processor, and indeed if you're running on an i7 a job small enough to fit on a Cray-1 you probably don't need many of them - the level-3 cache is eight megabytes, the size of Cray-1 main memory, and the other caches are reasonably good at batching up accesses to L3.

The place I work for does 3D FFTs from morning until night, but since nobody knows how to grow large flawless protein crystals, the _size_ of the FFTs hasn't changed since it was difficult to fit them in a Cray-1; FFTs on data small enough to fit in the shared memory of a current cheap SMP are really quite fast and parallelise quite reasonably, and (much more importantly) Frigo & Johnson at MIT have done the parallelisation and FFTW is not expensive even for commercial users.

Tom
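[The transpose trick William Clodius alludes to can be sketched in a few lines of numpy - a hypothetical illustration, not anyone's production code: a 2-D FFT becomes unit-stride 1-D FFTs along rows, one explicit transpose, and a second set of row FFTs, so no pass ever strides through memory.]

```python
import numpy as np

def fft2_by_transpose(a):
    """2-D FFT via row FFTs + transpose + row FFTs (locality sketch)."""
    a = np.fft.fft(a, axis=1)        # 1-D FFTs over contiguous rows
    a = np.ascontiguousarray(a.T)    # explicit transpose: columns become rows
    a = np.fft.fft(a, axis=1)        # second set of unit-stride row FFTs
    return a.T                       # transpose back to the original layout

x = np.random.rand(64, 64) + 1j * np.random.rand(64, 64)
assert np.allclose(fft2_by_transpose(x), np.fft.fft2(x))
```

[On a Cray-1, with flat memory and hardware strided loads, the column pass could be done in place; on a cached machine the explicit transpose is what keeps every 1-D pass cache-friendly.]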
From: nmm1 on 4 Jan 2010 06:40

In article <CGc*A7g0s(a)news.chiark.greenend.org.uk>,
Thomas Womack <twomack(a)chiark.greenend.org.uk> wrote:
>
>The place I work for does 3D FFTs from morning until night, but since
>nobody knows how to grow large flawless protein crystals, the _size_
>of the FFTs hasn't changed since it was difficult to fit them in a
>Cray-1; FFTs on data small enough to fit in the shared memory of a
>current cheap SMP are really quite fast and parallelise quite
>reasonably, and (much more importantly) Frigo & Johnson at MIT have done
>the parallelisation and FFTW is not expensive even for commercial
>users.

When I last looked, the performance wasn't brilliant, but that was simply because FFTs are excellent memory stress testers. It was as good as you could expect, given the limitations of the memory subsystem.

I am merely one of many thousands of people who have tried to work out how to parallelise multi-dimensional FFTs when communication is the limit, and failed to think of anything new.

Regards,
Nick Maclaren.
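[Why communication is the limit can be seen in a serial sketch of a slab-decomposed distributed 3-D FFT - a hypothetical illustration, where `nranks` and the split/concatenate pair stand in for ranks and an MPI all-to-all: all the arithmetic is local, but the global transpose between the two FFT phases must move essentially the whole array across the interconnect.]

```python
import numpy as np

def fft3_slab(a, nranks=4):
    """Serial simulation of a slab-decomposed parallel 3-D FFT."""
    # Each "rank" owns a slab of planes along axis 0.
    slabs = np.split(a, nranks, axis=0)
    # Local work: 2-D FFTs over the two axes each rank holds entirely.
    slabs = [np.fft.fft2(s, axes=(1, 2)) for s in slabs]
    # Global transpose -- in a real code this is the all-to-all that
    # dominates the cost once the local FFTs are fast enough.
    full = np.concatenate(slabs, axis=0)
    slabs = np.split(full, nranks, axis=1)
    # Remaining 1-D FFTs along the formerly distributed axis.
    slabs = [np.fft.fft(s, axis=0) for s in slabs]
    return np.concatenate(slabs, axis=1)

x = np.random.rand(16, 16, 16) + 1j * np.random.rand(16, 16, 16)
assert np.allclose(fft3_slab(x), np.fft.fftn(x))
```

[The flop count per element is O(log n) while the transpose moves every element once per redistribution, which is why no amount of cleverness in the local FFTs changes the communication-bound regime.]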
From: Anne & Lynn Wheeler on 4 Jan 2010 07:17

medusa was all about footprint, cooling (handling heat as more & more was compressed into smaller area), and interconnect ... and we were doing cluster scaleup as part of ha/cmp ... which also was heavily into availability
http://www.garlic.com/~lynn/subtopic.html#hacmp

in fact, when I was out marketing ha/cmp, I had coined the terms "disaster survivability" and "geographic survivability" to differentiate from disaster/recovery
http://www.garlic.com/~lynn/submain.html#available

primary person behind medusa was the engineer that I previously mentioned involved with 6000 serial-link-adaptor (SLA) and then worked in FCS committee ... recent reference:
http://www.garlic.com/~lynn/2009s.html#32 Larrabee delayed: anyone know what's happening?

misc. old medusa email &/or cluster scaleup
http://www.garlic.com/~lynn/lhwemail.html#medusa

this is old email (lots of stuff about working with LLNL &/or parties doing stuff for LLNL) ... possibly just hrs before being told it was being transferred and we weren't to work on anything with more than 4 processors
http://www.garlic.com/~lynn/2006x.html#email920129

and then a little over 2 weeks later, announcement (only for *scientific and technical*)
http://www.garlic.com/~lynn/2001n.html#6000clusters1

and then later that summer (about the time we were leaving) ... *clusters caught us by surprise*
http://www.garlic.com/~lynn/2001n.html#6000clusters2

other recent threads mentioning above:
http://www.garlic.com/~lynn/2009o.html#81 big iron mainframe vs. x86 servers
http://www.garlic.com/~lynn/2009p.html#54 big iron mainframe vs. x86 servers
http://www.garlic.com/~lynn/2009p.html#55 MasPar compiler and simulator
http://www.garlic.com/~lynn/2009p.html#85 Anyone going to Supercomputers '09 in Portland?
http://www.garlic.com/~lynn/2009q.html#27 Supercomputers Are Still Fast, but Less Super
http://www.garlic.com/~lynn/2009q.html#33 Check out Computer glitch to cause flight delays across U.S.
as previously mentioned ... some connection between *cluster scaleup* and *electronic commerce* ... reference to jan92 meeting about cluster scaleup for commercial dataprocessing
http://www.garlic.com/~lynn/95.html#13

two of the people in that meeting later leave and show up at a small client/server startup responsible for something called "commerce server", and we were brought in because they wanted to do payment transactions on the server. The "commerce server" started out as a collection of servers providing a multi-store "mall" paradigm implemented with a large oracle dbms backend. the startup had also invented this technology called "SSL" that they wanted to use.

Now before we had left ... besides ha/cmp, FCS, HIPPI and misc. other stuff ... we also played some in SCI. Part of the ha/cmp issue involved purity of "801" and simplified hardware ... and a major 801 simplification theme was no cache coherency (ruling out shared-memory multiprocessing). Since there was no cache coherency & multiprocessing ... that forced scaleup to a purely interconnect solution.

While doing electronic commerce ... we also talked some to Convex about their Exemplar SCI implementation (using HP risc chips). Then HP acquired both Verifone (a major point-of-sale terminal vendor that was looking at moving into the electronic commerce value chain) and Convex ... and we spent some amount of time at HP, talking to the respective responsible executives.

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
From: "Andy "Krazy" Glew" on 4 Jan 2010 08:42
> You could use the provided hardware scatter-gather if you were astute
> enough to use InfiniBand interconnect. :-)
>
> del
>
> you can lead a horse to water but you can't make him give up ethernet.

Del: What's the story on Infiniband?