From: Mayan Moudgill on 3 Jan 2010 16:40

Robert Myers wrote:
> On Jan 3, 3:16 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:
>
>>Robert Myers wrote:
>>
>>>I assume that most who buy installations like Blue Gene would have RAS
>>>requirements that would be hard or impossible to meet with a Beowulf
>>>cluster. In the end, it's probably RAS that rules.
>>
>>What kind of recovery are you looking for? Against node failures?
>>
>>I'd suggest that distributed checkpointing with restart on failures
>>could be an adequate model for small-to-mid-size clusters.
>
> It's the entire management problem: backup, recovery from failure,
> detecting and isolating failure, minimizing downtime, and minimizing
> time and focus required for maintenance.
>
> There are, indeed, plausible and relatively off-the-shelf solutions
> for small-to-mid-size clusters, but not for a cluster the size of the
> Blue Gene installations I know about.
>

I thought the whole point was that you wanted to do physics that would
replicate BlueGene's results at a price you (or a physics department)
could afford? If you're saying that the only thing that can do the job
is BlueGene-style petaflop computers, then there are no alternatives -
you're going to have to live with the results coming out of BlueGene.
You can definitely complain about it, and I'm sure that it makes you
feel better, but all that complaining is not really productive, is it?

Also, asking for new languages and architecture features is moot, since
it's clear that no new language or tweak to existing architectures is
going to allow you to approach the computation level afforded by
BlueGene-class machines.
From: Robert Myers on 3 Jan 2010 20:04

On Jan 3, 4:40 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:

> I thought the whole point was that you wanted to do physics that would
> replicate BlueGene's results at a price you (or a physics department)
> could afford? If you're saying that the only thing that can do the job
> is BlueGene-style petaflop computers, then there are no alternatives -
> you're going to have to live with the results coming out of BlueGene.
> You can definitely complain about it, and I'm sure that it makes you
> feel better, but all that complaining is not really productive, is it?

What makes me feel better is that, since I started in on these topics
(all the big computers are behind barbed wire, linpack is meaningless
as a measure of real-world performance, and people need to stop talking
as if things that don't really scale do scale), the landscape has
brightened considerably.

NCSA (*not* the freaking DoE) is building computers to compete in
performance with the national labs' pet toys, people are starting to
emphasize that counting peak flops and linpack flops doesn't offer a
very good measure of usefulness, and people are starting to emphasize
bandwidth (to memory, network fabric, and to the Internet).

Here's a really encouraging document:

http://www.apan.net/meetings/kaohsiung2009/presentations/opening/kramer.pdf

It's interesting, and discouraging, that the document mentions
bisection bandwidth and its importance to the FFT, but doesn't offer a
number. Do these things just magically happen?

If my "complaining" has had no effect, I'm happy to see that the
direction things are moving is exactly the direction I said they
should. And, whadya know, Blue Waters is being built by IBM.

> Also, asking for new languages and architecture features is moot, since
> it's clear that no new language or tweak to existing architectures is
> going to allow you to approach the computation level afforded by
> BlueGene-class machines.

It isn't clear to me that that statement is true, since most HPC
applications (with the linpack benchmark being a glaring exception)
tend to be memory bound.

In any case, my goal in looking for alternative languages and computing
models was never to get blood out of a turnip, but rather to make the
world safe for concurrent programming, something you have said in the
past is essentially impossible.

Robert.
From: James Van Buskirk on 3 Jan 2010 21:41

"Mayan Moudgill" <mayan(a)bestweb.net> wrote in message
news:bu6dnVWcH_qenNzWnZ2dnUVZ_hSdnZ2d(a)bestweb.net...

> I think we may be saying the same thing. The initial phase of an FFT
> consists of a transpose (based on the so-called bit-reversal transpose).

Not at all the case. Bit-reversal is not a true transpose. I think
Nick is talking about http://www.jjj.de/fxt/fxtbook.pdf
Section 19.10, the matrix Fourier algorithm. This stuff really does
help reduce memory traffic.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
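To make the distinction in the post above concrete, here is a minimal sketch in Python. It is not taken from the fxtbook text; it is the generic "four-step" decomposition that goes by the name matrix Fourier algorithm, with a naive DFT standing in for the short row and column transforms. The function names (`dft`, `matrix_fourier`, `bit_reverse`, `transpose_index`) are made up for illustration. The last two helpers show why bit reversal and a matrix transpose are different permutations: on 16 elements viewed as a 4x4 matrix, index 1 goes to 8 under bit reversal but to 4 under transposition.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT, used only as a small building block and reference."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def matrix_fourier(x, rows, cols):
    """DFT of x (length rows*cols), viewing x as a row-major rows x cols matrix."""
    n = rows * cols
    assert len(x) == n
    a = [[0j] * cols for _ in range(rows)]
    # Step 1: length-`rows` DFT down each column.
    # Step 2: multiply entry (k1, c) by the twiddle factor exp(-2*pi*i*c*k1/n).
    for c in range(cols):
        col = dft([x[cols * r + c] for r in range(rows)])
        for k1 in range(rows):
            a[k1][c] = col[k1] * cmath.exp(-2j * cmath.pi * c * k1 / n)
    # Step 3: length-`cols` DFT along each row (contiguous in memory).
    b = [dft(row) for row in a]
    # Step 4: transpose on output, X[k1 + rows*k2] = b[k1][k2].
    out = [0j] * n
    for k1 in range(rows):
        for k2 in range(cols):
            out[k1 + rows * k2] = b[k1][k2]
    return out


def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def transpose_index(i, rows, cols):
    """Position of element i of a row-major rows x cols matrix after transposing."""
    r, c = divmod(i, cols)
    return c * rows + r


if __name__ == "__main__":
    x = [complex(j, (j * j) % 5) for j in range(12)]
    err = max(abs(p - q) for p, q in zip(matrix_fourier(x, 3, 4), dft(x)))
    print("max deviation from direct DFT:", err)
    print("bit reversal :", [bit_reverse(i, 4) for i in range(16)])
    print("4x4 transpose:", [transpose_index(i, 4, 4) for i in range(16)])
```

In a real implementation the short transforms would themselves be FFTs, and the column pass would usually be preceded by an explicit transpose so that every pass walks memory with unit stride; that is where the reduction in memory traffic would come from.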
From: Del Cecchi on 3 Jan 2010 22:58

<nmm1(a)cam.ac.uk> wrote in message
news:hhqtpe$3gf$1(a)smaug.linux.pwf.cam.ac.uk...
> In article <u9mdnb0HtsTSaN3WnZ2dnUVZ_gKdnZ2d(a)bestweb.net>,
> Mayan Moudgill <mayan(a)bestweb.net> wrote:
>>
>>(I can't speak to 3D FFTs, so I'll restrict myself to Cooley-Tukey
>>radix-2 1D FFT which I do have experience with, and hopefully the
>>analysis will carry over)
>
> In general, it doesn't, but it more-or-less does for the one you are
> doing - which is NOT the way to do multi-dimensional FFTs! It is
> almost always much faster to transpose, so you are doing vector
> operations in the FFT at each stage.
>
>>Communication can be overlapped with computation - ...
>
> I am afraid not, when you are using 'commodity clusters'. Firstly,
> even with the current offloading of the TCP/IP stack, there is still
> a lot of CPU processing needed to manage the transfer, and obviously
> a CPU can't be doing an FFT while doing that. Secondly, you have
> omitted the cost of the scatter/gather, which again has to be done
> by the CPU.
>
>>Assume a 16-way machine lgM=4, P=1e-9 s, B=1e9 B/s (assumes dual
>>10GbE, fairly tuned stacks).
>>For lg2N = 30 (N ~ 1G-points), we would end up with Ts = 32.2sec and
>>Tp = 6.0sec, of which 4.3sec was communication: speedup=5.33.
>>For a 64-way, assuming the same numbers, we end up with Tp = 2.0sec,
>>of which 1.6sec is communication: speedup=16.
>>For 256-way, we would end up with Tp = 0.6sec, of which 0.5sec is
>>communication: speedup=51.
>
> Hmm. That's not the experience of the people I know who have tried
> it. I haven't checked your calculations, so I am not saying whether
> or not I agree with them.
>
>>Anyway, Nick, you're right that things like FFTs are network
>>bandwidth limited; however, it is still possible to get fairly good
>>speedups.
>
> For multi-dimensional FFTs, certainly. I remain doubtful that you
> would get results anywhere near that good for single-dimensional
> ones. I certainly know that they are notorious SMP-system killers,
> and the speedups obtained by vendors' libraries are not very good.
>
>
> Regards,
> Nick Maclaren.

You could use the provided hardware scatter-gather if you were astute
enough to use InfiniBand interconnect. :-)

del

you can lead a horse to water but you can't make him give up ethernet.
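The figures quoted from Mayan Moudgill in the post above can be checked with a few lines of arithmetic. The formula behind them does not appear in this part of the thread, so the sketch below is a reconstruction under stated assumptions rather than his actual model: P is read as 1e-9 seconds per point per stage (the value that yields the quoted Ts = 32.2 sec), each point is assumed to be 16 bytes (a double-precision complex number), the first lgN - lgM stages are assumed to be purely local, and each of the remaining lgM stages is assumed to cost the larger of its compute and transfer time, i.e. perfect overlap. The name `fft_times` and its parameters are hypothetical.

```python
def fft_times(lg_n, m, p=1e-9, b=1e9, bytes_per_point=16):
    """Return (Ts, Tp, communication time, speedup) for an N = 2**lg_n point FFT.

    Assumed model, not taken from the thread:
      Ts = N * lgN * P
      Tp = (lgN - lgM) * (N/M) * P + lgM * max((N/M) * P, (N/M) * bytes / B)
    """
    n = 2 ** lg_n
    lg_m = m.bit_length() - 1            # m is assumed to be a power of two
    per_node = n / m                     # points held by each node
    ts = n * lg_n * p                    # serial time
    compute_stage = per_node * p         # one stage of butterflies on one node
    comm_stage = per_node * bytes_per_point / b   # one full exchange per stage
    local = (lg_n - lg_m) * compute_stage
    exchange = lg_m * max(compute_stage, comm_stage)
    tp = local + exchange
    comm = lg_m * comm_stage
    return ts, tp, comm, ts / tp


if __name__ == "__main__":
    for m in (16, 64, 256):
        ts, tp, comm, speedup = fft_times(30, m)
        print(f"M={m:3d}  Ts={ts:.1f}s  Tp={tp:.2f}s  comm={comm:.2f}s  "
              f"speedup={speedup:.2f}")
```

Under these assumptions the sketch agrees with the quoted numbers to the precision given (Tp of about 6.0, 2.0 and 0.6 seconds; speedups of 5.33, 16 and 51). It also leaves out exactly the costs Nick objects to: CPU time spent driving the network stack and the scatter/gather, either of which would undermine the perfect-overlap assumption.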
From: Del Cecchi` on 3 Jan 2010 23:11

Robert Myers wrote:
> On Jan 3, 3:16 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:
>
>>Robert Myers wrote:
>>
>>>I assume that most who buy installations like Blue Gene would have RAS
>>>requirements that would be hard or impossible to meet with a Beowulf
>>>cluster. In the end, it's probably RAS that rules.
>>
>>What kind of recovery are you looking for? Against node failures?
>>
>>I'd suggest that distributed checkpointing with restart on failures
>>could be an adequate model for small-to-mid-size clusters.
>
> It's the entire management problem: backup, recovery from failure,
> detecting and isolating failure, minimizing downtime, and minimizing
> time and focus required for maintenance.
>
> There are, indeed, plausible and relatively off-the-shelf solutions
> for small-to-mid-size clusters, but not for a cluster the size of the
> Blue Gene installations I know about.
>
> Robert.

If you go to Google and type "blue gene ras" you get all sorts of
interesting stuff.

You might enjoy a little light reading at the IBM cluster information
center

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.csm16010.install.doc/am7il_bluegeneapx.html

and the Cluster Systems Management Library.

Maybe all that attention to making stuff work is part of what justifies
the high price?

(thunderbird comes through)