Prev: PEEEEEEP
Next: Texture units as a general function
From: nmm1 on 3 Jan 2010 15:32 In article <N82dnaYlXssoZt3WnZ2dnUVZ_vJi4p2d(a)bestweb.net>, Mayan Moudgill <mayan(a)bestweb.net> wrote: > >> In general, it doesn't, but it more-or-less does for the one you are >> doing - which is NOT the way to do multi-dimensional FFTs! It is >> almost always much faster to transpose, so you are doing vector >> operations in the FFT at each stage. > >The initial bit-reversal is trivial to parallelize (at least for 1D >FFTs). It will (at most) involve a single copy of the array over the >entire network. Eh? If you mean the actual bit-reversal of the size, then it's irrelevant, and doesn't need access to the data! The issue isn't a single distribution of the FFT (unless the FFT is too large for a single node, as is common with multi-dimensional FFTs). It's the access pattern. The native access pattern of FFTs is close to pessimal on modern cached (and even paged) machines. Regards, Nick Maclaren.
From: Mayan Moudgill on 3 Jan 2010 15:41 nmm1(a)cam.ac.uk wrote: > In article <N82dnaYlXssoZt3WnZ2dnUVZ_vJi4p2d(a)bestweb.net>, > Mayan Moudgill <mayan(a)bestweb.net> wrote: > >>>In general, it doesn't, but it more-or-less does for the one you are >>>doing - which is NOT the way to do multi-dimensional FFTs! It is >>>almost always much faster to transpose, so you are doing vector >>>operations in the FFT at each stage. >> >>The initial bit-reversal is trivial to parallelize (at least for 1D >>FFTs). It will (at most) involve a single copy of the array over the >>entire network. > > > Eh? If you mean the actual bit-reversal of the size, then it's > irrelevant, and doesn't need access to the data! > > The issue isn't a single distribution of the FFT (unless the FFT is > too large for a single node, as is common with multi-dimensional FFTs). > It's the access pattern. The native access pattern of FFTs is close > to pessimal on modern cached (and even paged) machines. > I think we may be saying the same thing. The initial phase of an FFT consists of a transpose (based on the so-called bit-reversal transpose). Then, the rest of the FFT consists of lgN steps of the form: for( j = 0; j < N/(2*2**S); j++ ) { for( i = 0; i < 2**S; i++ ) { x[i], x[i+L] = fft_bfly(x[i], x[i+L]) } } where S is the step number, which is perfectly vectorizable. The initial transpose: x[ bitreverse(i) ] = x[i] can also be made vectorizable, basically by tiling.
From: Robert Myers on 3 Jan 2010 16:03 On Jan 3, 2:51 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote: > > Anyway, Nick, you're right that things like FFTs are network bandwidth > limited; however, it is still possible to get fairly good speedups. > > Of course, I haven't analyzed the other approaches; e.g. do the last > lg2M stages on one processor (so that there is only one communication > phase) or other points in the parallelism space. Publish! But try it out on Blue Gene first. Robert.
From: =?ISO-8859-1?Q?Niels_J=F8rgen_Kruse?= on 3 Jan 2010 16:14 Robert Myers <rbmyersusa(a)gmail.com> wrote: > On Jan 3, 12:34 pm, "Del Cecchi" <delcec...(a)gmail.com> wrote: > > > You keep this up and I will have to dust off thunderbird for comp.arch > > posting since it seems to get the quoting/attribution correct even for > > robert's posts from google, unlike oe. This will eliminate the > > concern with people putting robert's words in my mouth as above. > > I believe that the problem is that Google groups posts in html, rather > than in plain text. There is a plain text option for gmail, but not > for Google groups. Your posts are not in HTML. I can't see anything that should trip up a newsreader. -- Mvh./Regards, Niels J�rgen Kruse, Vanl�se, Denmark
From: Robert Myers on 3 Jan 2010 16:26
On Jan 3, 3:16 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote: > Robert Myers wrote: > > I assume that most who buy installations like Blue Gene would have RAS > > requirements that would be hard or impossible to meet with a Beowulf > > cluster. In the end, it's probably RAS that rules. > > What kind of recovery are you looking for? Against node failures? > > I'd suggest that distributed checkpointing with restart on failures > could be an adequate model for small-to-mid-size clusters. It's the entire management problem: backup, recovery from failure, detecting and isolating failure, minimizing downtime, and minimizing time and focus required for maintenance. There are, indeed, plausible and relatively off-the-shelf solutions for small-to-mid-size clusters, but not for a cluster the size of the Blue Gene installations I know about. Robert. |