From: Steven G. Johnson on 23 Apr 2008 00:30

On Apr 22, 12:14 pm, Martin Brown <|||newspam...(a)nezumi.demon.co.uk> wrote:
> Steven G. Johnson wrote:
> > 2^N is only a "lot" simpler to code if you don't care about
> > performance. If you care about performance, a textbook radix-2
> > implementation is worthless (5 to 40 times slower than the fastest
> > FFTs, depending on N), you have to use larger radices anyway,
>
> Yes. Although the best optimising compilers do seem to have narrowed the
> differences a bit. Compact inner loop code can help. Incidentally do you
> find the split radix 16 length transform worth the effort of coding?

The factor of 5--40 that I quoted was based on current optimizing compilers
on recent machines (the order-of-magnitude variation is for different N).
Compiler differences have an impact, but it is usually on the order of 30%,
not factors of 5 (except perhaps for a few special cases like dense-matrix
operations for which compilers have been specifically tuned).

I'm not exactly sure what you mean by "split radix 16". Traditionally,
"split radix" refers to a blending of radices 2 and 4 (although there have
been some recent generalizations by Swamy et al. to higher radices while
keeping the same operation count).

In any case, we use a variant of split radix in generating FFTW's codelets
(the hard-coded transforms for the FFT base cases and the radix
butterflies), but it is not used for the decomposition of general N. (We
typically find large radices to be faster, e.g. radix 16--64 is typical,
and for very large N > 2^19 or so we typically do one step of
radix-sqrt(N).) More detail can be found in the papers on www.fftw.org.

Steven
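For contrast with what FFTW actually does, the "textbook radix-2" implementation being dismissed above looks roughly like this (a minimal recursive decimation-in-time sketch in Python, written for illustration — not FFTW code; it recomputes twiddle factors and allocates fresh lists at every recursion level, which is exactly the kind of naivety behind the quoted factor of 5--40):

```python
import cmath

def fft_radix2(x):
    """Naive recursive radix-2 decimation-in-time FFT.
    len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor recomputed on every call -- simple, but slow.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t            # butterfly: top half
        out[k + n // 2] = even[k] - t   # butterfly: bottom half
    return out
```

A production FFT would precompute twiddle tables, work in place, and switch to larger radices — the gap Johnson is describing.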
From: Martin Brown on 23 Apr 2008 05:57

Steven G. Johnson wrote:
> On Apr 22, 12:14 pm, Martin Brown <|||newspam...(a)nezumi.demon.co.uk>
> wrote:
>> Steven G. Johnson wrote:
>>> 2^N is only a "lot" simpler to code if you don't care about
>>> performance. If you care about performance, a textbook radix-2
>>> implementation is worthless (5 to 40 times slower than the fastest
>>> FFTs, depending on N), you have to use larger radices anyway,
>> Yes. Although the best optimising compilers do seem to have narrowed the
>> differences a bit. Compact inner loop code can help. Incidentally do you
>> find the split radix 16 length transform worth the effort of coding?
>
> The factor of 5--40 that I quoted was based on current optimizing
> compilers on recent machines (the order of magnitude variation is for
> different N). Compiler differences have an impact, but it is usually
> on the order of 30%, not factors of 5 (except perhaps for a few
> special cases like dense-matrix operations for which compilers have
> been specifically tuned).
>
> I'm not exactly sure what you mean by "split radix 16".
> Traditionally, "split radix" refers to a blending of radices 2 and 4
> (although there have been some recent generalizations by Swamy et al.
> to higher radices while keeping the same operation count).
>
> In any case, we use a variant of split radix in generating FFTW's
> codelets (the hard-coded transforms for the FFT base cases and the
> radix butterflies), but it is not used for the decomposition of
> general N. (We typically find large radices to be faster, e.g. radix
> 16--64 is typical, and for very large N > 2^19 or so we typically do
> one step of radix-sqrt(N).) More detail can be found in the papers on
> www.fftw.org.

Thank you for an interesting insight into FFTW's inner workings. I will
take a look at these papers and your most recent code (although I have a
feeling the MS C++ compiler won't optimise it quite so well as GCC).
Regards,
Martin Brown
From: russ lyttle on 4 May 2008 13:50

MooseFET wrote:
> On Apr 28, 9:58 am, John Larkin
> <jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
>> On Mon, 28 Apr 2008 11:05:30 -0400, Fred Bloggs <nos...(a)nospam.com>
>> wrote:
>>> Michael A. Terrell wrote:
>>>> No, you're thinking of electrolytic rectifiers.
>>> You don't know anything about electronics and you're too stupid to
>>> learn, so why do you post here? You have very little comprehension of
>>> the technical material, so reading can't be it.
>> You don't know about electrolytic rectifiers? Or their optical
>> properties?
>
> They also have some strange RF properties.
>
>          100K              D1
> RF ----/\/\------+----------+--->!-- Scope
>                  !          !
>                 === C1      )
>                  !          ) L1
>                  !          )
>                 GND         !
>                            GND
>
> C1 is a variable capacitor, L1 is a coil that resonates in the AM band
> when used with C1, and D1 was an electrolytic rectifier.
>
> This is from distant memory:
>
> D1 is constructed by taking a bit of sheet lead and sheet aluminum and
> spacing them a small distance apart with a few bits of ceramic. A
> mixture of water and sodium bicarbonate was poured over the metal and
> the air bubbles shaken out. This made a device that did rectify
> somewhat at voltages down near a volt.
>
> The intent was to make an AM radio, but what sound I did get from it
> was really awful and very faint. A scope showed that with a fairly
> clean 1 MHz coming in, if you tuned the capacitor up and down you would
> find a place where the output jumped about 20 mV quite suddenly.
>
>> John

I built one like that too. L1 was hand wound on the core from a roll of
TP. The earpiece from a broken hearing aid was the speaker. The scope
showed about 50% efficiency in the rectifier. It worked pretty good for
1955. At least the local station came through.
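For anyone wanting to reproduce the tank circuit above, L1 and C1 just need to satisfy f = 1/(2*pi*sqrt(L*C)) across the AM band. A quick sketch (the component values here are assumed for illustration — the posts give none; 365 pF is a common broadcast-band variable capacitor) computes the inductance needed for the 1 MHz test signal mentioned:

```python
import math

def resonant_frequency(L, C):
    """Resonant frequency in Hz of an LC tank (L in henries, C in farads)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L * C))

def inductance_for(f, C):
    """Inductance in henries needed to resonate at f Hz with capacitance C."""
    return 1.0 / ((2.0 * math.pi * f) ** 2 * C)

C = 365e-12                    # assumed: 365 pF variable capacitor, fully meshed
L = inductance_for(1e6, C)     # inductance to tune the 1 MHz signal
print(f"L = {L * 1e6:.1f} uH")
print(f"check: f = {resonant_frequency(L, C) / 1e6:.2f} MHz")
```

With the capacitor partly unmeshed, the same coil tunes the rest of the AM band upward from there.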
From: Nils on 4 May 2008 16:15

Steven G. Johnson wrote:
> (Someday, perhaps, caches will use a proper hash of all the address
> bits and most of this wild variation will go away.)

I would be happy with an instruction that converts array indices into
Morton order. Doing the same in software sucks because you end up with a
huge chain of dependent instructions.

You can do most arithmetic directly in Morton order, but you end up
writing arithmetic routines that are unmaintainable.

With hardware support we would get rid of this spiky behaviour and improve
the overall cache locality of matrix memory accesses and similar things.
Random accesses would still be slow, but anything with even a hint of
locality would benefit a lot.

Technically it has the same cost as any other logic instruction, because
all it does is mux two or three integers together.

The graphics guys have been doing this for years to improve the efficiency
of their texture caches. The database guys have been doing it since the
'70s to reduce paging from disk, but for some reason the major CPU vendors
haven't yet figured out that their data caches would benefit from it a lot
as well. Or maybe they do know it but don't implement the instruction
because C and Fortran can't deal with it.

The TI C64 DSP family can do that for 2D and 3D array indexing. Big kudos
for adding the instructions. It improves my code that accesses large
two-dimensional arrays by 660% in the average case.

Nils
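The software fallback Nils is complaining about is the standard "magic mask" bit-spread, sketched here in Python for 16-bit coordinates (a generic illustration, not Nils's code or the TI instruction): each coordinate needs a serial chain of four shift-and-mask steps before the two halves can be merged, and each step depends on the previous one.

```python
def part1by1(x):
    """Spread the low 16 bits of x so they occupy the even bit positions."""
    x &= 0xFFFF
    x = (x | (x << 8)) & 0x00FF00FF  # each step depends on the last:
    x = (x | (x << 4)) & 0x0F0F0F0F  # this is the dependent chain that
    x = (x | (x << 2)) & 0x33333333  # a single hardware instruction
    x = (x | (x << 1)) & 0x55555555  # would collapse into one cycle
    return x

def morton2d(x, y):
    """Interleave the bits of x and y into one Z-order (Morton) index."""
    return part1by1(x) | (part1by1(y) << 1)
```

Addressing a 2D array by morton2d(row, col) instead of row * width + col is what keeps nearby (row, col) pairs nearby in memory, improving cache locality for blocked access patterns.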
From: Nils on 4 May 2008 16:27
russ lyttle wrote:
> I built one like that too. L1 was hand wound on the core from a roll of
> TP. The earpiece from a broken hearing aid was the speaker. The scope
> showed about 50% efficiency in the rectifier. It worked pretty good for
> 1955. At least the local station came through.

Ha! I even remember that the coil needed 64 windings for optimal
performance. That circuit was my first radio, built with my father in the
late '70s. And it worked quite well. *)

Nils

*) I must admit we lived just a couple of miles away from a very strong AM
transmitter, so all it took for us to receive radio was to pick up a diode
with the left hand and point it into the sky.... All devices received AM,
including the telephone and the home stereo. Troubleshooting analog
circuits was easy as well: put your finger on the base of *any* transistor
in the circuit. If you couldn't hear radio, the transistor was dead.