From: Steven G. Johnson on 27 Nov 2007 12:24

On Nov 24, 9:45 am, Richard Owlett <rowl...(a)atlascomm.net> wrote:
> In "Chapter 12 - The Fast Fourier Transform / FFT Programs" of _The
> Scientist and Engineer's Guide to Digital Signal Processing_
> Smith says "There are also FFT routines that completely eliminate the
> bit reversal sorting."
>
> Could someone point me to a description of one, preferably with BASIC or
> FORTRAN sample code.

A number of references are given in the Wikipedia article on the
Cooley-Tukey FFT algorithm:

http://en.wikipedia.org/wiki/Cooley-Tukey_FFT_algorithm

The Cooley-Tukey FFT (the most common algorithm), along with other FFT
algorithms for that matter, intrinsically requires some sort of
re-ordering, in the sense that its operations map consecutive inputs to
non-consecutive outputs and vice-versa.

The most well-known approach is to do the re-ordering all at once, with
a single bit-reversal pass (or, more generally, a mixed-base digit
reversal). Another approach that still works in-place is to arrange the
sequence of radices so that one can do a sequence of small
transpositions interleaved with the transform (alternating DIT/DIF
stages work well here). The simplest approaches work out-of-place; e.g.
the Stockham algorithm goes back and forth between two arrays with each
pass, transposing one digit at a time.

The easiest approach, in my mind, is simply to do the algorithm
recursively, out-of-place: if you directly transcribe the Cooley-Tukey
algorithm from the mathematical description, working recursively, all of
the outputs are "automatically" put in the right place. (This was the
approach we used in the original FFTW; the current version of FFTW
implements a variety of algorithms, as described in our Proc. IEEE
paper.)

For example, here is a very simple 15-line radix-2 FFT, working
recursively and out-of-place, that does not require any separate
bit-reversal pass.
(To make it reasonably fast, one would want to use a larger base case
than the size-1 base case used here. It also uses a relatively
inaccurate trigonometric recurrence. But of course, if you were actually
interested in such practical issues you should be using a pre-existing
FFT library.)

#include <complex.h>

#define TWOPI 6.2831853071795864769252867665590057683943388

/* call with rec_fft(+1 or -1, N, input, output, 1, 1), where input and
   output are arrays of length N ... requires C99 complex-number support */
void rec_fft(int sign, int N, complex double *x, complex double *y,
             int is, int os)
{
     if (N == 1)
          *y = *x;
     else {
          int N2 = N/2;
          rec_fft(sign, N2, x, y, is*2, os);
          rec_fft(sign, N2, x+is, y+os*N2, is*2, os);
          complex double omega = 1.0, omega1 = cexp(sign * I * TWOPI / N);
          for (int i = 0; i < N2; ++i) {
               complex double a = y[os*i], b = omega * y[os*(i + N2)];
               y[os*i] = a + b;
               y[os*(i + N2)] = a - b;
               omega *= omega1;
          }
     }
}

Regards,
Steven G. Johnson

PS. Numerical Recipes, which several other posters have suggested, is
not the best reference for information beyond the textbook radix-2
algorithm. Its advice on more sophisticated FFT algorithms is at least
20 years out of date, to the extent that it was ever true.
From: Steven G. Johnson on 27 Nov 2007 12:35

On Nov 24, 11:30 pm, robert bristow-johnson <r...(a)audioimagination.com> wrote:
> there are drawings of it in Oppenheim and Schafer (any era of O&S) for
> the case of N=8. i think i can see a pattern in them but, they do not
> appear to me to be as efficient, for most instruction sets, as the
> simple FFT where you have to bit reverse before or after. you need to
> maintain many more pointers and i'm not completely sure how you
> manipulate them all for the different FFT passes.

To the contrary, a separate bit-reversal pass means that you have an
additional "pass" over your data, which can have a non-negligible impact
on performance (if the rest of your FFT is fast... if you are using
vanilla radix-2 it doesn't make much difference). For large N, the
problem is the cache impact. For small N, e.g. N=8, you should just read
everything into registers, compute the transform, and write it back out
in the correct order (or whatever order you want). Even for N larger
than the cache (or registers, which can be viewed as a kind of cache),
the goal is to do as much as possible with the data while it is in
cache, and avoid going in and out of the cache as much as possible.

One point of potential confusion: the FFT variants that do not require
bit-reversal are not really different "algorithms". All of the ones
being referred to here are still variants of the same Cooley-Tukey
factorization of the DFT, with the same number of arithmetic operations.
The only difference lies in where/when data are stored/accessed in
memory.

Regards,
Steven G. Johnson
From: robert bristow-johnson on 27 Nov 2007 13:46

On Nov 27, 12:35 pm, "Steven G. Johnson" <stev...(a)alum.mit.edu> wrote:
(snip)
> One point of potential confusion: the FFT variants that do not require
> bit-reversal are not really different "algorithms". All of the ones
> being referred to here are still variants of the same Cooley-Tukey
> factorization of the DFT, with the same number of arithmetic
> operations. The only difference lies in where/when data are stored/
> accessed in memory.

Steven, what's different (in terms of code) is the pointer arithmetic.
doing all sorts of goofy pointer arithmetic might be more costly, in the
aggregate throughout all of the FFT passes, than a single bit-reversing
pass at the beginning or end.
the normal in-place FFT should do better cache-wise than these
normal-in, normal-out routines, because the top (or bottom) of the
butterfly will share pages of memory with the top (or bottom) of the
next butterfly. and in-place should do better cache-wise than an
indexing scheme where two input nodes get read in and crunched and then
written to two totally different locations for the output (and this
cannot be in place; you would have to have two buffers, each of size N,
and ping-pong back and forth between the two for each pass).

in addition, as i tried to point out earlier, if you're doing something
like fast convolution using the FFT, then if your forward FFT leaves its
results in bit-reversed order and your inverse FFT takes its input in
bit-reversed order (and leaves the results in normal order), then no bit
reversing will be necessary. (the transfer function, besides being
pre-computed, will also be pre-bit-reversed.)

r b-j
From: SteveSmith on 27 Nov 2007 13:50

Hi Richard,

I haven't seen these alternate FFTs in years, but I can look around if
you have a need. However, they are just as cryptic and difficult to
understand as the conventional ones. Also, I doubt that you could modify
one to use square waves.

It sounds like you should be using the "DFT by Correlation" method. See
page 157 in Chapter 8 of my on-line book (www.DSPguide.com). This can be
directly modified to use square waves.

Good luck, and let me know if I can help.

Regards,
Steve
Steve.Smith1(a)SpectrumSDI.com
From: glen herrmannsfeldt on 27 Nov 2007 14:30
Steven G. Johnson wrote:
(snip regarding bit-reversal in FFTs)

> The most well-known approach is to do the re-ordering all at once,
> with a single bit-reversal pass (or, more generally, a mixed-base
> digit reversal). Another approach that still works in-place is to
> arrange the sequence of radices so that one can do a sequence of small
> transpositions interleaved with the transform (alternating DIT/DIF
> stages work well here). The simplest approaches work out-of-place.
> e.g. the Stockham algorithm goes back and forth between two arrays
> with each pass, transposing one digit at a time.

I still remember a Fortran FFT subroutine from long ago that did bit
reversal using nested DO loops. Depending on N (the number of points),
it would (illegally) GOTO the appropriate nested DO and (maybe) GOTO out
at the appropriate point. Not very portable, but it must have worked on
at least one machine.

-- glen