From: Steven G. Johnson on
On Apr 22, 12:14 pm, Martin Brown <|||newspam...(a)nezumi.demon.co.uk>
wrote:
> Steven G. Johnson wrote:
> > 2^N is only a "lot" simpler to code if you don't care about
> > performance. If you care about performance, a textbook radix-2
> > implementation is worthless (5 to 40 times slower than the fastest
> > FFTs, depending on N), you have to use larger radices anyway,
>
> Yes. Although the best optimising compilers do seem to have narrowed the
> differences a bit. Compact inner loop code can help. Incidentally do you
> find the split radix 16 length transform worth the effort of coding?

The factor of 5--40 that I quoted was based on current optimizing
compilers on recent machines (the order of magnitude variation is for
different N). Compiler differences have an impact, but it is usually
on the order of 30%, not factors of 5 (except perhaps for a few
special cases like dense-matrix operations for which compilers have
been specifically tuned).

I'm not exactly sure what you mean by "split radix 16".
Traditionally, "split radix" refers to a blending of radices 2 and 4
(although there have been some recent generalizations by Swamy et al.
to higher radices while keeping the same operation count).

In any case, we use a variant of split radix in generating FFTW's
codelets (the hard-coded transforms for the FFT base cases and the
radix butterflies), but it is not used for the decomposition of
general N. (We typically find large radices to be faster, e.g. radix
16--64 is typical, and for very large N > 2^19 or so we typically do
one step of radix-sqrt(N).) More detail can be found in the papers on
www.fftw.org.

Steven
From: Martin Brown on
Steven G. Johnson wrote:
> On Apr 22, 12:14 pm, Martin Brown <|||newspam...(a)nezumi.demon.co.uk>
> wrote:
>> Steven G. Johnson wrote:
>>> 2^N is only a "lot" simpler to code if you don't care about
>>> performance. If you care about performance, a textbook radix-2
>>> implementation is worthless (5 to 40 times slower than the fastest
>>> FFTs, depending on N), you have to use larger radices anyway,
>> Yes. Although the best optimising compilers do seem to have narrowed the
>> differences a bit. Compact inner loop code can help. Incidentally do you
>> find the split radix 16 length transform worth the effort of coding?
>
> The factor of 5--40 that I quoted was based on current optimizing
> compilers on recent machines (the order of magnitude variation is for
> different N). Compiler differences have an impact, but it is usually
> on the order of 30%, not factors of 5 (except perhaps for a few
> special cases like dense-matrix operations for which compilers have
> been specifically tuned).
>
> I'm not exactly sure what you mean by "split radix 16".
> Traditionally, "split radix" refers to a blending of radices 2 and 4
> (although there have been some recent generalizations by Swamy et al.
> to higher radices while keeping the same operation count).
>
> In any case, we use a variant of split radix in generating FFTW's
> codelets (the hard-coded transforms for the FFT base cases and the
> radix butterflies), but it is not used for the decomposition of
> general N. (We typically find large radices to be faster, e.g. radix
> 16--64 is typical, and for very large N > 2^19 or so we typically do
> one step of radix-sqrt(N).) More detail can be found in the papers on
> www.fftw.org.

Thank you for an interesting insight into FFTW's inner workings.

I will take a look at these papers and your most recent code (although I
have a feeling the MS C++ compiler won't optimise it quite so well as GCC).

Regards,
Martin Brown
From: russ lyttle on
MooseFET wrote:
> On Apr 28, 9:58 am, John Larkin
> <jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
>> On Mon, 28 Apr 2008 11:05:30 -0400, Fred Bloggs <nos...(a)nospam.com>
>> wrote:
>>
>>
>>
>>> Michael A. Terrell wrote:
>>>> No, you're thinking of electrolytic rectifiers.
>>> You don't know anything about electronics and you're too stupid to
>>> learn, so why do you post here? You have very little comprehension of
>>> the technical material, so reading can't be it.
>> You don't know about electrolytic rectifiers? Or their optical
>> properties?
>
> They also have some strange RF properties.
>
>         100K                D1
> RF ----/\/\------+----------+--->!-- Scope
>                  !          !
>                 === C1      )
>                  !          ) L1
>                  !          )
>                 GND         !
>                            GND
>
> C1 is a variable capacitor, L1 is a coil that resonates in the AM band
> when tuned with C1, and D1 was an electrolytic rectifier.
>
> This is from distant memory:
>
> D1 is constructed by taking a bit of sheet lead and a bit of sheet
> aluminum and spacing them a small distance apart with a few bits of
> ceramic. A mixture of water and sodium bicarbonate was poured over the
> metal and the air bubbles shaken out. This made a device that did
> rectify somewhat at voltages down near a volt.
>
> The intent was to make an AM radio, but what sound I did get from it
> was really awful and very faint. A scope showed that with a fairly
> clean 1 MHz signal coming in, if you tuned the capacitor up and down
> you would find a place where the output jumped about 20 mV quite
> suddenly.
>
>> John
>
I built one like that too. L1 was hand wound on the core from a roll of
TP. The earpiece from a broken hearing aid was the speaker. The scope
showed about 50% efficiency in the rectifier. It worked pretty well for
1955. At least the local station came through.
From: Nils on
Steven G. Johnson schrieb:
> (Someday, perhaps, caches will use a proper hash of all the address
> bits and most of this wild variation will go away.)

I would be happy with an instruction that converts array indices into
Morton order. Doing the same in software sucks because you end up with a
huge chain of dependent instructions.
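A sketch of what that software chain looks like for the 2D case, using the standard "magic bits" interleave (the function names are mine, not from any particular library) -- note that every mask step depends on the result of the previous one:

```c
#include <stdint.h>

/* Spread the low 16 bits of x so a zero bit sits between each
   original bit: ...b3 b2 b1 b0  ->  ...0 b3 0 b2 0 b1 0 b0.
   Five dependent mask-and-shift steps, none of which can overlap. */
static uint32_t part1by1(uint32_t x)
{
    x &= 0x0000ffffu;
    x = (x | (x << 8)) & 0x00ff00ffu;
    x = (x | (x << 4)) & 0x0f0f0f0fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Interleave the bits of (row, col) into a single Morton index,
   with the row bits in the odd positions. */
static uint32_t morton2d(uint32_t row, uint32_t col)
{
    return (part1by1(row) << 1) | part1by1(col);
}
```

A hardware bit-interleave would collapse all of this into one cheap instruction, which is presumably what the TI parts mentioned below provide for 2D/3D indexing.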

You can do most arithmetic directly in Morton order, but you end up
writing arithmetic routines that are unmaintainable.

With hardware support we would get rid of this spiky behaviour and cut
the cache misses of matrix memory accesses and similar patterns down to
a minimum. Random accesses would still be slow, but anything that has
even a hint of locality would benefit a lot.


Technically it has the same cost as any logic instruction, because all
it does is mux the bits of two or three integers together.

The graphics folks have been doing this for years to improve the
efficiency of their texture caches. The database folks have done it
since the '70s to reduce paging from disk, but for some reason the major
CPU vendors haven't yet worked out that their data caches would benefit
a lot from it as well. Or maybe they do know but don't implement the
instruction because C and Fortran can't deal with it.

The TI C64 DSP family can do this for 2D and 3D array indexing. Big
kudos for adding the instructions. It improves my code that accesses
large two-dimensional arrays by 660% in the average case.

Nils
From: Nils on
russ lyttle schrieb:

> I built one like that too. L1 was hand wound on the core from a roll of
> TP. The earpiece from a broken hearing aid was the speaker. The scope
> showed about 50% efficiency in the rectifier. It worked pretty well for
> 1955. At least the local station came through.

Ha!

I even remember that the coil needed 64 windings for optimal performance.

That circuit was my first radio, built with my father in the late '70s.
And it worked quite well *)

Nils


*) I must admit we were living just a couple of miles from a very strong
AM transmitter, so all it took for us to receive radio was to pick up a
diode in the left hand and point it at the sky....

All devices received AM, including the telephone and the home stereo.
Troubleshooting analog circuits was easy as well: put your finger on the
base of *any* transistor in the circuit. If you couldn't hear the radio,
the transistor was dead.