From: Ross on 23 Sep 2009 07:20

Dear all,

Does anyone know of any research into the roles of phase and amplitude in frequency-domain representations of sound, in terms of human perception of timbre?

In images, you can take the Fourier transform of two images, then use the amplitude information from one image and the phase information from the other. The image that results from the inverse Fourier transform of this mixed data looks pretty strange, as you'd expect, but you see more of the image the phase information came from than of the other, suggesting that in images phase information dominates over amplitude information.

My wild guess is that for static audio timbres the opposite is true, but I would very much like to check this out properly. Any ideas/references/pointers?

I'm guessing that I'm probably asking in the wrong group, but don't know where to ask. Any recommendations of other places to ask would be greatly appreciated.
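For anyone who wants to try the swap Ross describes directly on audio, here is a minimal numpy sketch. The test signals are hypothetical stand-ins; any two sounds of equal length would do.

    import numpy as np

    fs = 16000
    t = np.arange(fs) / fs  # one second of samples

    # Hypothetical test signals: a harmonic tone and white noise.
    x1 = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 20))
    x2 = np.random.randn(fs)

    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)

    # Keep the amplitude spectrum of x1 and the phase spectrum of x2,
    # then invert: the audio analogue of the mixed-image experiment.
    hybrid = np.fft.irfft(np.abs(X1) * np.exp(1j * np.angle(X2)), n=len(x1))

Listening to hybrid against the two sources gives a quick, informal test of whether amplitude or phase dominates perception of a static spectrum.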
From: Rune Allnor on 23 Sep 2009 07:27

On 23 Sep, 13:20, Ross <rossclem...(a)gmail.com> wrote:
> Dear all,
>
> Does anyone know of any research into the roles of phase and amplitude
> of frequency domain representations of sound in terms of human
> perception of timbres.

From http://en.wikipedia.org/wiki/Timbre:

"Timbre has been called ... 'the psychoacoustician's multidimensional wastebasket category for everything that cannot be qualified as pitch or loudness.'"

Rune
From: Richard Dobson on 23 Sep 2009 08:28

Ross wrote:
> Dear all,
>
> Does anyone know of any research into the roles of phase and amplitude
> of frequency domain representations of sound in terms of human
> perception of timbres.
>
> In images, you can take the Fourier transform of two images. You then
> use the amplitude information from one image, and the phase
> information from the other. The image that results from the inverse
> Fourier transform of this mixed data looks pretty strange as you'd
> expect, but you see more of the image the phase information came from
> than the other, suggesting that in images phase information dominates
> over amplitude information.
>
> My wild guess is that for static audio timbres the opposite is true,
> but I would very much like to check this out properly. Any ideas/
> references/pointers?
>
> I'm guessing that I'm probably asking in the wrong group, but don't
> know where to ask. Any recommendations of other places to ask would be
> greatly appreciated.

You will find lots of interest in this question on the musicdsp list.

The significance of phase is pretty well canonical in audio processing, with respect to any and all combinations of sounds. Processes such as phasers and flangers combine wet and dry sounds to produce dynamic cancellation effects. There is a pretty direct audio counterpart to your image example in various techniques of cross-synthesis, hybridising and morphing of sounds. The simplest example is phase-vocoder processing, where the bin amplitudes of one sound are combined with the frequency values of another. Most of the famous "problems" of the phase vocoder arise through the smearing of phase between bins. Phase relationships (not least, the preservation of them) are also central to most multi-channel production, in either preserving or modifying the "stereo image". So in the general case, audio applications seek either to preserve phase relationships or to deliberately distort/modify them.

Human perception of timbre is a slightly different topic; it is generally asserted that we are insensitive to (static) phase - you can scramble the phases (while keeping amplitudes the same) of the partials of, say, a square wave or sawtooth wave, and the listener will not notice (though needless to say there are those who claim they can distinguish them). So in broad terms your guess is correct. The general principle is that our ears are drawn to anything changing (which of course is what we experience most of the time): addition/removal of partials, and changing phase relationships.

The challenge of the subject from a research point of view is that our hearing tends to be "categorical" - given a transformation (e.g. in morphing), our perception tends to lock onto one recognition until a certain point where it flips to another, somewhat akin to the famous optical illusions where we flip from seeing a vase to seeing a face, etc. In the audio case, this tends to apply even during a nominally smooth transformation.

See (among other references) "Auditory Scene Analysis" by Albert S. Bregman. See also the work of Diana Deutsch (http://deutsch.ucsd.edu), especially "The Psychology of Music", which discusses auditory illusions among many other things. And: "Music, Cognition, and Computerized Sound", Perry Cook. The main sound synthesis lists will also be sources of rich and informed discussions, e.g. for PD, Csound, Max/MSP, SuperCollider, etc.

Richard Dobson
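A quick way to test the static-phase claim above is additive synthesis: build a square wave from its odd partials twice, once with aligned phases and once with random phases, keeping the amplitudes identical. A minimal numpy sketch, with arbitrary choices of rate, pitch and partial count:

    import numpy as np

    fs, f0 = 44100, 220.0
    t = np.arange(fs) / fs            # one second
    ks = np.arange(1, 100, 2)         # odd harmonic numbers of a square wave
    amps = 1.0 / ks                   # square-wave amplitude law (1/k)

    rng = np.random.default_rng(0)
    rand_phase = rng.uniform(0, 2 * np.pi, len(ks))

    # Same amplitude spectrum, two different phase spectra.
    square = sum(a * np.sin(2 * np.pi * k * f0 * t)
                 for k, a in zip(ks, amps))
    scrambled = sum(a * np.sin(2 * np.pi * k * f0 * t + p)
                    for k, a, p in zip(ks, amps, rand_phase))

The two waveforms look completely different on a scope, but played as sustained tones most listeners will struggle to tell them apart, which is exactly the insensitivity to static phase described above.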