separating speech and music in a sound file [Matlab]


From: gospelmic gini on 28 Jun 2010 06:21

I want to separate the speech/voice from the background music in a sound file (.wav file) using the Fourier transform. I tried to modify my sound signal by removing certain values from the FFT of the .wav file, but the inverse came out complex, so I applied abs() and got real values. This corrupted my sound file: I got a weird sound with some added disturbance. Can anyone tell me how I should proceed?

From: Wayne King on 28 Jun 2010 06:53


Hi, if the music and speech have overlapping spectral content (which I suspect they do), I don't see how you can separate them based on the Fourier transform alone. You can certainly remove some of the music's contribution by lowpass or bandpass filtering, since the music is likely to have energy extending into higher frequencies than the speech signal.

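If you want to use Fourier methods, then you have to be careful about just removing certain DFT coefficients. For a real-valued signal of length N, the DFT is conjugate symmetric (bin N-k is the complex conjugate of bin k), so whenever you zero a coefficient you must also zero its mirror. If you break that conjugate symmetry, you can't expect a real-valued signal when you take the inverse Fourier transform.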
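To illustrate the conjugate-symmetry point, here is a minimal sketch in Python/NumPy rather than MATLAB (np.fft.fft and np.fft.ifft play the role of MATLAB's fft and ifft). The function name remove_band and the test tones are hypothetical. It zeros a band of DFT bins either symmetrically or on the positive-frequency side only, and shows that only the symmetric version gives an inverse whose imaginary part is round-off noise. It also suggests why the abs() step hurt: abs() returns the magnitude of each sample, which is never negative, so even a correctly real-valued result gets rectified. Taking the real part is the safer choice.

```python
import numpy as np

def remove_band(x, fs, f_lo, f_hi, symmetric=True):
    """Zero the DFT bins whose frequency lies in [f_lo, f_hi] Hz, then invert.

    With symmetric=True, each bin k is zeroed together with its mirror bin N-k,
    which keeps the spectrum conjugate symmetric. With symmetric=False only the
    positive-frequency bins are zeroed, which breaks the symmetry.
    """
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), d=1.0 / fs)  # signed frequency of each bin, in Hz
    if symmetric:
        f = np.abs(f)                       # bin k and bin N-k share the same |f|
    X[(f >= f_lo) & (f <= f_hi)] = 0.0
    return np.fft.ifft(X)                   # complex; imag ~ 0 only if symmetric

# Hypothetical test signal: 1 s of a 440 Hz tone plus a 2 kHz "interferer".
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

good = remove_band(x, fs, 1500, 2500)
bad = remove_band(x, fs, 1500, 2500, symmetric=False)
print(np.max(np.abs(good.imag)))  # round-off level only
print(np.max(np.abs(bad.imag)))   # large: the mirror bin at -2 kHz was left behind
y = good.real                     # keep the real part; abs() would rectify it
```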

One thing you can try is a bandpass filter from, say, 100 Hz to 4 kHz. I don't know what your sampling frequency is, so you'll need to know that. For speech, you may be able to narrow this further to a bandpass filter from about 300 to 3000 Hz. But keep in mind that if the music has significant energy in that frequency region, your filter will pass that as well. See the help for fdesign.bandpass and fdesign.lowpass.
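As a rough illustration of the 300-3000 Hz idea, the NumPy sketch below keeps only the FFT bins inside that band. It is a hypothetical helper (fft_bandpass), not the fdesign.bandpass approach mentioned above. Note that it needs the sampling frequency fs. Because np.fft.irfft assumes a conjugate-symmetric spectrum, the output is always real, which sidesteps the symmetry pitfall entirely.

```python
import numpy as np

def fft_bandpass(x, fs, f_lo=300.0, f_hi=3000.0):
    """Brick-wall bandpass: keep only one-sided FFT bins with f_lo <= f <= f_hi."""
    X = np.fft.rfft(x)                         # one-sided spectrum of a real signal
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)    # bin frequencies in Hz
    X[(f < f_lo) | (f > f_hi)] = 0.0           # zero everything outside the band
    return np.fft.irfft(X, n=len(x))           # real-valued by construction

# Hypothetical example: 1 s at 16 kHz with tones at 100 Hz, 1 kHz and 6 kHz.
fs = 16000
t = np.arange(fs) / fs
x = (np.sin(2 * np.pi * 100 * t)
     + np.sin(2 * np.pi * 1000 * t)
     + np.sin(2 * np.pi * 6000 * t))
y = fft_bandpass(x, fs)   # only the 1 kHz tone survives
```

A brick-wall mask like this causes ringing on real recordings. A filter with proper transition bands, designed with fdesign.bandpass as suggested (or with scipy.signal.butter in Python), is usually the better choice.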

Usually in these applications, where the spectral content of the "noise" substantially overlaps with that of the "signal", it's best if the two sources are spatially separated and you can sample them with multiple sensors. Then you can use spatial filtering techniques such as beamforming to improve the signal-to-noise ratio (SNR).

Wayne

From: ImageAnalyst on 28 Jun 2010 08:01

gospelmic gini:
Did you remove (zero out) the frequencies symmetrically, in both the positive- and negative-frequency halves of the spectrum?

You might want to try Independent Component Analysis (ICA):
http://www.cnl.salk.edu/~tony/ica.html
http://www.cis.hut.fi/projects/ica/icademo/
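ICA only works when you have at least as many recordings as sources, for example a stereo file in which voice and music appear with different gains in each channel. The sketch below is a minimal symmetric FastICA in NumPy, not code from the linked pages. The function name fastica, the synthetic sources and the mixing matrix are hypothetical. It assumes instantaneous mixing (no delays or reverberation), which real room recordings generally violate. The recovered sources come back in arbitrary order, scale and sign. For real work, scikit-learn's FastICA class is a tested implementation.

```python
import numpy as np

def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W

def fastica(X, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with g = tanh.

    X: (n_channels, n_samples) array of mixtures, e.g. the two channels of
    a stereo recording. Returns source estimates with the same shape, up to
    permutation, scale and sign.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)          # center each channel
    d, E = np.linalg.eigh(np.cov(X))               # cov(X) = E diag(d) E^T
    Z = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X    # whitened: cov(Z) ~ I
    n, m = Z.shape
    W = _sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update for every row w: E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = _sym_decorrelate(G @ Z.T / m - np.diag((1.0 - G**2).mean(axis=1)) @ W)
        # Converged when each new row is parallel (or anti-parallel) to the old one
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W @ Z

# Hypothetical demo: two synthetic sources mixed into two "microphones".
t = np.linspace(0, 8, 4000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
A = np.array([[1.0, 0.5], [0.6, 1.0]])   # made-up mixing matrix
X = A @ S
Y = fastica(X)
# Each true source should correlate strongly with one recovered component
C = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])
print(C.round(3))
```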
