Circular convolution optimization [DSP]

Prev: correlation of two identical sinuosiduals
Next: nonstationary autocorrelation but stationary PSD

From: jungledmnc on 17 May 2010 09:58

Hi there. I'm going to implement circular convolution with zero latency,
however here I'll simplify it to say 1024 points latency, because I need to
understand where is the problem.

So let's say we have an extremely long kernel, like 1000 * 1024 points. Now
this is the input:
i1 i2 i3 .. i1000 (each 1024 points long).

And this is the kernel:
k1 k2 k3 .. k1000 (each 1024 points long).

To perform convolution we can zero pad each input and kernel block to 2048
samples and:
out1 = idft( dft(i1 zero padded) * dft(k1 zero padded) )

And this is the result of convolution for input i1000:
finalout1000 = out1 + out2 + .. + out1000 + leakagefromfinalout999

We can obviously precalculate dft's for k1 .. k1000. But we can also
calculate input dft's just once, when processing realtime,
because what is now i1000, will soon be i999, and then i998 etc., right?

And here's my dilemma - I think we can also calculate output idft once. Why
summing out1+out2.., when we can sum it in frequency domain? So, is this
true: ?

finalout1000 = idft(
dft(i1 zero padded) * dft(k1 zero padded) +
dft(i2 zero padded) * dft(k2 zero padded) +
..
dft(i1000 zero padded) * dft(k1000 zero padded)) +
leakagefromfinalout999

If that would be true, the convolution could be done too quickly I believe,
since FFT is the most time-consuming task here. But why it is wrong?

Thank in advance.

From: Andor on 17 May 2010 10:12

On 17 Mai, 15:58, "jungledmnc" <jungledmnc(a)n_o_s_p_a_m.gmail.com>
wrote:
> Hi there. I'm going to implement circular convolution with zero latency,
> however here I'll simplify it to say 1024 points latency, because I need to
> understand where is the problem.
>
> So let's say we have an extremely long kernel, like 1000 * 1024 points. Now
> this is the input:
> i1 i2 i3 .. i1000 (each 1024 points long).
>
> And this is the kernel:
> k1 k2 k3 .. k1000 (each 1024 points long).
>
> To perform convolution we can zero pad each input and kernel block to 2048
> samples and:
> out1 = idft( dft(i1 zero padded) * dft(k1 zero padded) )
>
> And this is the result of convolution for input i1000:
> finalout1000 = out1 + out2 + .. + out1000 + leakagefromfinalout999
>
> We can obviously precalculate dft's for k1 .. k1000. But we can also
> calculate input dft's just once, when processing realtime,
> because what is now i1000, will soon be i999, and then i998 etc., right?
>
> And here's my dilemma - I think we can also calculate output idft once. Why
> summing out1+out2.., when we can sum it in frequency domain? So, is this
> true: ?
>
> finalout1000 = idft(
> dft(i1 zero padded) * dft(k1 zero padded) +
> dft(i2 zero padded) * dft(k2 zero padded) +
> ..
> dft(i1000 zero padded) * dft(k1000 zero padded)) +
> leakagefromfinalout999
>
> If that would be true, the convolution could be done too quickly I believe,
> since FFT is the most time-consuming task here. But why it is wrong?
>
> Thank in advance.

I'm not quite sure what your question is, but you might find answers
here:

http://pcfarina.eng.unipr.it/Public/AES-113/

Regards,
Andor

From: jungledmnc on 17 May 2010 11:54

Aaaa, thanks a lot! It's in the Microsoft patent!

|
Pages: 1
Prev: correlation of two identical sinuosiduals
Next: nonstationary autocorrelation but stationary PSD