A partial FFT algorithm that is even faster than the build-in "fft" function [Matlab]

Prev: Problem with gath geva algorithm
Next: geometry profile in pde tool.

From: Matt J on 27 Mar 2010 09:25

Doug Schwarz <see(a)sig.for.address.edu> wrote in message <see-3B9FF0.08442127032010(a)news.frontiernet.net>...

> Amazingly, The MathWorks has anticipated your need and has provided a
> function to compute exactly the first element of the DFT. It is very
> fast. Use it like this:
>
> X0 = sum(x);
=======

Well, yes, but in fairness, the OP offered that only as an example.

For me, until I understand better the advantages of the Goertzel algorithm, the thing to do would be an explicit DFT computation

Result=exp(j*(-2*pi/N)*freqs(:)*(0:n-1))*X;

where freqs is a vector of zero-based frequency indices.

From: Bruno Luong on 27 Mar 2010 10:14

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hokvm6$3u6$1(a)fred.mathworks.com>...

> This must be something from the Signal Processing Toolbox. I have no doc on a GOERTZEL command, at any rate.
>
> I've never encountered the Goertzel algorithm before, although from this wiki article, it has obviously been around for a long time,
>
> http://en.wikipedia.org/wiki/Goertzel
>
> However, I can't really see what advantages it conveys over direct evaluation of the DFT. The computational complexity, as discussed in teh article, seems to be the same for both.

I must agree with Matt here. Goertzel's is probably intesresting for realtime system. But on Intel processor, it can be beat with simple C program as showed below (Vista 32, 2010A, 2-year-old laptop)

N = 1e6;
data=rand(1,N)+1i*rand(1,N);
k = 1000; % fft component

tic
y1=goertzel(data,k);
t1=toc % Elapsed time is 0.0280 seconds.

tic
N = length(data);
a=exp(-1j*(2*pi*(k-1)/N));
v(1:N) = a;
v(1) = 1;
v = cumprod(v);
y2 = data*v(:);
t2=toc % Elapsed time is 0.0752 seconds.

tic
N = length(data);
a = exp(-1j*(2*pi*(k-1)/N));
y3 = get1fft(data, a); % FEX
t3=toc % Elapsed time is 0.0118 seconds.

/* MEXFILE get1fft.C */
#include "mex.h"

#define X prhs[0]
#define A prhs[1]
#define S plhs[0]
void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
mwIndex N, n;
double ar, ai, anr, ani, tmp;
double sr, si;
double *xr, *xi, *xrn, *xin;
mwSize dim[2] = {1, 1};

N = mxGetN(X);
ar = *mxGetPr(A);
ai = *mxGetPi(A);
xr = mxGetPr(X);
xi = mxGetPi(X);
anr = 1;
ani = 0;

sr = si = 0;
xrn = xr;
if (xi==NULL)
{
for (n=0; n<N; n++)
{
sr += anr*(*xrn);
si += ani*(*xrn);
tmp = anr*ar-ani*ai;
ani = anr*ai+ani*ar;
anr = tmp;
xrn++;
}
} else
{
xin = xi;
for (n=0; n<N; n++)
{
sr += anr*(*xrn)-ani*(*xin);
si += ani*(*xrn)+anr*(*xin);
tmp = anr*ar-ani*ai;
ani = anr*ai+ani*ar;
anr = tmp;
xrn++;
xin++;
}
}

S = mxCreateNumericArray(2, dim, mxDOUBLE_CLASS, mxCOMPLEX);
*mxGetPr(S) = sr;
*mxGetPi(S) = si;

return;
}

% Bruno

From: Bruno Luong on 27 Mar 2010 10:45

BTW thank you Wayne for the head up of goertzel's algorithm. I have leaned something new.

Bruno

From: Wayne King on 27 Mar 2010 10:54

"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hol5m2$rr0$1(a)fred.mathworks.com>...
> BTW thank you Wayne for the head up of goertzel's algorithm. I have leaned something new.
>
> Bruno

No problem :) I thought that maybe in its second-order implementation, the Goertzel algorithm might help, but I see that the computational savings are not what I had thought.

Wayne

From: Lei on 28 Mar 2010 02:11

"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <hol3rd$2h8$1(a)fred.mathworks.com>...
> "Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <hokvm6$3u6$1(a)fred.mathworks.com>...
>
> > This must be something from the Signal Processing Toolbox. I have no doc on a GOERTZEL command, at any rate.
> >
> > I've never encountered the Goertzel algorithm before, although from this wiki article, it has obviously been around for a long time,
> >
> > http://en.wikipedia.org/wiki/Goertzel
> >
> > However, I can't really see what advantages it conveys over direct evaluation of the DFT. The computational complexity, as discussed in teh article, seems to be the same for both.
>
> I must agree with Matt here. Goertzel's is probably intesresting for realtime system. But on Intel processor, it can be beat with simple C program as showed below (Vista 32, 2010A, 2-year-old laptop)
>
> N = 1e6;
> data=rand(1,N)+1i*rand(1,N);
> k = 1000; % fft component
>
> tic
> y1=goertzel(data,k);
> t1=toc % Elapsed time is 0.0280 seconds.
>
> tic
> N = length(data);
> a=exp(-1j*(2*pi*(k-1)/N));
> v(1:N) = a;
> v(1) = 1;
> v = cumprod(v);
> y2 = data*v(:);
> t2=toc % Elapsed time is 0.0752 seconds.
>
>
> tic
> N = length(data);
> a = exp(-1j*(2*pi*(k-1)/N));
> y3 = get1fft(data, a); % FEX
> t3=toc % Elapsed time is 0.0118 seconds.
>
>
> /* MEXFILE get1fft.C */
> #include "mex.h"
>
> #define X prhs[0]
> #define A prhs[1]
> #define S plhs[0]
> void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
> {
> mwIndex N, n;
> double ar, ai, anr, ani, tmp;
> double sr, si;
> double *xr, *xi, *xrn, *xin;
> mwSize dim[2] = {1, 1};
>
> N = mxGetN(X);
> ar = *mxGetPr(A);
> ai = *mxGetPi(A);
> xr = mxGetPr(X);
> xi = mxGetPi(X);
> anr = 1;
> ani = 0;
>
> sr = si = 0;
> xrn = xr;
> if (xi==NULL)
> {
> for (n=0; n<N; n++)
> {
> sr += anr*(*xrn);
> si += ani*(*xrn);
> tmp = anr*ar-ani*ai;
> ani = anr*ai+ani*ar;
> anr = tmp;
> xrn++;
> }
> } else
> {
> xin = xi;
> for (n=0; n<N; n++)
> {
> sr += anr*(*xrn)-ani*(*xin);
> si += ani*(*xrn)+anr*(*xin);
> tmp = anr*ar-ani*ai;
> ani = anr*ai+ani*ar;
> anr = tmp;
> xrn++;
> xin++;
> }
> }
>
>
> S = mxCreateNumericArray(2, dim, mxDOUBLE_CLASS, mxCOMPLEX);
> *mxGetPr(S) = sr;
> *mxGetPi(S) = si;
>
> return;
> }
>
> % Bruno

Dear Bruno,
Thank you very much for your guidance. The Mexfunction approach is absolutely an brillant approach, actually I am just about to use this approach to optimize my code. I have conducted a few tests on the algorithm proposed by you, and it has huge advantages over the build-in "fft" when there are large numbers of input points (e.g. N = 2^20).
However in my application, I only need to calculate 1024-point partial FFTs. The sample points are 256, and the other input points are zeros. I did test the computational speed of this algorithm, as is shown below:

clear
N = 1024;
data1=rand(1,256)+1i*rand(1,256);
data2=zeros(1,N-256);
data = [data1 data2];
k = 2; % fft component
L = length(data);
a = exp(-1j*(2*pi*(k-1)/L));
tic
for lop = 1:1000
%y1 = p_ifft(data,N,1);
%y2 = pfft_lei(data);
y1 = get1fft(data, a); % FEX
end
t1=toc % Elapsed time is 0.0385 seconds.

tic
for lop2 = 1:1000
y2 = fft(data);
end
t2 = toc % Elapsed time is 0.0275 seconds.

It shows that under this circumstance the computational speed of the Mexfunction is slower than the build-in "fft" function. I am wondering how to solve this problem.

Before yesterday I had no idea on the Mexfunction approach, so I need to learn this programming skill at the first stage. The code below is my partial-FFT algorithm:

function Y =pfft_lei(data)
% This is a radix-2 decimation-in-time PARTIAL-FFT algorithm
N = length(data);
X=bitrevorder(data);
L=log2(N);
Y(N,L,2) = zeros;

bin1 = path_trace(N);

for q = 1:N/2 %Output index selection
k3=1; bin = bin1(q,:);
for i1=1:L %Iteration stage
for i2=bin(i1):k3:N
if (Y(i2,i1,2)==0)
k=N*(bin(i1)-1)/k3/2;
j=i2+k3;
W=complex(cos(2*pi*k/N),-sin(2*pi*k/N));
T=X(j)*W;
X(j)=X(i2)-T; X(i2)=X(i2)+T;
Y(i2,i1,1)=X(i2);
Y(j,i1,1)=X(j);
Y(i2,i1,2)=1;
Y(j,i1,2)=1;
else
X(i2)=Y(i2,i1,1);
end
end
k3=k3*2;
end
end

function bin=path_trace(N)
L = log2(N);
for s=1:L
bin(:,s)=repmat([1:2^(s-1)],1,N/2^(s-1));
end

Currently I don't know how the convert this program into an Mex file. Besides, my algorithm is quite time-comsuming and it does not have any value for practical application. I may need some help on optimizing my code by the Mex approach. Thanks.

First | Prev | Next | Last
Pages: 1 2 3
Prev: Problem with gath geva algorithm
Next: geometry profile in pde tool.