From: Lei on
The Matlab build-in function "fft" is a highly efficient algorithm for fast implementation of the Fourier transform. However, in some situations I do not need the calculate the full-point FFTs every time. Suppose I only need to calculate a subset of outputs, I have to develop a partial FFT (i.e. fractional FFT, or pruned FFT) algorithm.

In theory, the computational speed of the partial-FFT algorithm should be faster than the build-in "fft" function because "fft" calculates full-point FFT every time. For example, if I only need to calculate one output, say X(9), of a 1024-point FFT, my partial FFT algorithm should be faster than the standard "fft" algorithm.

Actually I have developed a partial-FFT algorithm by myself in an M-file, but the computational speed is far slower than the build-in "fft" function. My application is to calculate 1024-point partial FFTs, the first 256 inputs are complex numbers and the rest inputs are all zeros. Because I only need to get one output sample each time, so I only need a partial FFT algorithm, not the full-point FFT. I need to explore possible code optimization for my partial FFT algorithm.

The Mex-files apprach is a brilliant approach.

Before yesterday I had no idea on how to develop an Mex file, so I need to learn this programming skill at the first stage. The code below is my partial-FFT algorithm:

function Y =pfft_lei(data)
% This is a radix-2 decimation-in-time PARTIAL-FFT algorithm
N = length(data);
X=bitrevorder(data);
L=log2(N);
Y(N,L,2) = zeros;

bin1 = path_trace(N);

for q = 1:N/2 %Output index selection
k3=1; bin = bin1(q,:);
for i1=1:L %Iteration stage
for i2=bin(i1):k3:N
if (Y(i2,i1,2)==0)
k=N*(bin(i1)-1)/k3/2;
j=i2+k3;
W=complex(cos(2*pi*k/N),-sin(2*pi*k/N));
T=X(j)*W;
X(j)=X(i2)-T; X(i2)=X(i2)+T;
Y(i2,i1,1)=X(i2);
Y(j,i1,1)=X(j);
Y(i2,i1,2)=1;
Y(j,i1,2)=1;
else
X(i2)=Y(i2,i1,1);
end
end
k3=k3*2;
end
end

function bin=path_trace(N)
L = log2(N);
for s=1:L
bin(:,s)=repmat([1:2^(s-1)],1,N/2^(s-1));
end

IIf the above code is implemented in Matlab directly, it is really time-comsuming to get the output values. I need some help on optimizing my code by the Mex approach. How to convert the code into Mex-files?
Thanks.