From: Rudolph Gatt on
Hi, I am trying to convert some MATLAB code to CUDA. In MATLAB, the function Y = fft2(X,m,n) truncates X, or pads X with zeros, to create an m-by-n array before doing the transform.
I would like to perform an fft2 on a 2D filter with the CUFFT library. I did not find any CUDA API function that does zero padding, so I implemented my own. This function adds zeros to the input matrix as follows (from a 3X3 matrix to a 6X6 matrix):

3 X 3
1 1 1
1 1 1
1 1 1

to

6 X 6
1 1 1 0 0 0
1 1 1 0 0 0
1 1 1 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
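
The padding illustrated above can be sketched on the host as a reference to compare against the GPU output. This is only a minimal sketch: the helper name `padOrTruncate` is hypothetical (it is not part of CUDA or CUFFT), it uses `std::complex<float>` in place of `cufftComplex`, and it assumes both matrices are stored row-major.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not a CUDA/CUFFT function).
// Copies a row-major oldRows x oldCols matrix into a row-major
// newRows x newCols matrix. Cells outside the source are zero-filled;
// source cells outside the destination are dropped (truncation).
std::vector<std::complex<float>> padOrTruncate(
    const std::vector<std::complex<float>>& src,
    std::size_t oldRows, std::size_t oldCols,
    std::size_t newRows, std::size_t newCols)
{
    // Value-initialised to (0, 0), so the padded cells are already zero.
    std::vector<std::complex<float>> dst(newRows * newCols);

    // Only the overlap of the two shapes is copied; indices run 0 .. size-1.
    for (std::size_t r = 0; r < newRows && r < oldRows; ++r) {
        for (std::size_t c = 0; c < newCols && c < oldCols; ++c) {
            dst[r * newCols + c] = src[r * oldCols + c];
        }
    }
    return dst;
}
```

Calling `padOrTruncate(filter, 3, 3, 6, 6)` on a 3X3 matrix of ones should reproduce the 6X6 layout shown above, which gives something to diff against the buffer copied back from the device.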

When I zero-pad the 2D filter and compute the fft2 (CUFFT) for a square matrix (for example, a 3X3 matrix), the result matches the result of the MATLAB code. But when the 2D filter is not a square matrix (for example, a 3X6 matrix), the result does not match the result of the MATLAB code.

Can somebody help me please? Can somebody verify whether MATLAB does the zero padding the way I am doing it?

//-------------------MATLAB code--------------------------------
for (o = 1 : numOrient)
fftf{o} = fft2(allFilter{1, o}, sx+h+h, sy+h+h);
end
//-------------------MATLAB code--------------------------------
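
One thing that may be worth checking when comparing against the MATLAB call above (it is not confirmed as the cause of the mismatch): in fft2(X,m,n), m is the number of rows and n the number of columns, and MATLAB stores arrays column-major, whereas the CUDA code in this post indexes its buffers row-major as (y * cols) + x. Similarly, cufftPlan2d takes the slowest-changing dimension (the row count for row-major data) as its first size argument. Swapping the two sizes has no effect when they are equal, so a mix-up can go unnoticed with square inputs. The sketch below shows the index mapping between the two layouts; `columnMajorToRowMajor` is a hypothetical helper name, not a library function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reorder a rows x cols matrix from MATLAB's
// column-major layout, where element (r, c) sits at r + c * rows,
// to row-major layout, where element (r, c) sits at r * cols + c.
template <typename T>
std::vector<T> columnMajorToRowMajor(const std::vector<T>& colMajor,
                                     std::size_t rows, std::size_t cols)
{
    std::vector<T> rowMajor(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            rowMajor[r * cols + c] = colMajor[r + c * rows];
        }
    }
    return rowMajor;
}
```

For a 2X3 matrix with rows (1 2 3) and (4 5 6), MATLAB's memory order is 1 4 2 5 3 6, and the helper returns 1 2 3 4 5 6.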

//--------------CUDA Code---------------------(Zero padding Kernel)
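__global__ void
zeroPadding(cufftComplex* Filter, cufftComplex* InputFilterFFT, int newCols, int newRows, int oldCols, int oldRows)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;  // column in the padded matrix
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;  // row in the padded matrix

    // Threads outside the padded matrix have nothing to write.
    if ((x >= newCols) || (y >= newRows))
        return;

    // Valid indices run from 0 to size-1, so the tests are strict (<, not <=).
    if ((x < oldCols) && (y < oldRows))
        InputFilterFFT[(y * newCols) + x] = Filter[(y * oldCols) + x];
    else
    {
        // Outside the original filter: write a complex zero.
        cufftComplex temp;
        temp.x = 0.0f; temp.y = 0.0f;
        InputFilterFFT[(y * newCols) + x] = temp;
    }
}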
//--------------Code---------------------(Zero padding Kernel)

int ApplyGaborFilter(cufftComplex *allFilter[])
{
....
    for (int i = 0; i < 15; i++)
    {
        // Upload the i-th 17 x 17 filter. cudaMemcpy returns a cudaError_t, so it is checked with CUDA_SAFE_CALL.
        CUDA_SAFE_CALL(cudaMemcpy(d_Filter, allFilter[i], (17 * 17 * sizeof(cufftComplex)), cudaMemcpyHostToDevice)); checkforerror();
        dim3 dimBlock(16, 16);
        dim3 dimGrid(cuiDivUp(cols, dimBlock.x), cuiDivUp(rows, dimBlock.y), 1);
        // Call the zero-padding kernel
        zeroPadding<<< dimGrid, dimBlock, 0 >>>( d_Filter, d_InputFilterFFT, cols, rows, 17, 17);
        CUDA_SAFE_CALL( cudaThreadSynchronize() );

        // Copy back the padded input (overwritten by the FFT result below).
        CUDA_SAFE_CALL(cudaMemcpy(FilterFFT[i], d_InputFilterFFT, (cols * rows * sizeof(cufftComplex)), cudaMemcpyDeviceToHost)); checkforerror();

        // Forward 2D FFT of the padded filter.
        cufftExecC2C(plan, d_InputFilterFFT, d_OutputFilterFFT, CUFFT_FORWARD);
        CUDA_SAFE_CALL(cudaMemcpy(FilterFFT[i], d_OutputFilterFFT, (cols * rows * sizeof(cufftComplex)), cudaMemcpyDeviceToHost)); checkforerror();
    }
....
From: John Melonakos on
There are a bunch of tricky things like this in CUDA/MATLAB. Jacket (http://www.accelereyes.com) would probably help you avoid this hassle. Just send us an email at sales(a)accelereyes.com. If you are trying to get CUDA code into MATLAB or vice versa, you may also want to check out the JacketSDK.

-John