From: Henrik Andresen on
Hi

I'm developing some CUDA code and use my laptop for the implementation. Since I don't have a CUDA-enabled GPU here, I use emulation mode, which works fine... usually.

I compile using the CUDA 3.0 toolkit, MATLAB 2010a and MS VC++ Express 2008, with sm_13 compilation flags.

I have some simple code which basically just adds numbers. I run it with the following launch configuration:

dim3 dimBlock( 16, 8 );
dim3 dimGrid( 16, 16 );

This works fine and returns the result in less than 1 second. If I increase the block dimensions to 16x16, MATLAB simply hangs with no CPU or memory usage, and I have to kill it through Windows. Any reason why this would happen?

This stall happens whenever I launch with more than 155 threads per block. Changing the number of blocks doesn't make any difference. The moment I launch a block with 156 threads, my entire process stalls with no CPU usage whatsoever.
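
To make the thread counts concrete, here is a small host-only sketch of the arithmetic for the two configurations above. It is not the actual kernel code: Dim2 is a hypothetical stand-in for CUDA's dim3 so that it compiles without the toolkit, and the helper names product, total_threads and check_examples are made up for illustration. It assumes a C++11 compiler.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 (x and y only), so this
// builds with a plain C++11 compiler and no CUDA headers.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Threads in one block, or blocks in one grid: x * y.
constexpr unsigned product(Dim2 d) { return d.x * d.y; }

// Total threads in a launch: threads per block times blocks per grid.
constexpr unsigned total_threads(Dim2 block, Dim2 grid) {
    return product(block) * product(grid);
}

// Checks the two block sizes from the post against the 16x16 grid.
inline void check_examples() {
    const Dim2 grid{16, 16};
    const Dim2 working{16, 8};    // returns in under a second
    const Dim2 stalling{16, 16};  // hangs in emulation mode
    assert(product(working) == 128);   // below the observed 155-thread limit
    assert(product(stalling) == 256);  // above it
    assert(total_threads(working, grid) == 32768);
    assert(total_threads(stalling, grid) == 65536);
}
```

So the working case has 128 threads per block and the stalling case has 256, which fits the observation that the limit sits at 155 threads per block regardless of how many blocks are in the grid.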

Anybody with thoughts / suggestions?

Cheers

Henrik Andresen