From: Jan Simon on 23 Feb 2010 07:46

Dear Rune!

> Of course I became a bit intrigued by this, so I couldn't
> resist testing the two C compilers against the built-in
> SUM function (matlab script and C code below). As before,
> I compiled the same C code to two executables, testsum2lcc
> compiled with the LCC compiler and testsum2msvc compiled
> with the MSVS 2008 C compiler, where the /arch:SSE2 flag
> was set in the MSVC compiler.
>
> The output is:
> Results match
> SUM : 100 runs in 3.59375 s
> testsum2lcc : 100 runs in 3.78125 s
> testsum2msvc : 100 runs in 1.75 s

As expected: Better compilers compile better compilations.

> Ah, yes, I almost forgot: I ran this test with an old matlab
> version (R2006a). If somebody tries this with a newer version,
> keep in mind that there were some changes made recently,
> where the SUM function was adapted to match the results
> of parallel algorithms.

Exactly the behaviour of parallelized SUM was my problem.

But now give LCC a chance to use its optimizer:

% ---------------------------------------- 8< -------------
% SimpleSum.c, Jan Simon, Matlab 6.5 to 2009b
#include "mex.h"

// 32 bit array dimensions for Matlab 6.5:
#ifndef MWSIZE_MAX
#define mwSize int32_T // Defined in tmwtypes.h
#define mwIndex int32_T
#define MWSIZE_MAX MAX_int32_T
#endif

// The length is passed as mwSize, so it matches mxGetNumberOfElements:
double GetSum1(double *X, mwSize N);

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
  double *X;
  mwSize N;

  // Reject anything but a single double array before touching the data:
  if (nrhs != 1 || !mxIsDouble(prhs[0])) {
    mexErrMsgTxt("SimpleSum: one double array expected.");
  }

  X = mxGetPr(prhs[0]);
  N = mxGetNumberOfElements(prhs[0]);
  plhs[0] = mxCreateDoubleScalar(GetSum1(X, N));
  return;
}

// The summation lives in its own function, see the remark on LCC below:
double GetSum1(double *X, mwSize N)
{
  mwSize i;
  double Sum = 0.0;
  for (i = 0; i < N; i++) {
    Sum += X[i];
  }
  return (Sum);
}
% ---------------------------- >8 ------------------------

While Open Watcom can handle your program as expected, the LCC from 2003 shipped with Matlab needs the calculations in a separate function to get the register allocation right. Then the time for summing drops from 0.90 to 0.60 on my computer.
I'm interested if this speed gain is reproducible. I will compare this with MSVS 2008 when I find some time.

Kind regards, Jan
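
A small illustration of why a parallelized or vectorized SUM need not reproduce the result of a plain loop like GetSum1: floating point addition is not associative, so any change in the order of the additions can change the last digits. The sketch below is not the algorithm Matlab uses. sum_sequential and sum_two_lanes are hypothetical names, and the split into even and odd indices is only one assumed way a compiler or a threaded library might reorder the work.

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right accumulation, the same order as a simple loop. */
double sum_sequential(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}

/* Hypothetical reordered variant: even and odd indices are accumulated
 * in two independent partial sums and combined at the end. This mimics
 * what a 2-wide SIMD loop or two threads could do (an assumption, not
 * a description of any particular compiler or of Matlab's SUM). */
double sum_two_lanes(const double *x, size_t n)
{
    double lane0 = 0.0;
    double lane1 = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i + 1 < n; i += 2) {
        lane0 += x[i];
        lane1 += x[i + 1];
    }
    if (i < n) {          /* leftover element for odd n */
        lane0 += x[i];
    }
    return lane0 + lane1;
}
```

For the input {1e17, 1.0, -1e17, 1.0}, and assuming strict IEEE double arithmetic (for example SSE2 code, no wider x87 registers), sum_sequential returns 1 while sum_two_lanes returns 2: in the sequential order the first 1.0 is lost when it is added to 1e17, in the two-lane order both small values are added to each other first.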
From: Jan Simon on 27 Feb 2010 07:01

Dear Rune!

Sorry - this post is only weakly related to Matlab. At least I'm going to publish the source for stable summation in the FEX.

Snippet from the C-Mex implementation:

double GetSum1(double *X, double *Xf)
{
  double Sum = 0.0;               // or: long double
  _control87(PC_64, MCW_PC);      // LCC: _control87(_PC_64,_MCW_PC);
  for ( ; X < Xf; Sum += *X++) ;  // empty loop
  _control87(PC_53, MCW_PC);      // LCC: _control87(_PC_53,_MCW_PC);
  return (Sum);
}

X = randn(1E7, 1);
tic; for i=1:10; v = sum(X); clear('v'); end; toc
tic; for i=1:10; v = mexsum(X); clear('v'); end; toc

Observations (1.5 GHz Pentium-M, Matlab 2009a, single-threaded):
- SUM: 0.94 sec
- LCC v2.4 (shipped with Matlab): 0.96 sec, same accuracy as SUM.
- LCC v3.8: 0.66 sec, same accuracy as SUM. 3 additional valid digits, if "Sum" is a long double. Then 0.85 sec.
- Open Watcom 1.8: 0.70 sec. 3 additional valid digits, because the double "Sum" is really accumulated in an 80 bit register (but no further improvements for long double).
- MS VC++ 2008 Express: 0.50 sec. The result is just 5% more accurate than SUM when compiled with /fp:fast and equal to SUM when compiled with /fp:precise (5% ?!? How can this be possible?). No differences between double and long double.

Jan
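
The effect of a wider accumulator can be tried without touching the x87 control word. The following is only a portable sketch, not the FEX code: sum_double and sum_extended are hypothetical names, and it assumes the compiler maps long double to something wider than double (80 bit on many x86 compilers). Where long double has the same width as double, as in MS Visual C++, both functions give the same result, which would at least be consistent with the "no differences" line above.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulate in a double. With SSE2 code every addition is rounded to
 * 53 bits of mantissa; with x87 code the compiler may keep the sum in
 * an 80 bit register instead (compiler dependent). */
double sum_double(const double *x, size_t n)
{
    double acc = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Accumulate in a long double and return it unrounded, so the caller
 * can see the additional digits. The gain exists only if long double
 * is wider than double on the platform (assumption). */
long double sum_extended(const double *x, size_t n)
{
    long double acc = 0.0L;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++) {
        acc += (long double) x[i];
    }
    return acc;
}
```

A simple check is to sum many copies of the same value, e.g. 1E6 times 0.1, and to compare both results with the product of count and value: the rounding errors of the double accumulator add up, while the wider accumulator stays closer to the product.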