From: Jan Simon on 22 Feb 2010 15:25

Dear Rune!

> Well, if you aren't interested, I am a bit surprised that you
> brought up the subject in the first place - mind you; you *did*
> criticize matlab by comparing it to C. That alone is sufficient
> reason to question if you understand your own comparison.

Thanks for posting. I've already explained what I'm interested in. You are welcome to answer or not to answer.

Jan
From: Jan Simon on 22 Feb 2010 15:56

Dear Bruno, Oleg and Aarif!

Thanks!!!

I've stacked the commands in one line for compact code in the newsgroup post only. In real programs this impedes line-wise debugging and readability, and it costs speed if the JIT compiler gets confused.

Summary:

> 2010A, 64 bit Prerelease, Intel core 2 duo E8500, 3.16 GHz
> x = rand(88999, 1);
> toc % Elapsed time is 0.102085 seconds.
> x = rand(89000, 1);
> toc % Elapsed time is 0.107495 seconds.

> 2007a, Athlon 64 X2 2.4 GHz
> x = rand(88999, 1);
> Elapsed time is 0.283402 seconds.
> x = rand(89000, 1);
> Elapsed time is 0.282451 seconds.

> 2009b, Intel Core 2 Duo 2.5 GHz Vista 32
> time drops around the 89.. limit

> 2009a, Pentium-M:
> time drops at 88999

My conclusion: The jump in SUM timing near 89000 elements was not present in 2007a, appeared in 2009a and 2009b, and vanished again in 2010A 64 bit. The behaviour does not depend on the number of cores (unless Oleg started his Matlab with -singleCompThread, which I do not assume).

Jan
From: Oleg Komarov on 22 Feb 2010 16:17

"Jan Simon" <matlab.THIS_YEAR(a)nMINUSsimon.de> wrote in message <hlur14$aru$1(a)fred.mathworks.com>...
> Dear Bruno, Oleg and Aarif!
>
> Thanks!!!
>
> I've stacked the commands in one line for compact code in the newsgroup post only. In real programs this impedes line-wise debugging and readability, and it costs speed if the JIT compiler gets confused.
>
> Summary:
>
> > 2010A, 64 bit Prerelease, Intel core 2 duo E8500, 3.16 GHz
> > x = rand(88999, 1);
> > toc % Elapsed time is 0.102085 seconds.
> > x = rand(89000, 1);
> > toc % Elapsed time is 0.107495 seconds.
>
> > 2007a, Athlon 64 X2 2.4 GHz
> > x = rand(88999, 1);
> > Elapsed time is 0.283402 seconds.
> > x = rand(89000, 1);
> > Elapsed time is 0.282451 seconds.
>
> > 2009b, Intel Core 2 Duo 2.5 GHz Vista 32
> > time drops around the 89.. limit
>
> > 2009a, Pentium-M:
> > time drops at 88999
>
> My conclusion: The jump in SUM timing near 89000 elements was not present in 2007a, appeared in 2009a and 2009b, and vanished again in 2010A 64 bit. The behaviour does not depend on the number of cores (unless Oleg started his Matlab with -singleCompThread, which I do not assume).
>
> Jan

Apparently, when starting single-threaded, the matrixSize-time plot doesn't drop:
http://drop.io/lpwtozo

Oleg
From: Jan Simon on 22 Feb 2010 16:42

Dear Oleg!

> Apparently, when starting single-threaded, the matrixSize-time plot doesn't drop:
> http://drop.io/lpwtozo

Better conclusion:

Matlab 2009a/b:
- multi-thread machine: SUM is 50% faster for > 89000 elements.
- multi-thread machine driven with -singleCompThread: no speed change.
- single-thread machine: SUM is 75% slower for > 89000 elements (independent of -singleCompThread -- not nice, TMW!).

Matlab 2010A: At least no change in speed at the magic 89000 limit.

So I have arrived in the century of multi-threading: before comparing "speed", one first has to define it.

Thanks, Jan
From: Rune Allnor on 23 Feb 2010 04:24

On 22 Feb, 17:46, Rune Allnor <all...(a)tele.ntnu.no> wrote:
> So if these numbers are representative for C compilers, your
> program, compiled with freeware, runs a factor 10'ish too slow.
> Which, by induction, might suggest that matlab runs a factor 5'ish
> slower than the fast C code.
>
> If correct - one would need to confirm your numbers by compiling
> your code with a state-of-the-art compiler - the question becomes
> why matlab runs that slow in the first place.

Of course I became a bit intrigued by this, so I couldn't resist testing the two C compilers against the built-in SUM function (matlab script and C code below).

As before, I compiled the same C code to two executables: testsum2lcc, compiled with the LCC compiler, and testsum2msvc, compiled with the MSVS 2008 C compiler with the /arch:SSE2 flag set.

The output is:

Results match
SUM : 100 runs in 3.59375 s
testsum2lcc : 100 runs in 3.78125 s
testsum2msvc : 100 runs in 1.75 s

which means the MSVC executable runs twice as fast as the built-in function, which in turn runs about 5% faster than the LCC executable.

Do note that the data are fetched through a far pointer access - through a pointer into a different compilation module - which often is considered to be slow. This far pointer access would explain the relative improvement by a factor 5 of the LCC relative to the MSVC, compared to the test I did yesterday. In that test all the work was done on local variables.

It would be very interesting to see a similar test done with the Intel compiler.

Ah, yes, I almost forgot: I ran this test with an old matlab version (R2006a). If somebody tries this with a newer version, keep in mind that there were some changes made recently, where the SUM function was adapted to match the results of parallel algorithms.
This will first of all introduce some overhead in the single-thread version of SUM, slowing it even more, and also change the results somewhat so that the matching test at the end will fail. But none of this was in place in 2006, when my matlab version was released.

Rune

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clc
x = randn(1,10000000);

% Make sure all executables are loaded prior to
% timed run
s0 = sum(x);
s1 = testsum2lcc(x);
s2 = testsum2msvc(x);

Nruns = 100;

% Accumulated CPU time of the built-in SUM
t0 = 0;
for n=1:Nruns
    ts = cputime;
    s0 = sum(x);
    te = cputime - ts;
    t0 = t0 + te;
end

% Accumulated CPU time of the LCC-compiled MEX function
t1 = 0;
for n=1:Nruns
    ts = cputime;
    s1 = testsum2lcc(x);
    te = cputime - ts;
    t1 = t1 + te;
end

% Accumulated CPU time of the MSVC-compiled MEX function
t2 = 0;
for n=1:Nruns
    ts = cputime;
    s2 = testsum2msvc(x);
    te = cputime - ts;
    t2 = t2 + te;
end

% All three sums must agree bit for bit
if (~((s0 == s1) && (s1 == s2)))
    disp('Deviant results')
    disp('Expected in recent matlab versions due to parallel algorithms')
else
    disp('Results match')
end

disp(sprintf('SUM : %d runs in %g s',Nruns,t0));
disp(sprintf('testsum2lcc : %d runs in %g s',Nruns,t1));
disp(sprintf('testsum2msvc : %d runs in %g s',Nruns,t2));

/****************************************************************/
#include <math.h>
#include "mex.h"

extern void _main();

void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[] )
{
    int i,N;
    double sum;
    double *x;

    /* The input is a 1-by-N row vector, so N is its column count */
    N = mxGetN(prhs[0]);
    x = mxGetPr(prhs[0]);

    /* Plain sequential summation, first element to last */
    sum = 0;
    for (i = 0; i < N; ++i)
    {
        sum += x[i];
    }

    plhs[0] = mxCreateDoubleScalar(sum);
    return;
}