From: Jean-Francois on 15 Mar 2010 13:16

"Jan Simon" <matlab.THIS_YEAR(a)nMINUSsimon.de> wrote in message <hniell$g5l$1(a)fred.mathworks.com>...
> Dear Jean-Francois!
>
> > Also, what is meant in Rune's post (#4) by 'storing vectors sequentially' and 'optimizing the compiler'? I use MSVC++, I guess with default settings. What should I do?
>
> Storing vectors sequentially has the advantage that the function can access neighbouring elements. Neighbouring elements can be accessed faster, because it is "easier" to get them from the RAM. E.g. the summation over rows of a matrix is slower than over columns:
>   x = rand(1000);
>   tic; for i=1:100; v = sum(x, 1); end; toc  % fast
>   tic; for i=1:100; v = sum(x, 2); end; toc  % slow
> Transposing the input is usually not recommended, because TRANSPOSE itself suffers from accessing elements which are not neighbouring:
>   tic; for i=1:100; v = sum(transpose(x), 1); end; toc  % slow also
> So it is recommended to create the arrays so that their orientation allows sequential access (here the example is ridiculous, but it is just a demonstration):
>   x = transpose(x);
>   tic; for i=1:100; v = sum(x, 1); end; toc  % fast
>
> You can try to set the /arch:SSE2 flag in the mexopts.bat file in the folder PREFDIR. This sometimes helps.
>
> Kind regards, Jan

**********************

Hi Jan,

1. How do I set the /arch:SSE2 flag in the mexopts.bat file? There is no reference to /arch.

2. Also, is using --> mex -O "filename" <-- supposed to significantly optimize the code? I tried it, but with no noticeable difference.

3. Are there exceptions to programming column wise in C that you can think of? I have just begun low-level programming, so rules of thumb would make the game relatively easier at this stage.

Thanks for your help.

From: Jan Simon on 17 Mar 2010 08:51

Dear Jean-Francois!

> 1. How do I set the /arch:SSE2 flag in the mexopts.bat file? There is no reference to /arch.

Create the mexopts.bat file with "mex -setup", then:
  cd(prefdir)
  edit mexopts.bat
==> insert "/arch:SSE2" in the line:
  set OPTIMFLAGS= ...
You can try this also: "/fp:fast"
In both cases you have to check the results (and their accuracy), and the speed can increase or decrease!

> 2. Also, is using --> mex -O "filename" <-- supposed to significantly optimize the code? I tried it, but with no noticeable difference.

This can happen.

> 3. Are there exceptions to programming column wise in C that you can think of? I have just begun low-level programming, so rules of thumb would make the game relatively easier at this stage.

Accessing neighbouring elements is faster in every programming language, because it allows faster caching by the processor.
The really important and general exception is that columnwise storing of values can be confusing, and the time to debug the "optimal" algorithm can be 1000 times longer than the total processing time. Never optimize a part of a program which is not frequently used - just concentrate on the bottlenecks.

Kind regards, Jan
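
To make the point about neighbouring elements concrete for C code, here is a minimal sketch in plain C. It assumes the matrix is a flat array of doubles in column-major order (the layout MATLAB uses), so element (i, j) of an m-by-n matrix sits at a[i + j*m]. The function names sum_columns, sum_rows_strided and sum_rows_sequential are invented for this illustration and do not come from the thread or from the MEX API.

```c
#include <assert.h>
#include <stddef.h>

/* Column sums of an m-by-n column-major matrix.
   The inner loop runs over i, so successive reads hit neighbouring
   memory addresses (stride 1). */
void sum_columns(const double *a, size_t m, size_t n, double *out)
{
    size_t i, j;
    assert(a != NULL && out != NULL);
    for (j = 0; j < n; j++) {
        double s = 0.0;
        for (i = 0; i < m; i++) {
            s += a[i + j * m];
        }
        out[j] = s;
    }
}

/* Row sums, naive loop order: the inner loop runs over j, so successive
   reads are m elements apart (stride m). */
void sum_rows_strided(const double *a, size_t m, size_t n, double *out)
{
    size_t i, j;
    assert(a != NULL && out != NULL);
    for (i = 0; i < m; i++) {
        double s = 0.0;
        for (j = 0; j < n; j++) {
            s += a[i + j * m];
        }
        out[i] = s;
    }
}

/* Row sums with the loops swapped: same result, but the matrix is
   again read front to back with stride 1. */
void sum_rows_sequential(const double *a, size_t m, size_t n, double *out)
{
    size_t i, j;
    assert(a != NULL && out != NULL);
    for (i = 0; i < m; i++) {
        out[i] = 0.0;
    }
    for (j = 0; j < n; j++) {
        for (i = 0; i < m; i++) {
            out[i] += a[i + j * m];
        }
    }
}
```

Both row-sum variants return the same values; only the order of memory accesses differs. Whether the sequential variant is noticeably faster depends on the matrix size, the compiler and the machine, so it is worth timing both on the real data before committing to the less obvious loop order.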