From: Jan Simon on 22 Feb 2010 04:32

Dear readers!

I've done some comparisons between Matlab's SUM and a C-MEX function. Now I'm confused about this:

Matlab 2009a, single-threaded (1.5 GHz Pentium-M):

x = rand(88999, 1);
tic; for i = 1:1000; v = sum(x); clear('v'); end; toc
>> 0.22 sec

x = rand(89000, 1);
tic; for i = 1:1000; v = sum(x); clear('v'); end; toc
>> 0.82 sec

I assume that Matlab tries to start different threads at the magic limit of 89,000 elements - and on my single-core system this slows down the processing by 75%! This does not appear if X is a SINGLE vector of the same or double the length. The behaviour is not affected by starting Matlab with the -singleCompThread flag.

Please, could somebody confirm this? What is the speed gain/loss on dual/quad cores?

My current solution is a trivial C-MEX function for calculating the sum: for 1000 elements it is 40% slower than Matlab's SUM (thanks TMW!), for 1E7 elements it is 40% faster than SUM (???).

Kind regards, Jan
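A minimal plain-C sketch of the summation core such a "trivial C-MEX function" might wrap (the function name is illustrative and the mexFunction gateway is omitted; this is not Jan's actual code - a real MEX file would obtain the data pointer and length from the mxArray inputs via mxGetPr and mxGetNumberOfElements):

```c
#include <stddef.h>

/* Naive accumulation loop - the whole "algorithm" behind a trivial
 * sum MEX file. Single-threaded by construction, so it never pays
 * the thread-startup overhead Jan observes in SUM. */
double trivial_sum(const double *X, size_t N)
{
    double Sum = 0.0;
    for (size_t i = 0; i < N; i++) {
        Sum += X[i];
    }
    return Sum;
}
```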
From: Bruno Luong on 22 Feb 2010 04:48

"Jan Simon" <matlab.THIS_YEAR(a)nMINUSsimon.de> wrote in message <hltium$5mv$1(a)fred.mathworks.com>...
> I assume that Matlab tries to start different threads at the magic limit of 89,000 elements [...]
> Please, could somebody confirm this?

Dear Jan,

I confirm this, see my reply #14 on this thread:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/260828

Bruno
From: Rune Allnor on 22 Feb 2010 05:10

On 22 Feb, 10:32, "Jan Simon" <matlab.THIS_Y...(a)nMINUSsimon.de> wrote:
> My current solution is a trivial C-MEX function for calculating the sum: for 1000 elements it is 40% slower than Matlab's SUM (thanks TMW!),

It's not TMW that is that good; it's probably you who haven't configured your C compiler correctly. Make sure you switch on all the optimizations (including extended instruction sets, /arch:SSE2, in Visual Studio) and switch off any buffer checks etc. (/GS- in Visual Studio, C++ mode). There might be more to it than that, like the _SECURE_SCL=0 preprocessor define that makes a huge difference with MSVC++, but that likely has no counterpart when compiling plain C.

There might also be a question of which compiler you use. One might expect Intel's compiler to produce somewhat faster code on Intel CPUs, as Intel's staff likely know more about how best to exploit Intel CPUs than other vendors do.

Rune
From: Jan Simon on 22 Feb 2010 07:05

Dear Rune!

> > My current solution is a trivial C-MEX function for calculating the sum: for 1000 elements it is 40% slower than Matlab's SUM (thanks TMW!),
> It's not TMW that is that good; it's probably you who haven't configured your C compiler correctly.

I let Open Watcom 1.8, LCC 2.4 (shipped with Matlab), LCC 3.8 (from the net) and BCC 5.5 compile my C code. The speed of the executables created by these compilers can differ remarkably. E.g. LCC 3.8 is really fast at accessing a vector through a moving pointer ("Sum += *X++;"), while BCC handles indices efficiently ("Sum += X[i];"). So it is not the configuration of the compiler that matters, but finding a formulation of the algorithm which matches the compiler's optimizer best.

Kind regards, Jan
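The two formulations Jan quotes can be sketched side by side; both compute the same result, but as he reports, different compilers may generate noticeably different machine code for each (function names here are illustrative):

```c
#include <stddef.h>

/* Pointer-walk formulation ("Sum += *X++;") - the form LCC 3.8
 * reportedly optimizes well. */
double sum_ptr(const double *X, size_t N)
{
    double Sum = 0.0;
    const double *End = X + N;
    while (X < End) {
        Sum += *X++;
    }
    return Sum;
}

/* Indexed formulation ("Sum += X[i];") - the form BCC reportedly
 * handles efficiently. */
double sum_idx(const double *X, size_t N)
{
    double Sum = 0.0;
    for (size_t i = 0; i < N; i++) {
        Sum += X[i];
    }
    return Sum;
}
```

Since the loop bodies accumulate the same elements in the same order, the two functions are bit-for-bit equivalent; only the generated address arithmetic differs.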
From: Jan Simon on 22 Feb 2010 07:16

Dear Bruno!

> > I assume that Matlab tries to start different threads at the magic limit of 89,000 elements [...]
> > Please, could somebody confirm this?
> I confirm this, see my reply #14 on this thread:
> http://www.mathworks.com/matlabcentral/newsreader/view_thread/260828

Thanks Bruno! I had searched the NG for "88999" and "89000" as well, but in that thread the magic number was "88998"...

So if I understand correctly: if the vector length exceeds 89000, the sum is split into parts. Processing these parts on a single core slows down the computation by 40% to 80% (compared with my dull MEX, or with SUM for small vectors). This slowdown could be the effect of several threads competing with each other for the processor cache. This sounds suboptimal.

Jan
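The splitting Jan describes can be sketched as follows. This is a hypothetical illustration of the partitioning, not MathWorks' actual implementation, and it runs the chunks serially; in a real multithreaded SUM each slice would go to its own thread, and on a single core the extra threads only add scheduling and cache-contention overhead:

```c
#include <stddef.h>

#define NCHUNKS 4  /* illustrative worker count */

/* Each "worker" accumulates one slice into its own partial sum;
 * the partials are combined at the end. The last chunk absorbs
 * any remainder when N is not divisible by NCHUNKS. */
double chunked_sum(const double *X, size_t N)
{
    double Partial[NCHUNKS] = {0.0};
    size_t Chunk = N / NCHUNKS;

    for (int c = 0; c < NCHUNKS; c++) {
        size_t Begin = (size_t)c * Chunk;
        size_t End   = (c == NCHUNKS - 1) ? N : Begin + Chunk;
        for (size_t i = Begin; i < End; i++) {
            Partial[c] += X[i];
        }
    }

    double Sum = 0.0;
    for (int c = 0; c < NCHUNKS; c++) {
        Sum += Partial[c];
    }
    return Sum;
}
```

Note that chunking can also change the floating-point result slightly, since addition is reassociated; for exactly representable inputs, as in the test below, the result matches the naive loop.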