From: Jie on
Hi, I have just bought the Matlab Parallel Computing Toolbox and tried a very simple test to see how much time I might save by using "parfor" instead of a "for" loop. I have 4 processors on my local PC and I used matlabpool to open them all. To my surprise, "parfor" is not much faster than "for". Am I doing anything wrong here?

clear all; close all

tic
r = rand(1,10000);
b = zeros(size(r));          % preallocate the sliced output
parfor ii = 1:length(r)
    b(ii) = r(ii);           % each iteration copies a single element
end

toc

Thanks,
Jie
From: Walter Roberson on


Chunk your computations:

numchunks = 4;
tic
r = rand(1,10000);
chunklength = length(r) / numchunks; % watch out for non-divisible
parfor ii = 1 : numchunks
    b(1+(ii-1)*chunklength : ii*chunklength) = r(1+(ii-1)*chunklength : ii*chunklength);
end


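It is inefficient to have each iteration touch too few elements: processor
caches typically buffer ~128 KB to 2 MB, so handing each worker one
contiguous block of data uses them better than handing out single elements.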
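As posted, the chunked loop will not run: parfor only treats a variable as sliced when exactly one subscript is the loop variable (possibly offset by a constant) and the other subscripts are colons or constants, so `b(1+(ii-1)*chunklength : ii*chunklength)` leaves `b` unclassifiable. A minimal sketch of one workaround, assuming the length divides evenly into `numchunks`, is to reshape the data so that each chunk is one column and let the loop variable pick the column:

```matlab
% Sketch: chunked copy that satisfies parfor's slicing rules.
% Assumes numel(r) is divisible by numchunks (checked below).
numchunks = 4;
r = rand(1, 10000);
assert(mod(numel(r), numchunks) == 0, 'length must divide evenly');
chunklength = numel(r) / numchunks;

R = reshape(r, chunklength, numchunks);  % column ii holds chunk ii
B = zeros(size(R));                      % preallocated sliced output

tic
parfor ii = 1 : numchunks
    B(:, ii) = R(:, ii);                 % one whole chunk per iteration
end
toc

b = reshape(B, 1, []);                   % back to the original row layout
```

For a plain copy like this, `b = r;` is of course far faster than any loop; the point is only the indexing pattern.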
From: Jie on
Thank you for your reply, Walter. I tried to run your modified program, but it didn't run because of the way b was defined. My main question is actually how to use "parfor" efficiently, and I don't quite understand the last sentence in your message.

I run simulations that repeat the same computation over many frequencies, so I use "parfor" for the frequency loop. Comparing the "parfor" and "for" run times, it seems I have only saved 40% of the computation time. I should mention that within each frequency iteration the computation is fairly intensive and big matrices are generated. I have tried moving the broadcast variables inside the loop and using sliced variables wherever I can, but it doesn't reduce the computation time. This makes me wonder: with 4 processors, can "parfor" ever reduce the computation time by a factor of 4 compared with "for"?

Thanks,
Jie

Walter Roberson <roberson(a)hushmail.com> wrote in message <hujvub$hev$1(a)canopus.cc.umanitoba.ca>...
> It is inefficient to access too few elements at the same time, as processor
> caches typically buffer ~128 KB to 2 MB.
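For a frequency sweep like the one Jie describes, one common structure is: preallocate the result, keep each frequency's work self-contained, and write the result into a slice indexed by the loop variable. The sketch below assumes a linear solve per frequency; `K`, `M`, `C`, `f` and the damped solve `(K - w^2*M + 1i*w*C) \ f` are hypothetical stand-ins for the real simulation, not anything from the thread.

```matlab
% Hypothetical frequency sweep. K, M, C, f and the solve below are
% stand-ins for a real per-frequency simulation.
n     = 200;                              % system size (hypothetical)
freqs = linspace(1, 100, 64);             % frequencies to sweep (hypothetical)
A0    = rand(n);
K     = A0 * A0' + n * eye(n);            % symmetric positive definite "stiffness"
M     = eye(n);                           % "mass"
C     = 10 * eye(n);                      % "damping", keeps every solve nonsingular
f     = ones(n, 1);                       % load vector

X = complex(zeros(n, numel(freqs)));      % preallocated sliced output, one column per frequency

parfor k = 1 : numel(freqs)
    w = 2 * pi * freqs(k);                % freqs(k) is a sliced input
    % K, M, C and f are broadcast variables, sent to the workers for this loop
    X(:, k) = (K - w^2 * M + 1i * w * C) \ f;   % the expensive part
end
```

Even with this layout, 4 workers rarely give a full 4x: the broadcast copies and the result transfer still cost time, and by default pool workers typically run single-threaded, whereas the plain `for` loop may already be using several cores inside built-in linear algebra such as `\`.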
From: Walter Roberson on
Jie wrote:
> This makes me wonder, with
> 4 processors, whether using "parfor" can reduce the computation time by
> a factor of 4 than using "for"?

Never for something as small as 10000 elements. Setting up a parfor
requires a fair amount of initialization: any variables held in common or
needed by the loop have to be copied to each of the workers, then the
computation has to be done, then there has to be coordination between
the workers so that their results do not overwrite one another, and
finally the results have to be copied back from the individual workers
to the MATLAB session that initiated it all. The resulting overhead is
large enough that MATLAB's own built-in multithreading does not even
kick in until arrays have roughly 10000 elements (or so I gather).
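The fixed cost described above can be seen by timing Jie's copy three ways. This is a minimal sketch; timings depend on the machine, the MATLAB version and whether a pool is open, so no particular numbers are implied.

```matlab
% Compare three ways of doing the same trivial copy.
n = 10000;
r = rand(1, n);

tic
bVec = r;                                % vectorized: no loop at all
tVec = toc;

tic
bFor = zeros(1, n);
for ii = 1 : n
    bFor(ii) = r(ii);
end
tFor = toc;

tic
bPar = zeros(1, n);
parfor ii = 1 : n
    bPar(ii) = r(ii);                    % loop body does almost no work
end
tPar = toc;

fprintf('vectorized %.4g s, for %.4g s, parfor %.4g s\n', tVec, tFor, tPar);
```

Typically the vectorized `bVec = r` wins by a wide margin, and the `parfor` version is no faster than `for`, because the work per iteration is too small to pay for distributing it.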

There are other reasons why working with a small vector in parallel can
be much *less* efficient than doing it on a single processor. Please do
some reading on the topics "primary cache", "secondary cache", "cache
lines", and "cache thrashing". The effects are similar to having too
many people trying to go through a doorway at the same time -- and since
each of the people is moving at the same rate, when they back off all of
them try again at the same time, producing another collision.
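The cache-line idea can be seen even on a single core. This sketch does not reproduce the multi-worker thrashing described above; it only shows that MATLAB, which stores matrices column by column, typically reads a contiguous column faster than a strided row. The matrix size is an arbitrary choice.

```matlab
% Single-core illustration of memory layout and cache lines.
N = 2000;
A = rand(N);                             % 2000 x 2000 doubles, about 32 MB

tic
s1 = 0;
for j = 1 : N
    s1 = s1 + sum(A(:, j));              % contiguous: neighbours share cache lines
end
tCols = toc;

tic
s2 = 0;
for i = 1 : N
    s2 = s2 + sum(A(i, :));              % strided: elements are N*8 bytes apart
end
tRows = toc;

fprintf('by columns %.4g s, by rows %.4g s\n', tCols, tRows);
```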
From: Steven Lord on

"Jie " <yangjie915(a)gmail.com> wrote in message
news:hujv4j$d1c$1(a)fred.mathworks.com...
> Hi, I have just bought Matlab Parallel Computing Toolbox and tried to use
> some very simple test to see how much time I may be able to save by using
> "parfor" instead of "for" loop. I have 4 processors on my local PC and I
> used matlabpool to open them all. To my surprise, using "parfor" is not
> too much faster than using "for" loops. Am I doing anything wrong here?

If you have a very simple task (as you do in this case), the overhead of
subdividing the problem, performing the communication necessary to
distribute the work to the workers, and retrieving the work from the workers
outweighs the gain you get from performing the problem calculations in
parallel.
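A simplified model (not from the thread) makes the trade-off concrete. Let $T_{\mathrm{work}}$ be the serial time of the loop body over all iterations, $T_{\mathrm{ovh}}$ the fixed cost of distributing and collecting the work, and $p$ the number of workers; the example numbers are hypothetical.

```latex
\begin{aligned}
T_{\mathrm{par}} &\approx T_{\mathrm{ovh}} + \frac{T_{\mathrm{work}}}{p},
  \qquad S = \frac{T_{\mathrm{work}}}{T_{\mathrm{par}}} \le p \\
T_{\mathrm{work}} = 10\ \mathrm{s},\ T_{\mathrm{ovh}} = 1\ \mathrm{s},\ p = 4
  &\;\Rightarrow\; T_{\mathrm{par}} \approx 1 + 2.5 = 3.5\ \mathrm{s},\quad S \approx 2.86 \\
T_{\mathrm{work}} = 0.001\ \mathrm{s},\ T_{\mathrm{ovh}} = 0.1\ \mathrm{s},\ p = 4
  &\;\Rightarrow\; T_{\mathrm{par}} \approx 0.10025\ \mathrm{s},\quad S \approx 0.01
\end{aligned}
```

The speedup approaches $p$ only when $T_{\mathrm{work}}$ dominates $T_{\mathrm{ovh}}$. For comparison, saving 40% of the run time means $T_{\mathrm{par}} = 0.6\,T_{\mathrm{work}}$, i.e. $S = 1/0.6 \approx 1.67$.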

To put it another way: if you need to throw away a piece of paper, is it
faster to call three of your friends into the room, tear the paper into
fourths, give each friend a piece, and have them throw away their piece or
to simply step over to the trash can yourself?

Now if you were doing something potentially time-consuming, like looking for
a needle in a haystack, you might divide the stack into N pieces and have
each friend look through a piece; because that task can take a long time (as
tested by the MythBusters;
http://en.wikipedia.org/wiki/MythBusters_(2004_season)#Needle_in_a_Haystack)
you might actually gain some benefit from doing so.
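The haystack analogy maps onto the same pattern as chunking: split the data into pieces, give each iteration one piece, and combine the per-piece answers afterwards. This is only a structural sketch with hypothetical data: a `find` over a million elements is cheap compared with shipping the piece to a worker, so this example by itself would not show a speedup; the per-piece work has to be much heavier than the data transfer before dividing it up pays off.

```matlab
% Sketch of dividing a haystack into pieces and searching them in parallel.
% The data and the planted needle position are hypothetical.
numpieces = 4;
haystack  = zeros(1, 4e6);                % all hay...
haystack(3123457) = 1;                    % ...except one needle
assert(mod(numel(haystack), numpieces) == 0, 'pieces must divide evenly');
piecelen  = numel(haystack) / numpieces;
pieces    = reshape(haystack, piecelen, numpieces);   % column k is piece k

found = zeros(1, numpieces);              % sliced output: local index in piece k, 0 if absent
parfor k = 1 : numpieces
    idx = find(pieces(:, k) == 1, 1);     % each iteration scans one piece
    if ~isempty(idx)
        found(k) = idx;
    end
end

piece       = find(found, 1);             % which piece held the needle
globalIndex = (piece - 1) * piecelen + found(piece);
fprintf('needle found at index %d\n', globalIndex);
```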

--
Steve Lord
slord(a)mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
To contact Technical Support use the Contact Us link on
http://www.mathworks.com