From: Jie on 7 Jun 2010 19:25

Hi, I have just bought Matlab Parallel Computing Toolbox and tried to use some very simple test to see how much time I may be able to save by using "parfor" instead of "for" loop. I have 4 processors on my local PC and I used matlabpool to open them all. To my surprise, using "parfor" is not too much faster than using "for" loops. Am I doing anything wrong here?

clear all; close all

tic
r = rand(1,10000);
parfor ii = 1:length(r)
    b(ii) = r(ii);
end
toc

Thanks,
Jie
From: Walter Roberson on 7 Jun 2010 19:37

Jie wrote:
> Hi, I have just bought Matlab Parallel Computing Toolbox and tried to
> use some very simple test to see how much time I may be able to save by
> using "parfor" instead of "for" loop. I have 4 processors on my local PC
> and I used matlabpool to open them all. To my surprise, using "parfor"
> is not too much faster than using "for" loops. Am I doing anything wrong
> here?
> clear all; close all
>
> tic
> r = rand(1,10000);
> parfor ii = 1:length(r)
>     b(ii) = r(ii);
> end
>
> toc

Chunk your computations

numchunks = 4;
tic
r = rand(1,10000);
chunklength = length(r) / numchunks; %watch out for non-divisible
parfor ii = 1 : numchunks
    b(1+(ii-1)*chunklength : ii*chunklength) = r(1+(ii-1)*chunklength : ii*chunklength);
end

It is inefficient to access too few elements at the same time, as processor caches typically buffer ~128 KB to 2 MB.
From: Jie on 7 Jun 2010 20:58

Thank you for your reply, Walter. I tried to run your modified program, but it didn't run because of the way b was defined. My main question is actually how to use "parfor" efficiently, but I don't quite understand the last sentence in your message.

I do simulations that require repeating the computation over many frequencies, so I use "parfor" for the frequency loop. Comparing the "parfor" and "for" computation times, it seems I have saved only 40% of the computation time. I need to mention that within each frequency loop, the computation is fairly intensive and big matrices are generated. I have tried to put the broadcast variables inside the loop and to use the variables that I need as sliced variables, but it doesn't help reduce the computation time. This makes me wonder whether, with 4 processors, using "parfor" can reduce the computation time by a factor of 4 compared with using "for"?

Thanks,
Jie

Walter Roberson <roberson(a)hushmail.com> wrote in message <hujvub$hev$1(a)canopus.cc.umanitoba.ca>...
>
> Chunk your computations
>
> numchunks = 4;
> tic
> r = rand(1,10000);
> chunklength = length(r) / numchunks; %watch out for non-divisible
> parfor ii = 1 : numchunks
>     b(1+(ii-1)*chunklength : ii*chunklength) = r(1+(ii-1)*chunklength : ii*chunklength);
> end
>
>
> It is inefficient to access too few elements at the same time, as processor
> caches typically buffer ~128 KB to 2 MB.
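For readers hitting the same problem with the chunked loop: parfor generally rejects an output indexed by a computed range such as b(1+(ii-1)*chunklength : ii*chunklength), because it cannot treat b as a sliced variable. One possible workaround, offered only as a sketch, is to reshape the data so that each chunk is a whole column and index it with the loop variable alone. The names rc and bc are hypothetical helpers introduced here, and the example assumes the length divides evenly by numchunks.

```matlab
% Sketch: chunked copy in a form parfor can classify as sliced variables.
% rc and bc are hypothetical helper names, not from the thread.
numchunks = 4;                        % assumed equal to the number of workers
r = rand(1, 10000);
assert(mod(numel(r), numchunks) == 0, 'length must divide evenly into chunks');
chunklength = numel(r) / numchunks;

% One column per chunk, so each iteration touches exactly one column.
rc = reshape(r, chunklength, numchunks);
bc = zeros(chunklength, numchunks);   % preallocated sliced output

tic
parfor ii = 1:numchunks
    bc(:, ii) = rc(:, ii);            % sliced input and sliced output
end
toc

b = reshape(bc, 1, []);               % back to a row vector, same order as r
```

This only makes the loop legal for parfor. For a plain copy of 10000 elements the work per chunk is still far too small to repay the cost of shipping the data to the workers and back.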
From: Walter Roberson on 8 Jun 2010 00:46

Jie wrote:
> My main
> question is actually how to use "parfor" efficiently, but I don't quite
> understand the last sentence in your message. I do simulations that
> require repeating the computation over many frequencies, so I use
> "parfor" for the frequency loop. Comparing the "parfor" and "for"
> computation times, it seems I have saved only 40% of the computation
> time. I need to mention that within each frequency loop, the computation
> is fairly intensive and big matrices are generated. I have tried to put
> the broadcast variables inside the loop and to use the variables that I
> need as sliced variables, but it doesn't help reduce the computation
> time. This makes me wonder whether, with 4 processors, using "parfor"
> can reduce the computation time by a factor of 4 compared with using
> "for"?

Never for something as small as 10000 elements.

Setting up for a parfor requires a bunch of initialization, as any variables held in common or needed for the loop have to be copied to each of the processors, then the computation has to be done, and then there has to be synchronization between the threads to be sure they do not write over what the other one is doing, and then the results have to be copied back from the individual processors to the one that initiated it all. The resulting overhead is sufficient that Matlab does not even bother to start doing it until the arrays have at least 10000 elements (or so I gather.)

There are other reasons why working with a small vector in parallel can be much *less* efficient than doing it on a single processor. Please do some reading on the topics "primary cache", "secondary cache", "cache lines", and "cache thrashing". The effects are similar to having too many people trying to go through a doorway at the same time -- and since each of the people is moving at the same rate, when they back off all of them try again at the same time, producing another collision.
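One rough way to put a number on the "saved only 40%" observation is Amdahl's law. That model is an assumption brought in here, not something stated in the thread, and it lumps everything that does not scale with the number of workers (setup, copying variables, collecting results, memory contention) into a single "serial" part. Under that assumption, the sketch below works backwards from the reported saving to the fraction of the run that is effectively parallel.

```matlab
% Sketch: Amdahl's-law estimate (an assumed model, not from the thread).
nworkers  = 4;       % workers in the pool, as in Jie's setup
timesaved = 0.40;    % fraction of run time saved, as Jie reported

% Saving 40% of the time means the parallel run takes 60% of the serial time.
speedup = 1 / (1 - timesaved);

% Amdahl: speedup = 1 / ((1 - p) + p/nworkers). Solving for p gives:
p = (1 - 1/speedup) / (1 - 1/nworkers);

fprintf('observed speedup:          %.2f\n', speedup);
fprintf('implied parallel fraction: %.2f\n', p);
```

With these inputs the observed speedup is about 1.67 and the implied parallel fraction is a little over one half. Read that way, a factor of 4 on 4 workers would require essentially all of the run time, including data transfer, to scale perfectly, which is rarely the case when large matrices move in and out of the loop.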
From: Steven Lord on 8 Jun 2010 10:22

"Jie " <yangjie915(a)gmail.com> wrote in message news:hujv4j$d1c$1(a)fred.mathworks.com...
> Hi, I have just bought Matlab Parallel Computing Toolbox and tried to use
> some very simple test to see how much time I may be able to save by using
> "parfor" instead of "for" loop. I have 4 processors on my local PC and I
> used matlabpool to open them all. To my surprise, using "parfor" is not
> too much faster than using "for" loops. Am I doing anything wrong here?

If you have a very simple task (as you do in this case), the overhead of subdividing the problem, performing the communication necessary to distribute the work to the workers, and retrieving the work from the workers outweighs the gain you get from performing the problem calculations in parallel.

To put it another way: if you need to throw away a piece of paper, is it faster to call three of your friends into the room, tear the paper into fourths, give each friend a piece, and have them throw away their piece or to simply step over to the trash can yourself?

Now if you were doing something potentially time-consuming, like looking for a needle in a haystack, you might divide the stack into N pieces and have each friend look through a piece; because that task can take a long time (as tested by the MythBusters; http://en.wikipedia.org/wiki/MythBusters_(2004_season)#Needle_in_a_Haystack) you might actually gain some benefit from doing so.

--
Steve Lord
slord(a)mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
To contact Technical Support use the Contact Us link on http://www.mathworks.com
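The paper-versus-haystack contrast can be tried directly by timing the same two loops with a cheap body and with a costly one. The sketch below is only an illustration: the loop count n, the matrix size m, and the eig-based workload are assumptions chosen here, not taken from the thread, and it assumes the worker pool is already open so that pool start-up is not counted in the timings.

```matlab
% Sketch: compare for and parfor on a cheap and a costly loop body.
% n, m and the eig workload are assumed values chosen for illustration.
n = 200;                 % number of iterations
m = 150;                 % matrix size for the costly body

% Cheap body: one element copy per iteration (the "piece of paper").
r = rand(1, n);
b = zeros(1, n);
tic
for ii = 1:n
    b(ii) = r(ii);
end
tCheapFor = toc;

tic
parfor ii = 1:n
    b(ii) = r(ii);
end
tCheapPar = toc;

% Costly body: an eigenvalue computation per iteration (the "haystack").
e = zeros(1, n);
tic
for ii = 1:n
    e(ii) = max(abs(eig(rand(m))));
end
tCostlyFor = toc;

tic
parfor ii = 1:n
    e(ii) = max(abs(eig(rand(m))));
end
tCostlyPar = toc;

fprintf('cheap body:  for %.4f s, parfor %.4f s\n', tCheapFor, tCheapPar);
fprintf('costly body: for %.4f s, parfor %.4f s\n', tCostlyFor, tCostlyPar);
```

The actual numbers depend on the machine and the MATLAB release, so none are quoted here. The point of the comparison is that parfor has a chance of paying off only when each iteration does enough work to dwarf the cost of distributing it.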