From: Jaebum Jung on
You need to distribute the definitions of your function and variable (fun, b)
to the subkernels.

If a subkernel doesn't know a definition, it just passes the computation back
to the main kernel. For example,

In[51]:= fun[n_]:=Module[{a,tic,toc},tic=TimeUsed[];
a=N[Pi,n*10^6];
toc=TimeUsed[];
toc-tic];

In[52]:= fun[3]
Out[52]= 4.66056

In[55]:= b=Table[3,{i,1,8}];

In[72]:= LaunchKernels[]
Out[72]= {KernelObject[3,local],KernelObject[4,local]}

In[73]:= ParallelEvaluate[$ProcessID]
Out[73]= {21462,21463}

In[56]:= DistributeDefinitions[fun,b]

In[66]:= Map[fun[#]&,b]//AbsoluteTiming
Out[66]=
{37.759508,{4.67046,4.67199,4.6748,4.66635,4.66351,4.65885,4.69103,4.67446}}

In[67]:=
Parallelize[Map[fun[#]&,b],Method->"CoarsestGrained"]//AbsoluteTiming
Out[67]=
{20.064523,{4.72813,4.73638,4.72329,4.70464,4.7191,4.73912,4.71899,4.69738}}

In[68]:= Parallelize[Map[fun[#]&,b],Method->"FinestGrained"]//AbsoluteTiming
Out[68]=
{20.319690,{4.73018,4.72562,4.7092,4.70702,4.74691,4.73888,4.72667,4.72734}}

In[63]:= ParallelMap[fun[#]&,b]//AbsoluteTiming
Out[63]=
{20.238199,{4.71829,4.71306,4.72525,4.71633,4.72851,4.76228,4.73172,4.76264}}

In[69]:= ParallelTable[fun[3],{i,1,8}]//AbsoluteTiming
Out[69]=
{20.372381,{4.72952,4.73892,4.72444,4.74621,4.71921,4.7143,4.7259,4.71445}}
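
To check that the definitions actually arrived, you can ask each subkernel
directly. This is only a sketch, not part of the session above; it assumes
fun lives in the default Global` context and that the kernels are already
launched. Once DistributeDefinitions has run, every entry should report True.

(* one {kernel ID, has-definition?} pair per running subkernel *)
ParallelEvaluate[{$KernelID, DownValues[fun] =!= {}}]

(* number of subkernels available to the parallel functions *)
Length[Kernels[]]

With the two subkernels launched above, 37.76 s serial against 20.06 s
parallel is a speedup of about 1.9, close to the ideal factor of 2.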

- Jaebum



David Bailey wrote:
> pratip wrote:
>
>> Hi Everybody,
>>
>> Recently I was looking through the many parallel computation examples in
>> the documentation of Mathematica 7.0.1. Even if not very clear or adequate,
>> the documentation looks pretty impressive at first glance. Hence
>> I decided to write a Mathematica implementation of the small piece of
>> software named Super Pi, which is very popular among overclockers.
>> It computes Pi to a user-defined number of decimal digits, but in
>> parallel using all the cores of your processor. Have a look:
>> http://files.extremeoverclocking.com/file.php?f=36
>> So my goal was to write pure Mathematica code that computes Pi to
>> three million decimal digits eight times in parallel, using the eight
>> kernels available on my PC. Computing this task once on my PC
>> takes only around 3.885 seconds (with an Intel Core i7 975 Extreme
>> processor).
>>
>> fun[n_]:=Module[{a,tic,toc},
>> tic=TimeUsed[];
>> a=N[Pi,n*10^6];
>> toc=TimeUsed[];
>> toc-tic
>> ];
>> (*For 3 million decimal digits*)
>> In[24]:= fun[3]
>> Out[24]= 3.885
>>
>> Now let's see the parallel configuration of the PC. One can see that I
>> indeed have eight kernels present in the system.
>>
>> In[16]:= ParallelEvaluate[$ProcessID]
>> Out[16]= {6712,6636,7928,4112,7196,5832,3992,7484}
>>
>> In[17]:= ParallelEvaluate[$MachineName]
>> Out[17]= {flowcrusher-pc,flowcrusher-pc,flowcrusher-pc,flowcrusher-pc,
>> flowcrusher-pc,flowcrusher-pc,flowcrusher-pc,flowcrusher-pc}
>>
>> Now, to compute the same thing eight times but in parallel, I tried the
>> following combinations with no success at all. See for yourself the
>> disappointing timing results.
>>
>> First:
>> In[2]:= b=Table[3,{i,1,8}];tic=TimeUsed[];
>> re=Parallelize[Map[fun[#]&,b],Method->"CoarsestGrained"];
>> toc=TimeUsed[];
>> toc-tic
>> Out[4]= 30.935
>>
>> Second:
>> In[11]:= b=Table[3,{i,1,8}];tic=TimeUsed[];
>> re=Parallelize[Map[fun[#]&,b],Method->"FinestGrained"];
>> toc=TimeUsed[];
>> toc-tic
>> Out[13]= 30.872
>>
>> Third:
>> In[18]:= ParallelMap[fun[#] &, b] // Timing
>>
>> Out[18]= {30.81, {3.884, 3.822, 3.854, 3.853, 3.837, 3.869, 3.822,
>> 3.869}}
>>
>> Fourth:
>> In[21]:= ParallelTable[fun[3],{i,1,8}]//Timing
>> Out[21]= {30.747,{3.868,3.807,3.837,3.838,3.806,3.854,3.884,3.853}}
>>
>> Now finally to validate the fact that in spite of all these parallel
>> commands only one single kernel is getting used by Mathematica we map
>> our function over a list of eight threes b={3,3,3,3,3,3,3,3} and get
>> the total time for the repetitive computation.
>>
>> Validation of the claim:
>> In[16]:= Map[fun[#]&,b]//Timing
>> Out[16]= {30.748,{3.854,3.822,3.853,3.838,3.837,3.822,3.869,3.853}}
>>
>> This shows that the parallel commands used in the code above were
>> simply useless.
>>
>> I would greatly appreciate it if any of you could shed some light on
>> this problem. It is very basic in nature, but the idea involved is
>> quite central to parallel computing. What I expect is that neat and
>> clean Mathematica code can be written for this problem that will bring
>> the computation time to somewhere around 6-8 seconds in place of the 30-31
>> seconds we have seen above. I will continue working on the problem,
>> but in the meantime any of you are welcome to give it a try.
>>
>> With best regards to all.
>>
>> Pratip Chakraborty
>>
>>
> Since nobody else has responded, I think you need to launch some kernels
> before running parallel tasks - but I have not really used this feature
> of Mathematica.
>
> LaunchKernels[]
>
> David Bailey
> http://www.dbaileyconsultancy.co.uk