A question about parallel computation in mathematica [Mathematica]

From: pratip on 23 Sep 2009 23:50

Hi Everybody,

Recently I was looking through the many parallel computation examples in
the documentation of Mathematica 7.0.1. Although not always clear or
adequate, the documentation looks pretty impressive at first glance. So
I decided to write a Mathematica implementation of Super Pi, a small
piece of software that is very popular among overclockers. It computes
Pi up to a user-defined number of decimal digits, but in parallel, using
all the cores of your processor. Have a look:
http://files.extremeoverclocking.com/file.php?f=36
My goal was to write pure Mathematica code that computes Pi to
three million decimal digits eight times in parallel, using the eight
kernels available on my PC. Computing this once on my PC takes only
about 3.885 seconds (with an Intel Core i7 975 Extreme processor).

fun[n_]:=Module[{a,tic,toc},
tic=TimeUsed[];
a=N[Pi,n*10^6];
toc=TimeUsed[];
toc-tic
];
(*For 3 million decimal digits*)
In[24]:= fun[3]
Out[24]= 3.885

Now let's see the parallel configuration of the PC. One can see that I
indeed have eight kernels present in the system.

In[16]:= ParallelEvaluate[$ProcessID]
Out[16]= {6712,6636,7928,4112,7196,5832,3992,7484}

In[17]:= ParallelEvaluate[$MachineName]
Out[17]= {flowcrusher-pc,flowcrusher-pc,flowcrusher-pc,flowcrusher-pc,
flowcrusher-pc,flowcrusher-pc,flowcrusher-pc,flowcrusher-pc}

To compute the same thing eight times in parallel, I tried the
following combinations with no success at all. See for yourself the
disappointing timing results.

First:
In[2]:= b=Table[3,{i,1,8}];tic=TimeUsed[];
re=Parallelize[Map[fun[#]&,b],Method->"CoarsestGrained"];
toc=TimeUsed[];
toc-tic
Out[4]= 30.935

Second:
In[11]:= b=Table[3,{i,1,8}];tic=TimeUsed[];
re=Parallelize[Map[fun[#]&,b],Method->"FinestGrained"];
toc=TimeUsed[];
toc-tic
Out[13]= 30.872

Third:
In[18]:= ParallelMap[fun[#] &, b] // Timing

Out[18]= {30.81, {3.884, 3.822, 3.854, 3.853, 3.837, 3.869, 3.822,
3.869}}

Fourth:
In[21]:= ParallelTable[fun[3],{i,1,8}]//Timing
Out[21]= {30.747,{3.868,3.807,3.837,3.838,3.806,3.854,3.884,3.853}}

Finally, to confirm that Mathematica is using only a single kernel in
spite of all these parallel commands, we map our function sequentially
over a list of eight threes, b={3,3,3,3,3,3,3,3}, and measure the total
time for the repeated computation.

Validation of the claim:
In[16]:= Map[fun[#]&,b]//Timing
Out[16]= {30.748,{3.854,3.822,3.853,3.838,3.837,3.822,3.869,3.853}}

This shows that the parallel commands used in the code above were
simply useless.

I would highly appreciate it if any of you could shed some light on
this problem. It is very basic in nature, but the idea involved is
quite central to parallel computing. What I expect is that neat, clean
Mathematica code can be written for this problem that brings the
computation time down to around 6-8 seconds instead of the 30-31
seconds we have seen above. I will keep working on the problem, but in
the meantime please give it a try if you like.

With best regards to all.

Pratip Chakraborty

From: David Bailey on 27 Sep 2009 21:38

Since nobody else has responded, I think you need to launch some kernels
before running parallel tasks - but I have not really used this feature
of Mathematica.

LaunchKernels[]
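
As a rough sketch of how to check this, assuming Mathematica 7 or later
with local subkernels configured, one can launch kernels only when none
are running and then count them. In the original post ParallelEvaluate
already returned eight results, so the kernels were running there.

```mathematica
(* Sketch only: launch subkernels if none are running yet. *)
If[Length[Kernels[]] == 0, LaunchKernels[]];

(* Number of subkernels currently available for parallel work. *)
Length[Kernels[]]

(* Number of processor cores Mathematica detects on this machine. *)
$ProcessorCount
```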

David Bailey
http://www.dbaileyconsultancy.co.uk

From: Patrick Scheibe on 29 Sep 2009 07:42

Hi,

Your code is completely useless, since I don't see why one should compute
the same result eight times. But here is what you missed:

fun[n_] := First@AbsoluteTiming@N[Pi, n*10^6]
ParallelTable[fun[3], {i, 1, 4}] // AbsoluteTiming
DistributeDefinitions[fun];
ParallelTable[fun[3], {i, 1, 4}] // AbsoluteTiming

{29.608226, {6.893410, 6.849890, 6.845198, 6.848202}}

{10.246625, {9.339382, 10.221913, 9.790986, 9.587946}}

you should read ParallelTools/tutorial/Overview first!

Cheers
Patrick

From: sakra on 29 Sep 2009 07:43

Before running any parallel computation you have to make the
definition of the function fun available on the compute kernels by
entering:

DistributeDefinitions[fun]

Symbols defined in the controller kernel do not become available
automatically on the compute kernels. Unless the definition of the
function fun is available, a compute kernel cannot reduce an
expression involving the symbol fun. The expression will thus be
reduced on the controller kernel instead. This explains why only a
single kernel (the controller kernel) is actually used in your tests.
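
A minimal sketch of the whole sequence, assuming Mathematica 7 with
eight local subkernels as in the original post. The names serial and
parallel are only illustrative. AbsoluteTiming is used instead of Timing
because Timing reports only the CPU time spent in the controller kernel,
whereas AbsoluteTiming reports elapsed wall-clock time.

```mathematica
(* Sketch only: time one evaluation of Pi to n million digits.
   The trailing semicolon keeps the large result from being returned. *)
fun[n_] := First@AbsoluteTiming[N[Pi, n*10^6];]

(* Make the definition of fun known to every compute kernel. *)
DistributeDefinitions[fun];

b = ConstantArray[3, 8];

(* Wall-clock time on the controller kernel alone. *)
serial = AbsoluteTiming[Map[fun, b]];

(* Wall-clock time with the work spread over the compute kernels. *)
parallel = AbsoluteTiming[ParallelMap[fun, b]];

(* Compare the two total times. *)
{First[serial], First[parallel]}
```

If the second number is not clearly smaller than the first, the
definition most likely did not reach the compute kernels.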

Sascha

From: Vince on 29 Sep 2009 07:45

Pratip,

You should see a linear speedup if you precede ParallelMap with
DistributeDefinitions[fun]. Worked for me, with your code. Without it,
I saw the same sequential behavior as you (no time to drill into that
now).

Vince Virgilio
