From: K on 15 Dec 2009 07:33

Hi,

I was trying to evaluate definite integrals of different product combinations of trigonometric functions like so:

ClearSystemCache[];
AbsoluteTiming[
 Table[Integrate[
    Sin[ph]*1/(2 Pi)*Sin[nn*ph]*Cos[mm*ph], {ph, Pi/2, Pi}], {nn, 0,
    15}, {mm, 0, 15}];]

I included ClearSystemCache[] to get comparable results for successive runs; output of the actual matrix result is suppressed. On my dual-core AMD, I got this result from Mathematica 7.0.1 (Linux x86 64-bit) for the above command:

{65.240614, Null}

Now I thought that this computation could be almost perfectly parallelized by having, e.g., nn = 0, ..., 7 evaluated by one kernel and nn = 8, ..., 15 by the other, and typed:

ParallelEvaluate[ClearSystemCache[]];
AbsoluteTiming[
 ParallelTable[
   Integrate[
    Sin[ph]*1/(2 Pi)*Sin[nn*ph]*Cos[mm*ph], {ph, Pi/2, Pi}], {nn, 0,
    15}, {mm, 0, 15}, Method -> "CoarsestGrained"];]

The result, however, was disappointing:

{76.993888, Null}

By the way, Kernels[] returns:

{KernelObject[1, local], KernelObject[2, local]}

so the parallel command should in fact have been evaluated by two kernels. With Method -> "CoarsestGrained", I hoped to obtain the data splitting mentioned above. If I do the splitting and combining myself, it gets even a bit worse:

ParallelEvaluate[ClearSystemCache[]];
AbsoluteTiming[
 job1 = ParallelSubmit[
   Table[Integrate[
     Sin[ph]*1/(2 Pi)*Sin[nn*ph]*Cos[mm*ph], {ph, Pi/2, Pi}], {nn, 0,
     7}, {mm, 0, 15}]];
 job2 = ParallelSubmit[
   Table[Integrate[
     Sin[ph]*1/(2 Pi)*Sin[nn*ph]*Cos[mm*ph], {ph, Pi/2, Pi}], {nn, 8,
     15}, {mm, 0, 15}]];
 {res1, res2} = WaitAll[{job1, job2}];
 Flatten[{{res1}, {res2}}, 2];]

The result:

{78.669442, Null}

I can't believe that the splitting and combining overhead on a single machine (no network involved here) can eat up all the gain from distributing the actual workload to two kernels. Does anyone have an idea what is going wrong here?

Thanks,
K.
From: Eric Wort on 16 Dec 2009 06:18

Hi K,

Some processors support adjusting their clock speed on the fly, and on Linux, if only low-priority threads are competing for CPU time, the machine will keep running at a lower clock speed. By default, Mathematica launches subkernels with a lower than standard priority, which can often cause this issue.

In the Parallel tab of the Preferences dialog there is an option entitled "Run kernels at a lower process priority". Make sure that this is not checked if you want the subkernels to run as quickly as possible.

I obtained the following results running your example on my system with the option unchecked:

In[1]:= ClearSystemCache[];
AbsoluteTiming[
 Table[Integrate[
    Sin[ph]*1/(2 Pi)*Sin[nn*ph]*Cos[mm*ph], {ph, Pi/2, Pi}], {nn, 0,
    15}, {mm, 0, 15}];]

Out[2]= {37.142150, Null}

In[3]:= LaunchKernels[2]

Out[3]= {KernelObject[1, "local"], KernelObject[2, "local"]}

In[4]:= ParallelEvaluate[ClearSystemCache[]];
AbsoluteTiming[
 ParallelTable[
   Integrate[
    Sin[ph]*1/(2 Pi)*Sin[nn*ph]*Cos[mm*ph], {ph, Pi/2, Pi}], {nn, 0,
    15}, {mm, 0, 15}, Method -> "CoarsestGrained"];]

Out[5]= {23.712933, Null}

Sincerely,
Eric Wort
From: Patrick Scheibe on 16 Dec 2009 06:21

Hi,

Here (Ubuntu 64-bit, 4 cores, Mathematica 7.0.1) the timing is 53 sec for the serial evaluation and 22 sec for the parallel computation. The speed-up is more visible if I minimize the data-transfer overhead that arises when the kernels return their results. Note the changed step size and the semicolon:

ClearSystemCache[];
AbsoluteTiming[
 Table[Integrate[
    Sin[ph]*1/(2 Pi)*Sin[nn*ph]*Cos[mm*ph], {ph, Pi/2, Pi}];, {nn, 0,
    15}, {mm, 0, 15, 1/2}];]

needs 145.314899 seconds, while

AbsoluteTiming[
 ParallelTable[
  Table[Integrate[
     Sin[ph]*1/(2 Pi)*Sin[nn*ph]*Cos[mm*ph], {ph, Pi/2, Pi}];, {nn, 0,
     15}], {mm, 0, 15, 1/2}];]

needs 52.036152 seconds. Each run was made in a fresh Mathematica session.

Cheers
Patrick
From: Mark McClure on 16 Dec 2009 06:23

On Tue, Dec 15, 2009 at 7:33 AM, K <kgspga(a)googlemail.com> wrote:
> I was trying to evaluate definite integrals of different product
> combinations of trigonometric functions like so: ...
> Now I thought that this computation could be almost perfectly
> parallelized by having, e.g., nn = 0,...,7 evaluated by one kernel
> and nn=8, ..., 15 by the other and typed: ...
> The result, however, was disappointing:

Two symbolic computations that appear superficially similar may actually take vastly different amounts of time to perform, and there may be no general a priori way to determine which will take longer. Thus a large batch of symbolic computations typically parallelizes very poorly, since there is no way to break the problem up into parts that take comparable times. In particular, the integrals in your computation take a wide range of times to compute. Here is a simple illustration of the range of timings in your computation:

ClearSystemCache[];
timings = Table[
   Timing[Integrate[
      Sin[ph] Sin[nn*ph]*Cos[mm*ph]/(2 Pi), {ph, Pi/2, Pi}]][[1]],
   {nn, 0, 15}, {mm, 0, 15}];
ListPlot[Flatten[timings]]

In contrast, here is a collection of trivial computations that take similar amounts of time:

AbsoluteTiming[
 Table[Total[RandomReal[{0, 1}, {500}]], {500}, {500}];]

{2.781051, Null}

In this case, we do gain the expected benefit by performing the computation in parallel:

LaunchKernels[2];
AbsoluteTiming[
 ParallelTable[Total[RandomReal[{0, 1}, {500}]], {500}, {500}];]

{1.632608, Null}

Mark McClure
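[Editorial sketch, not part of the thread: since the per-integral costs are so uneven, one thing worth trying is finer-grained scheduling, so that a kernel that finishes its current integral early can immediately pick up the next one instead of sitting idle while its pre-assigned chunk is done. The option name Method -> "FinestGrained" is the documented counterpart of "CoarsestGrained"; whether it wins for this workload on a given machine is untested here, since the extra per-item communication may offset the gain.]

```mathematica
(* Sketch: dispatch each integral individually rather than in two big
   chunks, so the load balances dynamically across the subkernels.
   Timings are machine-dependent and not taken from the thread. *)
LaunchKernels[2];
ParallelEvaluate[ClearSystemCache[]];
AbsoluteTiming[
 ParallelTable[
   Integrate[
    Sin[ph]*1/(2 Pi)*Sin[nn*ph]*Cos[mm*ph], {ph, Pi/2, Pi}], {nn, 0,
    15}, {mm, 0, 15}, Method -> "FinestGrained"];]
```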
From: K on 17 Dec 2009 07:29
Thank you all for your answers to my problem.

Eric Wort's suggestion of unchecking the lower-process-priority option brought the timings down a bit, but not much: I'm now at 58 s/66 s for serial/parallel evaluation. I also unchecked "Enable parallel monitoring tools" to see whether the monitoring had any effect, but it didn't.

Mark McClure's remark about the differing times for similar symbolic computation tasks was very valuable, and the list plot of timings is interesting to see. Timings can differ by a factor of 4 or more between the different integrations; the actual time consumed by any one integral seems almost random. However, if I use Mark's code to generate the timings table and then sum over the first half and the second half of the results, the totals are not as diverse as the individual timings:

In[8]:= Sum[Flatten[timings][[ii]], {ii, 1, 128}]
Out[8]= 29.2016

In[9]:= Sum[Flatten[timings][[ii]], {ii, 129, 256}]
Out[9]= 27.8508

Just in case, I also split the timings matrix along the other dimension:

In[6]:= Sum[Flatten[Transpose[timings]][[ii]], {ii, 1, 128}]
Out[6]= 21.7057

In[7]:= Sum[Flatten[Transpose[timings]][[ii]], {ii, 129, 256}]
Out[7]= 35.3466

Here we see a more noticeable difference. And indeed, watching the kernels in the parallel kernel status window, or the core usage in KDE's ksysguard, I find that one kernel finishes its work in practically half the time the other kernel needs. However, this behavior persists regardless of which variable (nn or mm) I split on.

In ksysguard I also noticed that the main kernel, the Mathematica GUI, and a Java process spawned by Mathematica together take up 20%-40% of the CPU resources even right after I start Mathematica with an empty notebook; the MathKernel uses 10%-20% even when no computation is going on at all. Is that normal? I'm on Fedora 11, KDE 4.3.3, Linux kernel 2.6.30.9-102.fc11.x86_64, and the processor is an AMD Athlon(tm) 64 X2 Dual Core Processor 5600+.
In the parallel kernel status window, the Time column usually shows about 25 s for the master, 30 s for the first local kernel, and 40 s for the second, for one evaluation of the integration matrix. Are the ratios of these values approximately what you get for the computation?

Regards,
K.
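[Editorial sketch, not part of the thread: K's half-sums show that a fixed split along one dimension can hand one kernel a systematically more expensive half. An untested alternative that keeps coarse chunks but mixes cheap and expensive integrals into each of them is to flatten the (nn, mm) grid into one list of index pairs and shuffle it before distributing. RandomSample and ParallelMap are standard functions; the balancing benefit for this particular workload is an assumption, not a measured result.]

```mathematica
(* Sketch: shuffle the index pairs so each kernel's share contains a
   similar mix of fast and slow integrals, then map over the list in
   parallel.  The original (nn, mm) ordering can be restored from the
   pairs themselves if the matrix layout is needed afterwards. *)
pairs = RandomSample[
   Flatten[Table[{nn, mm}, {nn, 0, 15}, {mm, 0, 15}], 1]];
results = ParallelMap[
   Integrate[
     Sin[ph]*1/(2 Pi)*Sin[#[[1]]*ph]*Cos[#[[2]]*ph],
     {ph, Pi/2, Pi}] &,
   pairs];
```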