From: sturlamolden on 14 Jul 2010 10:02

On 2 Jul, 21:35, "Colin Watters" <b...(a)qomputing.com> wrote:

> OpenMP's single process means little or no communications overhead.

That is not true. It is just hidden from the programmer. Processors
must communicate to maintain cache coherence; it happens in hardware,
so it is invisible to the programmer, but it is there and it has a
significant overhead.

And there is the issue of "false sharing", which can lead to erratic
communication of junk. This is what happens when different processors
update different parts of the same cache line. A processor updates a
variable that is not shared, but it cannot know the variable is not
shared, so the other copies of the cache line become invalidated. Now
the other processors must refresh their cache line, to synchronize a
variable they never use, and the update has to be communicated from
the processor that wrote it.

Modern processors are also pipelined (i.e. they work like an assembly
line) and use branch prediction. The benefit of this may go away if
they work on the same data. The Pentium IV has a 20-stage pipeline.
That is very important to keep in mind: if you have, for example, 4
such processors working in parallel, each one of them could in the
worst case do 5 times as much work on its own. It depends on the
algorithm and the predictability of the branches. Pipelining is too
easily forgotten when parallel computing is discussed.

> MPI's multiple processes demand Inter Process Communication (IPC),
> which is a big learning curve, and a lot of work up front

MPI hides away all the details. You don't have to care whether it is
shared memory, TCP/IP, Unix sockets, pipes, or whatever. But using IPC
instead of just adding "!$omp parallel do" above a loop takes more
work, yes.
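[A minimal sketch of the false-sharing effect described above, in
Fortran with OpenMP. The thread count, iteration count, and the
assumption of 64-byte cache lines are illustrative, not from the
original post; compile with e.g. gfortran -O2 -fopenmp. The "hot"
counters sit in adjacent array elements and so share cache lines; the
"cold" counters are padded so each thread owns its own line. The
second loop is typically much faster, though by how much depends on
the machine.]

program false_sharing
    use omp_lib
    implicit none
    integer, parameter :: nthreads = 4
    integer(8), parameter :: niter = 50000000_8
    integer(8) :: hot(nthreads)       ! adjacent counters share a cache line
    integer(8) :: cold(8, nthreads)   ! 8 x 8 bytes = 64-byte column per thread
    integer :: t
    integer(8) :: i
    double precision :: t0, t1, t2

    hot = 0
    cold = 0
    call omp_set_num_threads(nthreads)

    t0 = omp_get_wtime()
    !$omp parallel do private(i)
    do t = 1, nthreads
        do i = 1, niter
            hot(t) = hot(t) + 1       ! each write invalidates neighbours' copies
        end do
    end do
    !$omp end parallel do
    t1 = omp_get_wtime()

    !$omp parallel do private(i)
    do t = 1, nthreads
        do i = 1, niter
            cold(1, t) = cold(1, t) + 1  ! each thread owns its cache line
        end do
    end do
    !$omp end parallel do
    t2 = omp_get_wtime()

    ! Print the sums so the compiler cannot eliminate the loops.
    print *, 'false sharing:', t1 - t0, 's   padded:', t2 - t1, 's'
    print *, 'checksums:', sum(hot), sum(cold(1, :))
end program false_sharing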
From: sturlamolden on 14 Jul 2010 11:09

On 14 Jul, 15:56, n...(a)cam.ac.uk wrote:

> Er, no, sorry. There is MORE room for mistakes, deadlocks, livelocks
> and race conditions. Most of the people I know of who have tried
> OpenMP have hit a failure caused by those, completely failed to
> locate it, and gone back to MPI because it's easier. You either
> have a very simple task, or are doing very well.

Short answer: I mostly use OpenMP to vectorize loops in my Fortran and
C codes. Any fancy task and I/O I do in Python.

I also prefer to use parallel libraries if I can. Most of the time
that I benefit from parallel processing, it is due to linear algebra
or FFTs. That can be delegated to libraries (e.g. GotoBLAS, MKL, and
FFTW have this built in). My proposition is that >50% of cases of
"multi-threading" for speed are actually a lack of sophistication in
linear algebra, resulting in a failure to use BLAS or LAPACK instead.
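[A hedged sketch of the "use BLAS instead" point above: the same
matrix product written as a naive triple loop and as a single call to
the standard BLAS routine DGEMM. The matrix size is arbitrary; link
against any BLAS (reference BLAS, GotoBLAS, MKL), and with a threaded
BLAS the DGEMM call is parallelized internally with no OpenMP in the
user's code at all.]

program use_blas
    implicit none
    integer, parameter :: n = 512
    double precision :: a(n, n), b(n, n), c(n, n)
    integer :: i, j, k

    call random_number(a)
    call random_number(b)

    ! Naive version: correct, but single-threaded and cache-unfriendly.
    c = 0.0d0
    do j = 1, n
        do k = 1, n
            do i = 1, n
                c(i, j) = c(i, j) + a(i, k) * b(k, j)
            end do
        end do
    end do

    ! BLAS version of C = A*B: one call, cache-blocked, and threaded
    ! when linked against a parallel BLAS such as GotoBLAS or MKL.
    call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)

    print *, c(1, 1)
end program use_blas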
From: nmm1 on 14 Jul 2010 11:16

In article
<a058c73a-72b7-4c37-b009-2c080b8f49ea(a)q22g2000yqm.googlegroups.com>,
sturlamolden <sturlamolden(a)yahoo.no> wrote:
>
>> Er, no, sorry. There is MORE room for mistakes, deadlocks, livelocks
>> and race conditions. Most of the people I know of who have tried
>> OpenMP have hit a failure caused by those, completely failed to
>> locate it, and gone back to MPI because it's easier. You either
>> have a very simple task, or are doing very well.
>
>Short answer: I mostly use OpenMP to vectorize loops in my Fortran and
>C codes. Any fancy task and I/O I do in Python.

Right. Vectorising loops is the case where it is a LOT easier and more
reliable to use OpenMP, provided that the loops are purely numeric
(i.e. include no I/O or anything tricky).

Regards,
Nick Maclaren.
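[A minimal sketch of the easy case Nick describes: a purely numeric
loop with no I/O and no cross-iteration dependences, parallelized by a
single directive. The routine name and arguments are illustrative.]

subroutine saxpy_like(n, a, x, y)
    implicit none
    integer, intent(in) :: n
    double precision, intent(in) :: a, x(n)
    double precision, intent(inout) :: y(n)
    integer :: i

    ! Safe to parallelize: iteration i touches only x(i) and y(i).
    !$omp parallel do
    do i = 1, n
        y(i) = y(i) + a * x(i)
    end do
    !$omp end parallel do
end subroutine saxpy_like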
From: Victor Eijkhout on 14 Jul 2010 15:50

<nmm1(a)cam.ac.uk> wrote:

> In article <1jlm3dl.110jatrru0ngcN%see(a)sig.for.address>,
> Victor Eijkhout <see(a)sig.for.address> wrote:
> >sturlamolden <sturlamolden(a)yahoo.no> wrote:
> >
> >> OpenMP uses a "shared memory" model, which is harder to implement
> >> on a cluster architecture. But it has been done too.
> >
> >What are you thinking of?
>
> Intel Cluster Tools.

Hm. I watched a video on the Intel site, and there is no hint of
distributed shared memory. (MPI, Tracers, math libraries, but nothing
deeper.)

http://software.intel.com/en-us/intel-cluster-toolkit/

Can you give me a more specific pointer?

Victor.
--
Victor Eijkhout -- eijkhout at tacc utexas edu
From: nmm1 on 14 Jul 2010 16:23
In article <1jlmjr4.1h9muj0pihqb1N%see(a)sig.for.address>,
Victor Eijkhout <see(a)sig.for.address> wrote:
>
>> >> OpenMP uses a "shared memory" model, which is harder to implement
>> >> on a cluster architecture. But it has been done too.
>> >
>> >What are you thinking of?
>>
>> Intel Cluster Tools.
>
>Hm. I watched a video on the Intel site, and there is no hint of
>distributed shared memory. (MPI, Tracers, math libraries, but nothing
>deeper.)
>
>http://software.intel.com/en-us/intel-cluster-toolkit/
>
>Can you give me a more specific pointer?

No, sorry. I know for certain that they supported it, but I never had
time to investigate myself.

Regards,
Nick Maclaren.