From: sturlamolden on 14 Jul 2010 10:02

On 2 Jul, 21:35, "Colin Watters" <b...(a)qomputing.com> wrote:

> OpenMP's single process means little or no communications overhead.

That is not true. It is just hidden from the programmer. Processors
must communicate to maintain cache coherence; it happens in hardware,
so it is invisible to the programmer, but it is there and it has a
significant overhead.

And there is the issue of "false sharing", which can lead to erratic
communication of junk. This is what happens when different processors
update different parts of the same cache line. A processor updates a
variable that is not shared, but it cannot know the variable is not
shared, so the other copies of the cache line become invalidated. Now
the other processors must refresh their cache line, to synchronize a
variable they never use, and the update has to be communicated from
the processor that wrote it.

Modern processors are also pipelined (i.e. they work like an assembly
line) and use branch prediction. The benefit of this may go away if
they work on the same data. The Pentium IV has a 20-stage pipeline.
That is very important to keep in mind: if you have, for example, 4
such processors working in parallel, each one of them could in the
worst case do 5 times as much work on its own. It depends on the
algorithm and the predictability of the branches. Pipelining is too
easily forgotten when parallel computing is discussed.

> MPI's multiple processes demand Inter Process Communication (IPC),
> which is a big learning curve, and a lot of work up front

MPI hides away all the details. You don't have to care whether it is
shared memory, TCP/IP, Unix sockets, pipes, or whatever. But using IPC
instead of just adding "!$omp parallel do" above a loop takes more
work, yes.
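[A minimal sketch of the false-sharing effect described above, in
Fortran with OpenMP. The thread count, iteration count, and the
assumption of 64-byte cache lines are illustrative, not from the
original post; compile with e.g. gfortran -O2 -fopenmp. The "hot"
counters sit in adjacent array elements and so share cache lines; the
"cold" counters are padded so each thread owns its own line. The
second loop is typically much faster, though by how much depends on
the machine.]

program false_sharing
    use omp_lib
    implicit none
    integer, parameter :: nthreads = 4
    integer(8), parameter :: niter = 50000000_8
    integer(8) :: hot(nthreads)       ! adjacent counters share a cache line
    integer(8) :: cold(8, nthreads)   ! 8 x 8 bytes = 64-byte column per thread
    integer :: t
    integer(8) :: i
    double precision :: t0, t1, t2

    hot = 0
    cold = 0
    call omp_set_num_threads(nthreads)

    t0 = omp_get_wtime()
    !$omp parallel do private(i)
    do t = 1, nthreads
        do i = 1, niter
            hot(t) = hot(t) + 1       ! each write invalidates neighbours' copies
        end do
    end do
    !$omp end parallel do
    t1 = omp_get_wtime()

    !$omp parallel do private(i)
    do t = 1, nthreads
        do i = 1, niter
            cold(1, t) = cold(1, t) + 1  ! each thread owns its cache line
        end do
    end do
    !$omp end parallel do
    t2 = omp_get_wtime()

    ! Print the sums so the compiler cannot eliminate the loops.
    print *, 'false sharing:', t1 - t0, 's   padded:', t2 - t1, 's'
    print *, 'checksums:', sum(hot), sum(cold(1, :))
end program false_sharing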
From: sturlamolden on 14 Jul 2010 11:09

On 14 Jul, 15:56, n...(a)cam.ac.uk wrote:

> Er, no, sorry. There is MORE room for mistakes, deadlocks, livelocks
> and race conditions. Most of the people I know of who have tried
> OpenMP have hit a failure caused by those, completely failed to
> locate it, and gone back to MPI because it's easier. You either
> have a very simple task, or are doing very well.

Short answer: I mostly use OpenMP to vectorize loops in my Fortran and
C codes. Any fancy task and I/O I do in Python.

I also prefer to use parallel libraries if I can. Most of the time
that I benefit from parallel processing, it is due to linear algebra
or FFTs. That can be delegated to libraries (e.g. GotoBLAS, MKL, and
FFTW have this built in). My proposition is that >50% of cases of
"multi-threading" for speed are actually a lack of sophistication in
linear algebra, resulting in a failure to use BLAS or LAPACK instead.
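[A hedged sketch of the "use BLAS instead" point above: the same
matrix product written as a naive triple loop and as a single call to
the standard BLAS routine DGEMM. The matrix size is arbitrary; link
against any BLAS (reference BLAS, GotoBLAS, MKL), and with a threaded
BLAS the DGEMM call is parallelized internally with no OpenMP in the
user's code at all.]

program use_blas
    implicit none
    integer, parameter :: n = 512
    double precision :: a(n, n), b(n, n), c(n, n)
    integer :: i, j, k

    call random_number(a)
    call random_number(b)

    ! Naive version: correct, but single-threaded and cache-unfriendly.
    c = 0.0d0
    do j = 1, n
        do k = 1, n
            do i = 1, n
                c(i, j) = c(i, j) + a(i, k) * b(k, j)
            end do
        end do
    end do

    ! BLAS version of C = A*B: one call, cache-blocked, and threaded
    ! when linked against a parallel BLAS such as GotoBLAS or MKL.
    call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)

    print *, c(1, 1)
end program use_blas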
From: nmm1 on 14 Jul 2010 11:16

In article
<a058c73a-72b7-4c37-b009-2c080b8f49ea(a)q22g2000yqm.googlegroups.com>,
sturlamolden <sturlamolden(a)yahoo.no> wrote:
>
>> Er, no, sorry. There is MORE room for mistakes, deadlocks, livelocks
>> and race conditions. Most of the people I know of who have tried
>> OpenMP have hit a failure caused by those, completely failed to
>> locate it, and gone back to MPI because it's easier. You either
>> have a very simple task, or are doing very well.
>
>Short answer: I mostly use OpenMP to vectorize loops in my Fortran and
>C codes. Any fancy task and I/O I do in Python.

Right. Vectorising loops is the case where it is a LOT easier and more
reliable to use OpenMP, provided that the loops are purely numeric
(i.e. include no I/O or anything tricky).

Regards,
Nick Maclaren.
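[A minimal sketch of the easy case Nick describes: a purely numeric
loop with no I/O and no cross-iteration dependences, parallelized by a
single directive. The routine name and arguments are illustrative.]

subroutine saxpy_like(n, a, x, y)
    implicit none
    integer, intent(in) :: n
    double precision, intent(in) :: a, x(n)
    double precision, intent(inout) :: y(n)
    integer :: i

    ! Safe to parallelize: iteration i touches only x(i) and y(i).
    !$omp parallel do
    do i = 1, n
        y(i) = y(i) + a * x(i)
    end do
    !$omp end parallel do
end subroutine saxpy_like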
From: Victor Eijkhout on 14 Jul 2010 15:50

<nmm1(a)cam.ac.uk> wrote:

> In article <1jlm3dl.110jatrru0ngcN%see(a)sig.for.address>,
> Victor Eijkhout <see(a)sig.for.address> wrote:
> >sturlamolden <sturlamolden(a)yahoo.no> wrote:
> >
> >> OpenMP uses a "shared memory" model, which is harder to implement
> >> on a cluster architecture. But it has been done too.
> >
> >What are you thinking of?
>
> Intel Cluster Tools.

Hm. I watched a video on the Intel site, and there is no hint of
distributed shared memory. (MPI, Tracers, math libraries, but nothing
deeper.)

http://software.intel.com/en-us/intel-cluster-toolkit/

Can you give me a more specific pointer?

Victor.
--
Victor Eijkhout -- eijkhout at tacc utexas edu
From: nmm1 on 14 Jul 2010 16:23
In article <1jlmjr4.1h9muj0pihqb1N%see(a)sig.for.address>,
Victor Eijkhout <see(a)sig.for.address> wrote:
>
>> >> OpenMP uses a "shared memory" model, which is harder to implement
>> >> on a cluster architecture. But it has been done too.
>> >
>> >What are you thinking of?
>>
>> Intel Cluster Tools.
>
>Hm. I watched a video on the Intel site, and there is no hint of
>distributed shared memory. (MPI, Tracers, math libraries, but nothing
>deeper.)
>
>http://software.intel.com/en-us/intel-cluster-toolkit/
>
>Can you give me a more specific pointer?

No, sorry. I know for certain that they supported it, but I never had
time to investigate myself.

Regards,
Nick Maclaren.