From: nmm1 on 29 Dec 2009 05:07

In article <600p07-9ns.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>
>>> Sorry; don't see how this suffers from any kind of degradation. Could
>>> you elaborate?
>>
>> Because, if simultaneous updates of the same cache line are not
>> coherent (in whatever sense required by the language), that code is
>> incorrect. This has nothing to do with performance, and a great
>> deal to do with programs giving occasional, non-repeatable wrong
>> answers.
>
>So then you need the compiler to know that all accesses within a cache
>line (or whatever the coherency size is) of both edges of the range are
>special, and must be handled with totally separate instructions, right?

Right.

>Sounds like something it should be trivial to add to any given C
>compiler! :-)
>
>Actually it seems to me that a compiler which supports OpenMP or similar
>must do quite a bit of this when automatically partitioning a loop...

Usable OpenMP support appeared in Fortran two years before it appeared in C; even today, OpenMP C and C++ compilers are, at best, likely to produce code that fails in obscure ways (even for 'correct' OpenMP usage) and/or generate inefficient code. A lot of experts recommend using only Fortran with OpenMP.

Regards,
Nick Maclaren.
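[To make the partitioned-loop case concrete, here is a minimal OpenMP C sketch of the kind of loop being discussed; the array size, 64-byte line size, and names are illustrative assumptions, not anything taken from the thread.]

    #include <omp.h>

    #define N    1000            /* deliberately not a multiple of LINE  */
    #define LINE 64              /* assumed coherency granule, in bytes  */

    unsigned char x[N];

    void fill(void)
    {
        /* schedule(static) hands each thread one contiguous chunk of the
         * byte array.  Chunk boundaries usually fall in the middle of a
         * 64-byte line, so two threads write different bytes of the same
         * line.  On coherent hardware that is merely false sharing (a
         * slowdown); if simultaneous updates of one line are not kept
         * coherent, only the elements within LINE bytes of each chunk
         * edge would need the special edge handling discussed above.    */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++)
            x[i] = (unsigned char)(i & 0xff);
    }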
From: Mayan Moudgill on 29 Dec 2009 08:34

nmm1(a)cam.ac.uk wrote:
> In article <MMOdnSSpXI_wwaTWnZ2dnUVZ_tGdnZ2d(a)bestweb.net>,
> Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>>>>Sorry, don't understand. Can you give a more concrete example? What do
>>>>you mean by "byte-separation accesses"?
>>>
>
> Because, if simultaneous updates of the same cache line are not
> coherent (in whatever sense required by the language), that code is
> incorrect.

I see. You are worried that, given code of the form:

   for p in processors
      x[p] = foo(p,...)

in the case of *non-coherent* memory, the writeback of x[M] may overwrite the already written value of x[M-1] (or any other value of x sharing a cache line with x[M]).

This is exacerbated if x[] is a byte array, since it increases the chances of collisions, and because one hardware workaround (write masks) would require more bits (you need to keep track of dirty bytes as opposed to dirty words).

If I have understood your problem properly, why would making the entire array write-through not be an adequate solution?
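[The write-mask workaround Mayan mentions can be sketched in a few lines of C. This is a purely illustrative software model of the bookkeeping, not any real hardware interface; it mainly shows why per-byte tracking needs more mask bits than per-word tracking.]

    #include <stdint.h>

    #define LINE 64

    struct cached_line {
        uint8_t  data[LINE];
        uint64_t dirty;          /* one bit per byte: 64 bits per line.
                                  * Per-word tracking would need only
                                  * LINE/4 = 16 bits for 4-byte words.   */
    };

    static void write_byte(struct cached_line *c, unsigned off, uint8_t v)
    {
        c->data[off] = v;
        c->dirty |= UINT64_C(1) << off;
    }

    /* At writeback, merge only the bytes this cache actually dirtied,
     * so a line also holding another processor's bytes is not clobbered. */
    static void write_back(const struct cached_line *c, uint8_t *memory)
    {
        for (unsigned off = 0; off < LINE; off++)
            if (c->dirty & (UINT64_C(1) << off))
                memory[off] = c->data[off];
    }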
From: nmm1 on 29 Dec 2009 09:04

In article <wq-dnZv7vorjmKfWnZ2dnUVZ_uWdnZ2d(a)bestweb.net>,
Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>I see. You are worried that, given code of the form:
>
>   for p in processors
>      x[p] = foo(p,...)
>
>in the case of *non-coherent* memory, the writeback of x[M] may
>overwrite the already written value of x[M-1] (or any other value of x
>sharing a cache line with x[M]).

Not really. That's only the case where simultaneous use of a cache line by two threads is undefined (except for both read-only). That is the extreme case of non-coherence, but not the only one.

> This is exacerbated if x[] is a byte array, since it increases the
>chances of collisions, and because one hardware workaround (write masks)
>would require more bits (you need to keep track of dirty bytes as
>opposed to dirty words).

Yes.

>If I have understood your problem properly, why would making the entire
>array write-through not be an adequate solution?

Because the real problem is consistency. If threads A and B update the same cache line, and threads C and D read it, then thread C may see a sequence of events that is incompatible with that seen by thread D. Almost always, that will go unnoticed, but occasionally it will cause the program to go wrong. Yes, synchronous write-through would solve it, but that's equivalent to disabling caching on updates. NOT good for performance.

It's a foul problem, and current architectures merely kludge it up enough for the problems to be very rare indeed, so almost everyone who gets caught by it puts it down to transient hardware errors, or simply gremlins. There are probably only a few hundred people who have positively identified it as the cause of a particular failure; yes, it's that foul.

Regards,
Nick Maclaren.
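[Nick's A-B-writers / C-D-readers scenario is essentially the classic "independent reads of independent writes" litmus test. Below is a C11 sketch with two atomically written bytes that will normally share a line; the relaxed atomics and all names are illustrative choices, not code from the thread.]

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    /* Two independently written bytes that will normally share a line. */
    static struct { _Atomic unsigned char a, b; } line;

    static int cA, cB, dA, dB;   /* what the two reader threads observed */

    static void *thrA(void *p) { atomic_store_explicit(&line.a, 1, memory_order_relaxed); return p; }
    static void *thrB(void *p) { atomic_store_explicit(&line.b, 1, memory_order_relaxed); return p; }

    static void *thrC(void *p)   /* reads a, then b */
    {
        cA = atomic_load_explicit(&line.a, memory_order_relaxed);
        cB = atomic_load_explicit(&line.b, memory_order_relaxed);
        return p;
    }

    static void *thrD(void *p)   /* reads b, then a */
    {
        dB = atomic_load_explicit(&line.b, memory_order_relaxed);
        dA = atomic_load_explicit(&line.a, memory_order_relaxed);
        return p;
    }

    int main(void)
    {
        pthread_t t[4];
        void *(*fn[4])(void *) = { thrA, thrB, thrC, thrD };
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, fn[i], NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);

        /* Nothing in this program forbids a run ending with cA==1, cB==0
         * and dB==1, dA==0: C observed A's write before B's while D
         * observed B's before A's -- the two readers disagree about the
         * order in which the same line was updated.                     */
        printf("C saw a=%d b=%d; D saw b=%d a=%d\n", cA, cB, dB, dA);
        return 0;
    }

[Under sequentially consistent atomics the flagged outcome is forbidden, which is one way a language can paper over the underlying hardware behaviour.]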
From: Mayan Moudgill on 29 Dec 2009 09:22

nmm1(a)cam.ac.uk wrote:
> In article <wq-dnZv7vorjmKfWnZ2dnUVZ_uWdnZ2d(a)bestweb.net>,
> Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>>I see. You are worried that, given code of the form:
>>
>>   for p in processors
>>      x[p] = foo(p,...)
>>
>>in the case of *non-coherent* memory, the writeback of x[M] may
>>overwrite the already written value of x[M-1] (or any other value of x
>>sharing a cache line with x[M]).
>
> Yes.
>
>>If I have understood your problem properly, why would making the entire
>>array write-through not be an adequate solution?
>
> Because the real problem is consistency. If threads A and B update
> the same cache line, and threads C and D read it, then thread C may
> see a sequence of events that is incompatible with that seen by
> thread D. Almost always, that will go unnoticed, but occasionally
> it will cause the program to go wrong.

OK, so let's denote the new values of the array as x[]' and the old values as plain x[]. Then the (ordering) inconsistency would arise when C reads x[A]',x[B] whilst D reads x[B]',x[A].

I find it somewhat hard to imagine that occurring; what it would mean is that there was no synchronization for the reads, i.e. the program did not care whether it read x[A] or x[A]'. If, however, there were some form of synchronization, then it would be guaranteed that the correct version of each variable was read, and you wouldn't get this inconsistency.

I am sure there is something more to the problem; could you expand on it a little more?
From: nmm1 on 29 Dec 2009 09:33
In article <3b6dneYekdE0jafWnZ2dnUVZ_t2dnZ2d(a)bestweb.net>,
Mayan Moudgill <mayan(a)bestweb.net> wrote:
>>
>>>If I have understood your problem properly, why would making the entire
>>>array write-through not be an adequate solution?
>>
>> Because the real problem is consistency. If threads A and B update
>> the same cache line, and threads C and D read it, then thread C may
>> see a sequence of events that is incompatible with that seen by
>> thread D. Almost always, that will go unnoticed, but occasionally
>> it will cause the program to go wrong.
>
>OK, so let's denote the new values of the array as x[]' and the old
>values as plain x[]. Then the (ordering) inconsistency would arise when C
>reads x[A]',x[B] whilst D reads x[B]',x[A].

Yes, roughly.

>I find it somewhat hard to imagine that occurring; what it would mean is
>that there was no synchronization for the reads, i.e. the program did
>not care whether it read x[A] or x[A]'. If, however, there were some form
>of synchronization, then it would be guaranteed that the correct version
>of each variable was read, and you wouldn't get this inconsistency.

Not at all. What happens is that the inconsistency shows up in more complicated code, and breaks a (reasonable) constraint assumed by the code. It is possible to describe the problem very simply, but not to explain why it causes so much trouble. However, it does, and it is one of the classic problems of shared-memory parallelism.

Regards,
Nick Maclaren.