From: nmm1 on 29 Dec 2009 05:07

In article <600p07-9ns.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>
>>> Sorry; don't see how this suffers from any kind of degradation. Could
>>> you elaborate?
>>
>> Because, if simultaneous updates of the same cache line are not
>> coherent (in whatever sense required by the language), that code is
>> incorrect. This has nothing to do with performance, and a great
>> deal to do with programs giving occasional, non-repeatable wrong
>> answers.
>
>So then you need the compiler to know that all accesses within a cache
>line (or whatever the coherency size is) of both edges of the range are
>special, and must be handled with totally separate instructions, right?

Right.

>Sounds like something it should be trivial to add to any given C
>compiler! :-)
>
>Actually it seems to me that a compiler which supports OpenMP or similar
>must do quite a bit of this when automatically partitioning a loop...

Usable OpenMP support appeared in Fortran two years before it appeared in C; even today, OpenMP C and C++ compilers are, at best, likely to produce code that fails in obscure ways (even for 'correct' OpenMP usage) and/or generate inefficient code. A lot of experts recommend using only Fortran with OpenMP.

Regards,
Nick Maclaren.
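[To make the partitioned-loop case concrete, here is a minimal OpenMP C sketch of the kind of loop being discussed; the array size, 64-byte line size, and names are illustrative assumptions, not anything taken from the thread.]

    #include <omp.h>

    #define N    1000            /* deliberately not a multiple of LINE  */
    #define LINE 64              /* assumed coherency granule, in bytes  */

    unsigned char x[N];

    void fill(void)
    {
        /* schedule(static) hands each thread one contiguous chunk of the
         * byte array.  Chunk boundaries usually fall in the middle of a
         * 64-byte line, so two threads write different bytes of the same
         * line.  On coherent hardware that is merely false sharing (a
         * slowdown); if simultaneous updates of one line are not kept
         * coherent, only the elements within LINE bytes of each chunk
         * edge would need the special edge handling discussed above.    */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++)
            x[i] = (unsigned char)(i & 0xff);
    }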
From: Mayan Moudgill on 29 Dec 2009 08:34

nmm1(a)cam.ac.uk wrote:
> In article <MMOdnSSpXI_wwaTWnZ2dnUVZ_tGdnZ2d(a)bestweb.net>,
> Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>>>>Sorry, don't understand. Can you give a more concrete example? What do
>>>>you mean by "byte-separation accesses"?
>>>
>
> Because, if simultaneous updates of the same cache line are not
> coherent (in whatever sense required by the language), that code is
> incorrect.

I see. You are worried that, given code of the form:

   for p in processors
      x[p] = foo(p,...)

in the case of *non-coherent* memory, the writeback of x[M] may overwrite the already written value of x[M-1] (or any other value of x sharing a cache line with x[M]).

This is exacerbated if x[] is a byte array, since it increases the chances of collisions, and because one hardware workaround (write masks) would require more bits (you need to keep track of dirty bytes as opposed to dirty words).

If I have understood your problem properly, why would making the entire array write-through not be an adequate solution?
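[The write-mask workaround Mayan mentions can be sketched in a few lines of C. This is a purely illustrative software model of the bookkeeping, not any real hardware interface; it mainly shows why per-byte tracking needs more mask bits than per-word tracking.]

    #include <stdint.h>

    #define LINE 64

    struct cached_line {
        uint8_t  data[LINE];
        uint64_t dirty;          /* one bit per byte: 64 bits per line.
                                  * Per-word tracking would need only
                                  * LINE/4 = 16 bits for 4-byte words.   */
    };

    static void write_byte(struct cached_line *c, unsigned off, uint8_t v)
    {
        c->data[off] = v;
        c->dirty |= UINT64_C(1) << off;
    }

    /* At writeback, merge only the bytes this cache actually dirtied,
     * so a line also holding another processor's bytes is not clobbered. */
    static void write_back(const struct cached_line *c, uint8_t *memory)
    {
        for (unsigned off = 0; off < LINE; off++)
            if (c->dirty & (UINT64_C(1) << off))
                memory[off] = c->data[off];
    }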
From: nmm1 on 29 Dec 2009 09:04

In article <wq-dnZv7vorjmKfWnZ2dnUVZ_uWdnZ2d(a)bestweb.net>,
Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>I see. You are worried that, given code of the form:
>
>   for p in processors
>      x[p] = foo(p,...)
>
>in the case of *non-coherent* memory, the writeback of x[M] may
>overwrite the already written value of x[M-1] (or any other value of x
>sharing a cache line with x[M]).

Not really. That's only the case where simultaneous use of a cache line by two threads is undefined (except for both read-only). That is the extreme case of non-coherence, but not the only one.

> This is exacerbated if x[] is a byte array, since it increases the
>chances of collisions, and because one hardware workaround (write masks)
>would require more bits (you need to keep track of dirty bytes as
>opposed to dirty words).

Yes.

>If I have understood your problem properly, why would making the entire
>array write-through not be an adequate solution?

Because the real problem is consistency. If threads A and B update the same cache line, and threads C and D read it, then thread C may see a sequence of events that is incompatible with that seen by thread D. Almost always, that will go unnoticed, but occasionally it will cause the program to go wrong. Yes, synchronous write-through would solve it, but that's equivalent to disabling caching on updates. NOT good for performance.

It's a foul problem, and current architectures merely kludge it up enough for the problems to be very rare indeed, so almost everyone who gets caught by it puts it down to transient hardware errors, or simply gremlins. There are probably only a few hundred people who have positively identified it as the cause of a particular failure; yes, it's that foul.

Regards,
Nick Maclaren.
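[Nick's A-B-writers / C-D-readers scenario is essentially the classic "independent reads of independent writes" litmus test. Below is a C11 sketch with two atomically written bytes that will normally share a line; the relaxed atomics and all names are illustrative choices, not code from the thread.]

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    /* Two independently written bytes that will normally share a line. */
    static struct { _Atomic unsigned char a, b; } line;

    static int cA, cB, dA, dB;   /* what the two reader threads observed */

    static void *thrA(void *p) { atomic_store_explicit(&line.a, 1, memory_order_relaxed); return p; }
    static void *thrB(void *p) { atomic_store_explicit(&line.b, 1, memory_order_relaxed); return p; }

    static void *thrC(void *p)   /* reads a, then b */
    {
        cA = atomic_load_explicit(&line.a, memory_order_relaxed);
        cB = atomic_load_explicit(&line.b, memory_order_relaxed);
        return p;
    }

    static void *thrD(void *p)   /* reads b, then a */
    {
        dB = atomic_load_explicit(&line.b, memory_order_relaxed);
        dA = atomic_load_explicit(&line.a, memory_order_relaxed);
        return p;
    }

    int main(void)
    {
        pthread_t t[4];
        void *(*fn[4])(void *) = { thrA, thrB, thrC, thrD };
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, fn[i], NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);

        /* Nothing in this program forbids a run ending with cA==1, cB==0
         * and dB==1, dA==0: C observed A's write before B's while D
         * observed B's before A's -- the two readers disagree about the
         * order in which the same line was updated.                     */
        printf("C saw a=%d b=%d; D saw b=%d a=%d\n", cA, cB, dB, dA);
        return 0;
    }

[Under sequentially consistent atomics the flagged outcome is forbidden, which is one way a language can paper over the underlying hardware behaviour.]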
From: Mayan Moudgill on 29 Dec 2009 09:22

nmm1(a)cam.ac.uk wrote:
> In article <wq-dnZv7vorjmKfWnZ2dnUVZ_uWdnZ2d(a)bestweb.net>,
> Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>>I see. You are worried that, given code of the form:
>>
>>   for p in processors
>>      x[p] = foo(p,...)
>>
>>in the case of *non-coherent* memory, the writeback of x[M] may
>>overwrite the already written value of x[M-1] (or any other value of x
>>sharing a cache line with x[M]).
>
> Yes.
>
>>If I have understood your problem properly, why would making the entire
>>array write-through not be an adequate solution?
>
> Because the real problem is consistency. If threads A and B update
> the same cache line, and threads C and D read it, then thread C may
> see a sequence of events that is incompatible with that seen by
> thread D. Almost always, that will go unnoticed, but occasionally
> it will cause the program to go wrong.

OK, so let's denote the new values of the array as x[]' and the old values as plain x[]. Then the (ordering) inconsistency would arise when C reads x[A]',x[B] whilst D reads x[B]',x[A].

I find it somewhat hard to imagine that occurring; what it would mean is that there was no synchronization for the reads, i.e. the program did not care whether it read x[A] or x[A]'. If, however, there were some form of synchronization, then it would be guaranteed that the correct version of each variable was read, and you wouldn't get this inconsistency.

I am sure there is something more to the problem; could you expand on it a little more?
From: nmm1 on 29 Dec 2009 09:33
In article <3b6dneYekdE0jafWnZ2dnUVZ_t2dnZ2d(a)bestweb.net>,
Mayan Moudgill <mayan(a)bestweb.net> wrote:
>>
>>>If I have understood your problem properly, why would making the entire
>>>array write-through not be an adequate solution?
>>
>> Because the real problem is consistency. If threads A and B update
>> the same cache line, and threads C and D read it, then thread C may
>> see a sequence of events that is incompatible with that seen by
>> thread D. Almost always, that will go unnoticed, but occasionally
>> it will cause the program to go wrong.
>
>OK, so let's denote the new values of the array as x[]' and the old
>values as plain x[]. Then the (ordering) inconsistency would arise when C
>reads x[A]',x[B] whilst D reads x[B]',x[A].

Yes, roughly.

>I find it somewhat hard to imagine that occurring; what it would mean is
>that there was no synchronization for the reads, i.e. the program did
>not care whether it read x[A] or x[A]'. If, however, there were some form
>of synchronization, then it would be guaranteed that the correct version
>of each variable was read, and you wouldn't get this inconsistency.

Not at all. What happens is that the inconsistency shows up in more complicated code, and breaks a (reasonable) constraint assumed by the code. It is possible to describe the problem very simply, but not to explain why it causes so much trouble. However, it does, and it is one of the classic problems of shared-memory parallelism.

Regards,
Nick Maclaren.