From: nmm1 on 29 Dec 2009 10:08

In article <hhd3vr$cq3$1(a)smaug.linux.pwf.cam.ac.uk>, <nmm1(a)cam.ac.uk> wrote:

>In article <3b6dneYekdE0jafWnZ2dnUVZ_t2dnZ2d(a)bestweb.net>,
>Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>>I find it somewhat hard to imagine that occurring; what it would mean is
>>that there was no synchronization for the reads; i.e. the program did
>>not care if it read x[A] or x[A]'. If, however, there were some form of
>>synchronization, then it would be guaranteed that the correct version of
>>each variable was read and you wouldn't get this inconsistency.
>
>Not at all. What happens is that the inconsistency shows up in more
>complicated code, and breaks a (reasonable) constraint assumed by the
>code. It is possible to describe the problem very simply, but not to
>explain why it causes so much trouble. However, it does, and it is
>one of the classic problems of shared-memory parallelism.

Try the following, which is about the simplest 'real' example I can
think of:

    Thread A creates a data object X, and then stores a pointer to
    it in P.
    Thread B creates a data object Y, which contains a copy of P,
    and then stores a pointer to it in Q.
    Thread C reads Q, and indirects through P to X.

All well-defined, with sequential consistency and some other such
models. But, without that, the update of X need not have been
completed, as seen by thread C, just because the update of Q has
been. And, yes, it really happens.

Regards,
Nick Maclaren.
From: Mayan Moudgill on 29 Dec 2009 12:53

nmm1(a)cam.ac.uk wrote:

>In article <hhd3vr$cq3$1(a)smaug.linux.pwf.cam.ac.uk>, <nmm1(a)cam.ac.uk> wrote:
>
>>In article <3b6dneYekdE0jafWnZ2dnUVZ_t2dnZ2d(a)bestweb.net>,
>>Mayan Moudgill <mayan(a)bestweb.net> wrote:
>>
>>>I find it somewhat hard to imagine that occurring; what it would mean is
>>>that there was no synchronization for the reads; i.e. the program did
>>>not care if it read x[A] or x[A]'. If, however, there were some form of
>>>synchronization, then it would be guaranteed that the correct version of
>>>each variable was read and you wouldn't get this inconsistency.
>>
>>Not at all. What happens is that the inconsistency shows up in more
>>complicated code, and breaks a (reasonable) constraint assumed by the
>>code. It is possible to describe the problem very simply, but not to
>>explain why it causes so much trouble. However, it does, and it is
>>one of the classic problems of shared-memory parallelism.
>
>Try the following, which is about the simplest 'real' example I can
>think of:
>
>    Thread A creates a data object X, and then stores a pointer to
>    it in P.
>    Thread B creates a data object Y, which contains a copy of P,
>    and then stores a pointer to it in Q.
>    Thread C reads Q, and indirects through P to X.
>
>All well-defined, with sequential consistency and some other such
>models. But, without that, the update of X need not have been
>completed, as seen by thread C, just because the update of Q has
>been. And, yes, it really happens.

Thanks for the example. It does highlight a potential problem [though it
appears completely unrelated to cache line sharing]. Writing out the
time-line:

    A: writes X, synch#1
    B: synch#1, writes Q pointing to X, synch#2
    C: synch#2, derefs Q

where synch is a pairwise synchronization + barrier.

- synch#1 reconciles A's and B's writes, so B sees the writes to X, and
  B can safely deref Q.
- synch#2 reconciles B & C, so C sees the writes to Q, *BUT* there is no
  guarantee that A's writes are visible to C, so C derefing Q can access
  the old value of X.

I guess this kind of situation will arise whenever barriers are not
global, but apply to only a subset of all processors.

But isn't that just a variation of an already existing category of bug -
failure to respect the memory model?
From: nmm1 on 29 Dec 2009 13:16

In article <sZWdnUZRf9Gd36fWnZ2dnUVZ_u-dnZ2d(a)bestweb.net>,
Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>Thanks for the example. It does highlight a potential problem [though it
>appears completely unrelated to cache line sharing]. ...

It may appear to be, but it isn't. There are several ways of ensuring
that writes to part of cache lines never get lost, which are scalable
to lots of cores (Andy Glew mentioned one a long while back in this
thread, if I recall). Unfortunately, there aren't any that preserve
sequential consistency, which is why no (?) current hardware supports
it. Sequential consistency is inherently global, and hence inherently
non-scalable.

>I guess this kind of situation will arise whenever barriers are not
>global, but apply to only a subset of all processors.

Yes.

>But isn't that just a variation of an already existing category of bug -
>failure to respect the memory model?

THE memory model? That's my point - there isn't one, there are at
least two - the one provided by the hardware and the one assumed by
the programming language.

What I am trying to explain is that every method of reducing cache
synchronisation cost and increasing its scalability has the downside
that it reduces the consistency of the memory model. All computer
architecture is a compromise between efficiency and usability, and
this is no exception.

Currently, almost all shared-memory parallel models being introduced
by programming languages are MORE consistent than what the hardware
currently provides, rather than less (which would allow freedom to
change caching technology). Therefore, either the language memory
model has to be reduced to match the hardware one or the language has
to be disciplined enough that the compiler can detect the cases that
need special attention. All right, or it can be left as the
programmer's problem, but that is a shortcut to vastly less reliable
programs than we have today.

The main exceptions are the PGAS models, which are sort of semi-shared
memory with explicit synchronisation. They are popular with some of
the HPC people, but that's a subset of a subset. The main ones assume
sequential consistency, with or without fencing.

Regards,
Nick Maclaren.
From: Mayan Moudgill on 29 Dec 2009 14:46

nmm1(a)cam.ac.uk wrote:

>In article <sZWdnUZRf9Gd36fWnZ2dnUVZ_u-dnZ2d(a)bestweb.net>,
>Mayan Moudgill <mayan(a)bestweb.net> wrote:
>
>>But isn't that just a variation of an already existing category of bug -
>>failure to respect the memory model?
>
>THE memory model? That's my point - there isn't one, there are at
>least two - the one provided by the hardware and the one assumed by
>the programming language.
>
>What I am trying to explain is that every method of reducing cache
>synchronisation cost and increasing its scalability has the downside
>that it reduces the consistency of the memory model.
>
>Therefore, either the language memory model has to be reduced to
>match the hardware one or the language has to be disciplined enough
>that the compiler can detect the cases that need special attention.
>All right, or it can be left as the programmer's problem, but that
>is a shortcut to vastly less reliable programs than we have today.

Thanks for clearing things up.
From: Robert Myers on 29 Dec 2009 18:48
On Dec 29, 2:46 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:
> Thanks for clearing things up.

This exchange, while impressive for its depth of experience and
understanding, is a bit like a discussion of how to save the
Ptolemaic universe.

Keeping track of things by memory locations that can be written to
and accessed by multiple processes and that can have multiple
meanings based on context is a losing enterprise. No matter what you
propose, you'll wind up with specifications that read like the health
care bill now before Congress. And, as I think Nick is pointing out
with current solutions, the bugs won't always be the result of
programmer incompetence.

You've both got so much invested in thinking about these things that
you'll beat it to death forever, rather than admitting that the
entire model of stored variables simply doesn't work. Other models
have been proposed. Time to get on with it.

Robert.