From: rohit.nadig@gmail.com on 17 Oct 2006 13:15

Pardon my intervening, but it seems like we are beating the point to
death. It is painful to see Dennis and Nick build a mountain of
animosity over nothing. Architect, micro-architect, I could care less.
It's the ideas, innovation, and "fun" that really matter.
From: Nick Maclaren on 17 Oct 2006 13:33

In article <AKOdndCG_uFZkqjYnZ2dnUVZ8tSdnZ2d(a)pipex.net>,
kenney(a)cix.compulink.co.uk writes:
|>
|> > Much of the load/store atomicity is unspecified, especially
|> > with regard to multiple cores and data objects which need a
|> > lower alignment than their size. That is a common source of
|> > problem.
|>
|> Are you talking solely about factors that affect program
|> optimisation? I ask this because most high level languages
|> isolate the programmer from the architecture anyway. Come to that
|> so do operating systems with the HAL. Presumably this is most
|> important for compiler writers.

No. And I am afraid that they don't. Let's take a specific example
that has been discussed at length in the context of SC22WG21 (C++).

A threaded program has a global object that is read frequently
and updated (by simply storing into it) rarely. Under what
circumstances is it permissible to put a lock around the update
and leave the reads unlocked?

According to most architectures and most languages, the answer is
typically 'never', which has a MAJOR impact on the complexity and
performance of many codes, as well as being a very common source of
error. But, in practice, there are some pretty safe circumstances,
such as:

1) If the object has an alignment equal to its size, which is 1, 2, 4
or 8 bytes, and is being BOTH stored and read by 'simple' loads and
stores, then it is safe.

2) If the object is normally all zero, is stored once between resets
and is never reset to zero without a global lock, then it is safe
for the reads to test for it going non-zero. They still may need to
lock it to read the value, but the zero/non-zero test is safe unlocked.

3) If the new value stored has either a superset or a subset of the
bits set in the previous value, and similar conditions to (2) apply,
then a similar conclusion holds.

And it turns out that those three are enough to make a MAJOR difference,
so a huge number of parallel and interrupt handling codes rely on them,
even when the architectures (as such) don't specify them.

Regards,
Nick Maclaren.
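[Editor's note: pattern (2) above can be sketched in modern C. This is a hypothetical illustration — the names `set_once` and `read_if_set` are invented here, and C11 relaxed atomics stand in for the plain aligned loads and stores the 2006 discussion assumes.]

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_uintptr_t shared;         /* zero means "not yet set" */

/* Rare update: taken under the global lock, stored at most once
 * between resets, and never reset to zero without the lock. */
void set_once(uintptr_t v)
{
    pthread_mutex_lock(&lock);
    if (atomic_load_explicit(&shared, memory_order_relaxed) == 0)
        atomic_store_explicit(&shared, v, memory_order_release);
    pthread_mutex_unlock(&lock);
}

/* Frequent read: the zero/non-zero test runs without the lock;
 * only fetching the actual value falls back to locking. */
uintptr_t read_if_set(void)
{
    if (atomic_load_explicit(&shared, memory_order_relaxed) == 0)
        return 0;
    pthread_mutex_lock(&lock);
    uintptr_t v = atomic_load_explicit(&shared, memory_order_relaxed);
    pthread_mutex_unlock(&lock);
    return v;
}
```

The asymmetry is the whole point: the rare writer pays for the lock every time, while the frequent readers almost never do — which is why the pattern is so attractive for read-mostly data.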
From: Del Cecchi on 17 Oct 2006 14:13

Nick Maclaren wrote:
> In article <AKOdndCG_uFZkqjYnZ2dnUVZ8tSdnZ2d(a)pipex.net>,
> kenney(a)cix.compulink.co.uk writes:
> |> Are you talking solely about factors that affect program
> |> optimisation? [...]
>
> No. And I am afraid that they don't. Let's take a specific example
> that has been discussed at length in the context of SC22WG21 (C++).
>
> A threaded program has a global object that is read frequently
> and updated (by simply storing into it) rarely. Under what
> circumstances is it permissible to put a lock around the update
> and leave the reads unlocked?
>
> According to most architectures and most languages, the answer is
> typically 'never', which has a MAJOR impact on the complexity and
> performance of many codes, as well as being a very common source of
> error. But, in practice, there are some pretty safe circumstances,
> such as:
> [...]
>
> And it turns out that those three are enough to make a MAJOR difference,
> so a huge number of parallel and interrupt handling codes rely on them,
> even when the architectures (as such) don't specify them.

And they will work until they break. And if the application is
important enough, then their behaviour becomes a de facto part of the
architecture. This has been true for many years.

del

--
Del Cecchi
"This post is my own and doesn't necessarily represent IBM's
positions, strategies or opinions."
From: Nick Maclaren on 17 Oct 2006 14:31

In article <4pkkq2Fje65mU1(a)individual.net>,
Del Cecchi <cecchinospam(a)us.ibm.com> writes:
|> >
|> > According to most architectures and most languages, the answer is
|> > typically 'never', which has a MAJOR impact on the complexity and
|> > performance of many codes, as well as being a very common source of
|> > error. But, in practice, there are some pretty safe circumstances,
|> > such as: ...
|>
|> And they will work until they break. And if the application is
|> important enough then their behaviour becomes a de facto part of the
|> architecture. This has been true for many years.

To be precise, a good four decades ....

As you say, the problem isn't new. Whether it is practicable to
define an architecture so that NO program needs to rely on such
aspects is unclear. I am pretty sure that it has never been done
(even Java has had some problems in that respect), but whether it
could be done is another matter.

Regards,
Nick Maclaren.
From: Joe Seigh on 17 Oct 2006 14:50
Del Cecchi wrote:
> Nick Maclaren wrote:
>
>> In article <AKOdndCG_uFZkqjYnZ2dnUVZ8tSdnZ2d(a)pipex.net>,
>> kenney(a)cix.compulink.co.uk writes:
>> [...]
>>
>> A threaded program has a global object that is read frequently
>> and updated (by simply storing into it) rarely. Under what
>> circumstances is it permissible to put a lock around the update
>> and leave the reads unlocked?
>>
>> According to most architectures and most languages, the answer is
>> typically 'never', which has a MAJOR impact on the complexity and
>> performance of many codes, as well as being a very common source of
>> error. But, in practice, there are some pretty safe circumstances,
>> such as:
>> [...]
>
> And they will work until they break. And if the application is
> important enough then their behaviour becomes a de facto part of the
> architecture. This has been true for many years.

On a different but related example of the same problem, RCU in Linux
uses dependent loads instead of load acquire for performance reasons.
To prevent it from breaking, they use a macro which is just a plain
load (w/ compiler optimization hints) on most architectures. So if
the architecture changes, RCU won't break. The hardware might be
considered broken at that point, since it will take a huge performance
hit on Linux anyway.
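[Editor's note: the dependent-load pattern Joe describes can be sketched with C11's memory_order_consume, the standard's later attempt to capture exactly this "the load of the pointer orders the load through it" guarantee (compilers in practice promote it to acquire). This is a simplified stand-in for what Linux's rcu_dereference() provides, not the kernel's actual definition.]

```c
#include <stdatomic.h>
#include <stddef.h>

struct config { int timeout; };

static _Atomic(struct config *) current_cfg;   /* NULL until published */

/* Publisher: the release store ensures the structure's fields are
 * initialized before the pointer becomes visible to readers. */
void publish(struct config *c)
{
    atomic_store_explicit(&current_cfg, c, memory_order_release);
}

/* Reader: on most architectures a plain load suffices, because the
 * load through the pointer carries an address dependency on the load
 * of the pointer; consume expresses that intent without hard-coding
 * any one architecture's rules. */
int read_timeout(void)
{
    struct config *c =
        atomic_load_explicit(&current_cfg, memory_order_consume);
    return c ? c->timeout : -1;
}
```

Alpha is the famous exception here — it can reorder even dependent loads — which is precisely why hiding the choice behind a macro, as Linux does, keeps the code from breaking if the architecture changes.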
AFAIK, none of the dependent load stuff is part of the formal memory
models, when those exist at all. That info is usually tucked away in
some performance documentation.

The performance stuff seems to me like something that can change over
an architecture's lifetime, necessitating a rewrite of some
performance-sensitive code from time to time. That it's required
reflects different viewpoints on ROI. Companies like Intel are
hardware oriented and view software as tied to specific models of
processors, usually at the driver and library level. The idea that
some software may have a lifetime of 10 years or more is alien to
them, since none of the driver code has an expected lifetime of
active development like that.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.