From: Joe Seigh on 2 Sep 2005 15:42

Andy Glew wrote:
>
> Bottom quoting: asbestos donned!
>
> I think that Joe Seigh has incorrectly assumed that processor
> consistency implies (a) a global ordering of all loads, and (b) causal
> ordering.

I think I was trying to prove that you couldn't imply global ordering of loads. Part of the problem is that there are two target groups of programmers for the memory model here. The processor consistency is alright if you're doing HPC/parallel programming but isn't very useful if you're doing general multi-threaded programming. There, all you really care about is what the implicit global ordering is between the various combinations of loads and stores, and what memory barriers to use for the combinations where ordering isn't defined.

In the ia32 docs, it's a little muddied because of the mention of speculative loads. Nonetheless, I had assumed that loads weren't ordered and that LFENCE or some other memory barrier or serializing instruction was needed for global ordering of loads. However, there were some who claimed LFENCE wasn't needed, and the documentation wasn't explicit enough to definitively counter their claims. And it had to be really explicit, given the rather incomprehensible arguments they were presenting. I've basically decided to ignore these people for now and stick with my original interpretation of the ia32 memory model.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
From: Alexander Terekhov on 3 Sep 2005 07:58

Joe Seigh wrote:
[...]
> In the ia32 docs, it's a little muddied because of the mention of
> speculative loads. None the less I had assumed that loads weren't
> ordered and that LFENCE or some other memory barrier or serializing
> instruction was needed for global ordering of loads.

Neither will give you "global ordering of loads". Loads on ia32 are in-order with respect to other loads and subsequent stores (by the same processor). The only thing that differentiates PC from TSO is the lack of remote write atomicity (in IA64 formal memory model speak). Implementations (e.g. SPO) of course can do all sorts of tricks to improve performance, but that doesn't change the memory model. You're in denial.

regards,
alexander.
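[Editor's sketch, not part of the original posts.] One common way to illustrate the kind of write atomicity mentioned above is the "independent reads of independent writes" (IRIW) litmus test: two writers store to different locations, and two readers load them in opposite orders. If the two readers can disagree about which store happened first, stores are not becoming visible to all other processors at once. The C++ below is only a portable sketch under stated assumptions: the function and variable names are hypothetical, and it uses sequentially consistent `std::atomic` operations, for which the C++ memory model forbids the disagreeing outcome. It says nothing by itself about what any particular ia32 implementation does with plain loads and stores.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the IRIW litmus test `iterations` times and
// counts how often the two readers observe the two stores in opposite
// orders. All atomic operations use the default memory_order_seq_cst.
int count_iriw_disagreements(int iterations) {
    int disagreements = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

        std::thread writer_x([&] { x.store(1); });
        std::thread writer_y([&] { y.store(1); });
        // Reader A loads x then y; reader B loads y then x.
        std::thread reader_a([&] { r1 = x.load(); r2 = y.load(); });
        std::thread reader_b([&] { r3 = y.load(); r4 = x.load(); });

        writer_x.join();
        writer_y.join();
        reader_a.join();
        reader_b.join();

        // Reader A saw x before y, while reader B saw y before x.
        if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
            ++disagreements;
        }
    }
    return disagreements;
}
```

With sequentially consistent operations the count stays at zero. Weakening the loads and stores to `std::memory_order_relaxed` would make the disagreeing outcome permitted by the language, though whether it is ever observed depends on the hardware.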
From: Joe Seigh on 3 Sep 2005 08:35

Alexander Terekhov wrote:
> Joe Seigh wrote:
> [...]
>
>>In the ia32 docs, it's a little muddied because of the mention of
>>speculative loads. None the less I had assumed that loads weren't
>>ordered and that LFENCE or some other memory barrier or serializing
>>instruction was needed for global ordering of loads.
>
>
> Neither will give you "global ordering of loads". Loads on ia32 are
> in-order with respect to other loads and subsequent stores (by the
> same processor). The only thing that differentiates PC from TSO is
> the lack of remote write atomicity (in IA64 formal memory model
> speak). Implementations (e.g. SPO) of course can do all sorts of
> tricks to improve performance, but that doesn't change the memory
> model. You're in denial.
>

Whatever. I'm going to use LFENCE for situations where I'd use #LoadLoad on sparc (generic, not assuming TSO). And it's not because I'm in denial. It's because nothing you say is comprehensible. It's possible you are making some kind of valid technical point but I have no way of telling.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
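[Editor's sketch, not part of the original posts.] The situation where a #LoadLoad barrier is usually wanted is a reader that loads a flag and then loads the data the flag guards. The C++ below is a minimal portable sketch of that pattern, assuming nothing about the target: the names `count_stale_reads`, `payload`, and `ready` are hypothetical, and `std::atomic_thread_fence` stands in for the architecture-specific barrier. The compiler decides what instruction, if any, each fence becomes on a given target, so this does not settle whether LFENCE is required on ia32.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: a writer publishes `payload` and then sets `ready`;
// a reader waits for `ready` and then loads `payload`. Returns how many
// times the reader saw a stale payload.
int count_stale_reads(int iterations) {
    int stale = 0;
    for (int i = 1; i <= iterations; ++i) {
        std::atomic<int> payload{0};
        std::atomic<bool> ready{false};

        std::thread writer([&] {
            payload.store(i, std::memory_order_relaxed);
            // Keeps the payload store ahead of the flag store.
            std::atomic_thread_fence(std::memory_order_release);
            ready.store(true, std::memory_order_relaxed);
        });

        std::thread reader([&] {
            while (!ready.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
            // Sits where a "membar #LoadLoad" would go: keeps the flag
            // load ahead of the payload load.
            std::atomic_thread_fence(std::memory_order_acquire);
            if (payload.load(std::memory_order_relaxed) != i) {
                ++stale;
            }
        });

        writer.join();
        reader.join();
    }
    return stale;
}
```

With both fences in place the C++ memory model guarantees the reader sees the published value, so the count is zero. Removing the acquire fence makes a stale read permitted by the language, which is the case the LFENCE versus no-LFENCE argument is about at the hardware level.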
From: Alexander Terekhov on 3 Sep 2005 08:52

Joe Seigh wrote:
[...]
> Whatever. I'm going to use LFENCE for situations where I'd use
> #LoadLoad on sparc (generic, not assuming TSO).

You mean RMO? Reportedly, RMO is vaporware, so yeah, you'll get the same "useful" effect on Sparc as on ia32 (weakly ordered WC memory aside for a moment): none whatsoever.

regards,
alexander.
From: Joe Seigh on 3 Sep 2005 09:46

Alexander Terekhov wrote:
> Joe Seigh wrote:
> [...]
>
>>Whatever. I'm going to use LFENCE for situations where I'd use
>>#LoadLoad on sparc (generic, not assuming TSO).
>
>
> You mean RMO? Reportedly, RMO is vaporware, so yeah, you'll get the
> same "useful" effect on Sparc as on ia32 (weakly ordered WC memory
> aside for a moment): none whatsoever.
>

In the same sense that Sparc documentation assumes the weakest possible architected memory model when documenting usage of its memory barriers. I know that some sparc processors only implement TSO and Solaris assumes and requires TSO (so far). It's possible Intel processors are all effectively implemented as TSO, but we're talking about the architected memory model and have to assume that unless writing model dependent code.

I like how you sidestepped whether LFENCE or some serializing instruction is required in some situations between successive loads on Intel ia32 processors. We're assuming weakly ordered memory I think, whatever the typical multiprocessor Intel box meant to run Linux or Windows uses. Whatever "write-back cacheable" is.

This whole thing is bizarre. Any other architecture, e.g. IBM Z architecture, powerpc, sparc, alpha, ... and there's no problem in discussing whether memory barriers are needed in certain situations. Only in Intel ia32 and only when Alexander participates. However, if you filter out any comments by Alexander then the problem goes away. I should have put in an Alexander filter earlier. Then I wouldn't have raised this issue in the first place, which has probably put *me* in a few filters. :)

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.