From: Alexander Terekhov on 6 Sep 2005 05:01

Andy Glew wrote:
[...]
> I think that the overall intention is that placing MFENCE before and
> after every memory reference is supposed to get you SC semantics.

But without remote write atomicity, I suppose. And, BTW, that's what
revised Java volatiles do. I mean the JSR-133 memory model.

> However, MFENCE, LFENCE, and SFENCE were defined after my time, and I
> suspect that their definitions are not quite complete enough for what
> you want. In particular, *FENCE really only work wrt WC cacheable
> memory, and do not drain external buffers such as may occur in bus
> bridges.

My reading of the specs is that MFENCE is guaranteed to provide a
store-load barrier.

P1: X = 1; R1 = Y;
P2: Y = 1; R2 = X;

(R1, R2) = (0, 0) is allowed under pure PC, but

P1: X = 1; MFENCE; R1 = Y;
P2: Y = 1; MFENCE; R2 = X;

(R1, R2) = (0, 0) is NOT allowed.

> In general, the P6 and Wmt families' mechanism for ensuring
> ordering, waiting for global observability, only works for perfectly
> vanilla WC cacheable memory, and is frequently violated wrt other
> memory types. So I do not want to guarantee that it will work for
> things like WC cached memory that is private to a graphics
> accelerator.

I want to know whether MFENCE provides a store-load barrier for WB
memory.

> You may be right that using the cmpxchg as you describe achieves SC on
> x86. However, I need to think about it a bit more, since the
> reasoning you provide is implementation specific, not architectural.

I'm just reading the specs. CMPXCHG on x86 always performs a (hopefully
StoreLoad+LoadLoad fenced) load followed by a (LoadStore+StoreStore
fenced) store (plus a trailing MFENCE, so to speak). Locked CMPXCHG is
supposed to be "fully fenced".

Regarding the safety net for remote write atomicity, I rely on the
following CMPXCHG wording:

"The destination operand is written back if the comparison fails;
otherwise, the source operand is written into the destination. (The
processor never produces a locked read without also producing a locked
write.)"

I suspect that (locked) XADD(addr, 0) will also work... but I'm somewhat
missing strong language about a mandatory write as in CMPXCHG.

[... cmpxchg could well be implemented without any fencing ...]

"Locked operations are atomic with respect to all other memory
operations and all externally visible events. Only instruction fetch
and page table accesses can pass locked instructions. Locked
instructions can be used to synchronize data written by one processor
and read by another processor. For the P6 family processors, locked
operations serialize all outstanding load and store operations (that
is, wait for them to complete). This rule is also true for the
Pentium 4 and Intel Xeon processors, with one exception: load
operations that reference weakly ordered memory types (such as the WC
memory type) may not be serialized."

> You are confusing implementation with semantics.

Fix the specs, then. And explain how one can achieve classic SC
semantics for WB memory.

regards,
alexander.

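As a concrete illustration of the store-buffering litmus test above, here is
a minimal C sketch (GCC inline asm for MFENCE, pthreads for the two threads;
the names p1, p2, sc_load and the overall structure are illustrative, not
taken from any post in this thread). A single run rarely exposes the
reordering, so a real test harness would loop the experiment many times.

#include <pthread.h>
#include <stdio.h>

volatile int X = 0, Y = 0;
volatile int R1 = -1, R2 = -1;

/* The locked-CMPXCHG reading discussed above, as a sketch: a CAS with
 * equal old/new values is a locked read-modify-write, so the load it
 * contains is fenced even when the comparison fails (the old value is
 * written back). GCC's __sync builtin compiles to "lock cmpxchg" on x86. */
int sc_load(volatile int *addr)
{
    return __sync_val_compare_and_swap(addr, 0, 0);
}

void *p1(void *arg)
{
    (void)arg;
    X = 1;
    __asm__ __volatile__("mfence" ::: "memory");  /* StoreLoad barrier */
    R1 = Y;                   /* or: R1 = sc_load(&Y); without the MFENCE */
    return NULL;
}

void *p2(void *arg)
{
    (void)arg;
    Y = 1;
    __asm__ __volatile__("mfence" ::: "memory");  /* StoreLoad barrier */
    R2 = X;                   /* or: R2 = sc_load(&X); without the MFENCE */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* With the MFENCEs (or sc_load), (0, 0) should never be observed. */
    printf("(R1, R2) = (%d, %d)\n", R1, R2);
    return 0;
}

Compile with something like gcc -O2 -pthread on an x86 machine; with the
mfence lines removed and the plain loads kept, (R1, R2) = (0, 0) does show
up on real hardware once the test is looped, which is exactly the pure-PC
outcome described above.
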
From: David Hopwood on 6 Sep 2005 07:26

Joe Seigh wrote:
> David Hopwood wrote:
>> Joe Seigh wrote:
>>
>>> I'm not sure what you're saying here. That all future processors
>>> from Intel that don't have processor ordering won't be x86?
>>
>> Well, they won't be x86-as-we-know-it. OSes, compilers, etc. will
>> have to be changed to run on or generate code for this new x86-like
>> thing, and changes in the memory model will probably be only one issue
>> they need to deal with.
>>
>>> And that the synchronization instructions in these future processors
>>> won't be similar to the ones in x86? That Intel is telling people
>>> in an x86 manual to start writing portable code not now but when
>>> they get to the future processor?
>>
>> Of course not. Read what they actually wrote.
>
> I did. It sounded to me like they said if you want to write
> portable code, don't assume processor ordering but use the
> locking and serializing instructions instead on the current
> processors.

But OSes, thread libraries and language implementations *aren't*
portable code.

--
David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk>

From: Joe Seigh on 6 Sep 2005 07:53

David Hopwood wrote:
> Joe Seigh wrote:
>
>> David Hopwood wrote:
>>>
>>> Of course not. Read what they actually wrote.
>>
>> I did. It sounded to me like they said if you want to write
>> portable code, don't assume processor ordering but use the
>> locking and serializing instructions instead on the current
>> processors.
>
> But OSes, thread libraries and language implementations *aren't* portable
> code.

I do not think that word means what you think it means.

Note that I am an ex-kernel developer and have created enough
synchronization APIs that run on totally different platforms. I've
created an atomically thread-safe reference-counted smart pointer that
has two totally different implementations on two different
architectures. Given that Sun Microsystems' research division couldn't
manage to do this and could only do it on an obsolete architecture, I'd
say I have a pretty good idea what portability is and what its issues
are.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

From: Joe Seigh on 6 Sep 2005 08:02

Alexander Terekhov wrote:
> Andy Glew wrote:
>
>> You are confusing implementation with semantics.
>
> Fix the specs, then.

I think you can assume that the serializing stuff does the right thing.
If not, and you have strong reason to believe otherwise, then you should
short Intel stock, as you'd stand a pretty good chance of making a
fortune. Basically, no OS would work correctly on an Intel-based
multi-processor server, and Intel would be out of that business. Also,
Intel would be screwed in the multi-core workstation and desktop market,
as it would be too late to fix the current processors going into
production.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

From: David Hopwood on 6 Sep 2005 08:54

Joe Seigh wrote:
> David Hopwood wrote:
>> Joe Seigh wrote:
>>> David Hopwood wrote:
>>>
>>>> Of course not. Read what they actually wrote.
>>>
>>> I did. It sounded to me like they said if you want to write
>>> portable code, don't assume processor ordering but use the
>>> locking and serializing instructions instead on the current
>>> processors.
>>
>> But OSes, thread libraries and language implementations *aren't* portable
>> code.
>
> I do not think that word means what you think it means.
>
> Note that I am an ex-kernel developer and have created enough
> synchronization APIs that run on totally different platforms.

You are totally missing the point.

OSes, thread libraries and language implementations have some code that
needs to be adapted to each hardware architecture. If the memory model
were to change in future processors that are otherwise x86-like, this
code would have to change. It's not a big deal, because this
platform-specific code is maintained by people who know how to change
it, and because there are few enough OSes, thread libraries, and
language implementations for the total effort involved not to be very
great.

It would, however, be a big deal if existing x86 *applications* stopped
working on an otherwise x86-compatible processor.

--
David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk>

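For what it's worth, the architecture-specific layer being described often
amounts to little more than a header like the following sketch in C (the
name memory_barrier is illustrative, not from any particular OS or library):
portable callers use memory_barrier(), and only this one file has to change
if a future x86-like processor weakens the ordering rules.

/* Illustrative per-architecture barrier header, of the kind an OS,
 * thread library, or language runtime keeps in one place. */
#if defined(__x86_64__)
static inline void memory_barrier(void)
{
    __asm__ __volatile__("mfence" ::: "memory");  /* full fence on x86-64 */
}
#elif defined(__powerpc__) || defined(__powerpc64__)
static inline void memory_barrier(void)
{
    __asm__ __volatile__("sync" ::: "memory");    /* full fence on PowerPC */
}
#else
#error "port me: define memory_barrier() for this architecture"
#endif
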