From: David Hopwood on 5 Sep 2005 17:21

Joe Seigh wrote:
> David Hopwood wrote:
>> Joe Seigh wrote:
>>> Alexander Terekhov wrote:
>>>
>>>> So where do you put the fence, then?
>>>>
>>>> : processor 1 stores into X
>>>> : processor 2 sees the store by 1 into X and stores into Y
>>>> : processor 3 loads from Y
>>>> : processor 3 loads from X
>>>
>>> Since this was my example I should clarify. It was meant to
>>> show that PC alone wasn't sufficient to guarantee that if processor
>>> 3 saw the store into Y by processor 2 that it would see the
>>> store into X by processor 1.
>>>
>>> My understanding of the ia32 memory model is that you
>>> need a fence instruction between the loads by processor 3
>>> and a fence between the load and store by processor 2 to
>>> make the guarantee work.
>>
>> My understanding is that if the claimed problem exists at all, adding
>> these fences won't fix it (as far as the model is concerned, possibly
>> as opposed to implementation details of specific chips).
>
> The architected memory model as opposed to the implemented one?

Yes, that's what I said.

> "Despite the fact that Pentium 4, Intel Xeon, and P6 family
> processors support processor ordering, Intel does not guarantee that
> future processors will support this model. To make software portable
> to future processors, it is recommended that operating systems provide
> critical region and resource control constructs and API’s (application
> program interfaces) based on I/O, locking, and/or serializing
> instructions be used to synchronize access to shared areas of
> memory in multiple-processor systems."

This is all perfectly sensible. "Future processors" from Intel are not
necessarily ISA-compatible with x86 anyway. For example, you need to
recompile to use long mode in EM64T. Also note that it doesn't say
"future x86 processors". Maybe they were talking about Itanic.

Even if they weren't talking about IA-64 or a different mode, it's
still a good idea to avoid dependencies on the memory model in
*applications*, since it is more difficult to change all apps that
have such dependencies than it is to change threading libraries in OS
and language implementations. In fact OS/lang-impl maintainers half
expect stuff to rot on new hardware, and hopefully remember what they
depended on. Application maintainers generally don't (if they ever
understood it in the first place). This is what I've been saying
consistently.

Anyway, this issue doesn't have anything to do with what we were talking
about, which is whether the current architected x86 model allows a
particular behaviour.

> That one? And what do people think the memory model that only
> "I/O, locking, and/or serializing instructions" can synchronize is?

You're overanalysing a fairly loosely worded recommendation.

--
David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk>
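The three-processor example quoted at the top of this post can be written down as a small litmus test. What follows is only a sketch in C++11 atomics, which postdate this thread; it shows how the outcome under discussion (processor 3 sees the store into Y but misses the store into X) is ruled out when the ordering is requested through the language rather than assumed from the hardware. It says nothing about what the architected x86 model guarantees. The names X, Y and run_litmus_once are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names: X, Y and run_litmus_once are illustrative only.
std::atomic<int> X{0};
std::atomic<int> Y{0};

// Runs the three-"processor" example once. Returns true if processor 3
// saw the store into Y but then did NOT see the store into X.
bool run_litmus_once() {
    X.store(0);
    Y.store(0);
    int seen_y = 0;
    int seen_x = 0;

    // processor 1 stores into X
    std::thread p1([] { X.store(1, std::memory_order_release); });

    // processor 2 sees the store by 1 into X and stores into Y
    std::thread p2([] {
        while (X.load(std::memory_order_acquire) != 1) {
            std::this_thread::yield();
        }
        Y.store(1, std::memory_order_release);
    });

    // processor 3 loads from Y, then loads from X
    std::thread p3([&seen_y, &seen_x] {
        seen_y = Y.load(std::memory_order_acquire);
        seen_x = X.load(std::memory_order_relaxed);
    });

    p1.join();
    p2.join();
    p3.join();
    return seen_y == 1 && seen_x == 0;
}
```

Under the C++ rules the release/acquire pairs chain together, so the store into X happens before processor 3's load from X whenever processor 3 observed Y == 1, and run_litmus_once() should always return false. Replacing every ordering with std::memory_order_relaxed removes that guarantee in the language model, which is roughly the position of an application that leans on the hardware memory model directly.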
From: Joe Seigh on 5 Sep 2005 18:32

David Hopwood wrote:
> Joe Seigh wrote:
>
>> "Despite the fact that Pentium 4, Intel Xeon, and P6 family
>> processors support processor ordering, Intel does not guarantee that
>> future processors will support this model. To make software portable
>> to future processors, it is recommended that operating systems provide
>> critical region and resource control constructs and API’s (application
>> program interfaces) based on I/O, locking, and/or serializing
>> instructions be used to synchronize access to shared areas of
>> memory in multiple-processor systems."
>
> This is all perfectly sensible. "Future processors" from Intel are not
> necessarily ISA-compatible with x86 anyway. For example, you need to
> recompile to use long mode in EM64T. Also note that it doesn't say
> "future x86 processors". Maybe they were talking about Itanic.
>
> Even if they weren't talking about IA-64 or a different mode, it's
> still a good idea to avoid dependencies on the memory model in
> *applications*, since it is more difficult to change all apps that
> have such dependencies than it is to change threading libraries in OS
> and language implementations. In fact OS/lang-impl maintainers half
> expect stuff to rot on new hardware, and hopefully remember what they
> depended on. Application maintainers generally don't (if they ever
> understood it in the first place). This is what I've been saying
> consistently.

Yes, your aversion to anarchist application programmers doing their
own thing is well known. :)

> Anyway, this issue doesn't have anything to do with what we were talking
> about, which is whether the current architected x86 model allows a
> particular behaviour.
>
>> That one? And what do people think the memory model that only
>> "I/O, locking, and/or serializing instructions" can synchronize is?
>
> You're overanalysing a fairly loosely worded recommendation.

I'm not sure what you're saying here. That all future processors
from Intel that don't have processor ordering won't be x86?
And that the synchronization instructions in these future processors
won't be similar to the ones in x86? That Intel is telling people
in an x86 manual to start writing portable code not now but when
they get to the future processor? That's a little strange even for
Intel.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
From: David Hopwood on 5 Sep 2005 20:26

Joe Seigh wrote:
> David Hopwood wrote:
>> Joe Seigh wrote:
>>
>>> "Despite the fact that Pentium 4, Intel Xeon, and P6 family
>>> processors support processor ordering, Intel does not guarantee that
>>> future processors will support this model. To make software portable
>>> to future processors, it is recommended that operating systems provide
>>> critical region and resource control constructs and API’s (application
>>> program interfaces) based on I/O, locking, and/or serializing
>>> instructions be used to synchronize access to shared areas of
>>> memory in multiple-processor systems."
>>
>> This is all perfectly sensible. "Future processors" from Intel are not
>> necessarily ISA-compatible with x86 anyway. For example, you need to
>> recompile to use long mode in EM64T. Also note that it doesn't say
>> "future x86 processors". Maybe they were talking about Itanic.
>>
>> Even if they weren't talking about IA-64 or a different mode, it's
>> still a good idea to avoid dependencies on the memory model in
>> *applications*, since it is more difficult to change all apps that
>> have such dependencies than it is to change threading libraries in OS
>> and language implementations. In fact OS/lang-impl maintainers half
>> expect stuff to rot on new hardware, and hopefully remember what they
>> depended on. Application maintainers generally don't (if they ever
>> understood it in the first place). This is what I've been saying
>> consistently.
>
> Yes, your aversion to anarchist application programmers doing their
> own thing is well known. :)

Right, I am absolutely convinced that the roles of application programmer
and infrastructure programmer should be clearly separated (even if there
are a few people with the ability and expertise needed to successfully do
both).

>> Anyway, this issue doesn't have anything to do with what we were talking
>> about, which is whether the current architected x86 model allows a
>> particular behaviour.
>>
>>> That one? And what do people think the memory model that only
>>> "I/O, locking, and/or serializing instructions" can synchronize is?
>>
>> You're overanalysing a fairly loosely worded recommendation.
>
> I'm not sure what you're saying here. That all future processors
> from Intel that don't have processor ordering won't be x86?

Well, they won't be x86-as-we-know-it. OSes, compilers, etc. will
have to be changed to run on or generate code for this new x86-like
thing, and changes in the memory model will probably be only one issue
they need to deal with.

> And that the synchronization instructions in these future processors
> won't be similar to the ones in x86? That Intel is telling people
> in an x86 manual to start writing portable code not now but when
> they get to the future processor?

Of course not. Read what they actually wrote.

--
David Hopwood <david.nospam.hopwood(a)blueyonder.co.uk>
From: Andy Glew on 5 Sep 2005 20:22

Alexander Terekhov <terekhov(a)web.de> writes:

> So just do cmpxchg(&X, 42, 42) which will perform locked read-write
> (with its read part store-load fenced from prior writes, I infer).
> You'll get classic SC if you replace all loads with cmpxchg(&X, 42,
> 42). That's my understanding, and I'm eagerly awaiting confirmation
> from Andy Glew and/or someone from Intel hanging at C++ memory model
> mailing list.

42, eh? Sounds like a joke: Goodbye, and thanks for all the thrash...

I think that the overall intention is that placing MFENCE before and
after every memory reference is supposed to get you SC semantics.

However, MFENCE, LFENCE, and SFENCE were defined after my time, and I
suspect that their definitions are not quite complete enough for what
you want. In particular, *FENCE really only work wrt WB cacheable
memory, and do not drain external buffers such as may occur in bus
bridges. In general, the P6 and Wmt families' mechanism for ensuring
ordering, waiting for global observability, only works for perfectly
vanilla WB cacheable memory, and is frequently violated wrt other
memory types. So I do not want to guarantee that it will work for
things like WC cached memory that is private to a graphics accelerator.

You may be right that using the cmpxchg as you describe achieves SC on
x86. However, I need to think about it a bit more, since the reasoning
you provide is implementation specific, not architectural.

(Note that an atomic RMW like cmpxchg could well be implemented without
any fencing semantics. I.e. atomic RMWs and memory ordering/fencing are
independent concepts. I argued for this in Itanium; I am trying to
remember if x86 required that the two be mixed up together. I can't see
why it should have... I.e. I am sure that using cmpxchg as you describe
need not provide SC on a reasonable computer architecture. I just need
to find out if x86 mixed the two up for some legacy reasons.
In the meantime: use the fences would be my recommendation.)

> > 4) The only way to guarantee that a processor has the most recent
> > value of a location is to take ownership of the variable,
> > and that requires a write. Since we actually want to read X,
>   ^^^^^^^^^^^^^^^^^^^^^^^^^
>
> That's the key.
>
> > we use CAS (x86 LOCK CMPXCHG) to read the most recent value.

Flawed argument. It is entirely possible to imagine implementations
of CAS that do not write the variable if the value is unchanged.

> That will work too, but you don't really need to LD X and loop on
> CAS compare failure given that x86's cmpxchg always makes a write.
> "The destination operand is written back if the comparison fails;
> otherwise, the source operand is written into the destination. (The
> processor never produces a locked read without also producing a
> locked write.)"

You are confusing implementation with semantics.
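The two ways of reading X discussed in this post, the cmpxchg(&X, 42, 42) trick and a load bracketed by fences, can be put side by side in C++11 atomics (which postdate this thread). This is a sketch only. The helper names read_via_cas and read_with_fences are hypothetical, and how a compiler lowers them is an assumption: on x86, compilers typically emit LOCK CMPXCHG for the first, and MFENCE or a locked instruction for each fence in the second. Nothing here settles whether either form is architecturally sufficient for SC on x86, which is the open question above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: read x through a compare-and-swap, in the style
// of cmpxchg(&X, 42, 42). The guess value itself is arbitrary.
int read_via_cas(std::atomic<int>& x, int guess = 42) {
    int expected = guess;
    // If x == guess, guess is stored back and the value is unchanged.
    // Otherwise the CAS fails and 'expected' is overwritten with the
    // current value of x. Either way 'expected' holds what was observed.
    x.compare_exchange_strong(expected, guess, std::memory_order_seq_cst);
    return expected;
}

// Hypothetical helper: a plain load with a full fence before and after,
// in the spirit of the "use the fences" recommendation.
int read_with_fences(const std::atomic<int>& x) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    int value = x.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return value;
}
```

Note that the ordering is a separate argument to compare_exchange_strong: passing std::memory_order_relaxed instead would still give an atomic read-modify-write, but with no ordering requested. That separation mirrors the point above that atomic RMWs and memory ordering/fencing are independent concepts.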
From: Joe Seigh on 5 Sep 2005 21:13

David Hopwood wrote:
> Joe Seigh wrote:
>
>> David Hopwood wrote:
>>>
>>>> That one? And what do people think the memory model that only
>>>> "I/O, locking, and/or serializing instructions" can synchronize is?
>>>
>>> You're overanalysing a fairly loosely worded recommendation.
>>
>> I'm not sure what you're saying here. That all future processors
>> from Intel that don't have processor ordering won't be x86?
>
> Well, they won't be x86-as-we-know-it. OSes, compilers, etc. will
> have to be changed to run on or generate code for this new x86-like
> thing, and changes in the memory model will probably be only one issue
> they need to deal with.
>
>> And that the synchronization instructions in these future processors
>> won't be similar to the ones in x86? That Intel is telling people
>> in an x86 manual to start writing portable code not now but when
>> they get to the future processor?
>
> Of course not. Read what they actually wrote.

I did. It sounded to me like they said if you want to write portable
code, don't assume processor ordering but use the locking and
serializing instructions instead on the current processors.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
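As an illustration of the reading given in this post (use locking constructs instead of assuming processor ordering), here is a minimal sketch of a shared area guarded by a lock. It uses C++11's std::mutex, which postdates this thread; the SharedArea type and the function names are hypothetical, and the sketch assumes nothing about which instructions the library uses underneath.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared area guarded by a lock; names are illustrative.
struct SharedArea {
    std::mutex lock;
    int x = 0;
    int y = 0;
};

// The writer updates both fields inside one critical region.
void publish(SharedArea& s) {
    std::lock_guard<std::mutex> guard(s.lock);
    s.x = 1;
    s.y = 1;
}

// The reader takes the same lock, so it sees both stores or neither.
bool snapshot_is_consistent(SharedArea& s) {
    std::lock_guard<std::mutex> guard(s.lock);
    return s.x == s.y;
}

// Runs one writer and one reader concurrently. Returns true if the
// reader did not observe a half-finished update.
bool run_locked_once() {
    SharedArea s;
    bool ok = false;
    std::thread writer([&s] { publish(s); });
    std::thread reader([&s, &ok] { ok = snapshot_is_consistent(s); });
    writer.join();
    reader.join();
    return ok;
}
```

The application code contains no fences and makes no assumption about store or load ordering; whatever the hardware needs is the responsibility of the mutex implementation, which is the layer that would have to change if a future processor dropped processor ordering.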