From: James Kanze on 23 Mar 2010 20:40

On Mar 20, 8:22 am, Tony Jorgenson <tonytinker2...(a)yahoo.com> wrote:

    [...]

> I understand that volatile does not guarantee that the order
> of memory writes performed by one thread are seen in the same
> order by another thread doing memory reads of the same
> locations. I do understand the need for memory barriers
> (mutexes, atomic variables, etc) to guarantee order, but there
> are still 2 questions that have never been completely
> answered, at least to my satisfaction, in all of the
> discussion I have read on this group (and the non moderated
> group) on these issues.

> First of all, I believe that volatile is supposed to guarantee the
> following:

> Volatile forces the compiler to generate code that performs
> actual memory reads and writes rather than caching values in
> processor registers. In other words, I believe that there is a
> one-to-one correspondence between volatile variable reads and
> writes in the source code and actual memory read and write
> instructions executed by the generated code. Is this correct?

Sort of. The standard uses a lot of weasel words (for good reasons) with regards to volatile, and in particular, leaves it up to the implementation to define exactly what it means by "access". Still, it's hard to imagine an interpretation that doesn't imply a machine instruction which loads or stores.

Of course, on modern machines, a store instruction doesn't necessarily result in a write to physical memory; you typically need additional instructions to ensure that. And on the compilers I know (g++, Sun CC and VC++), volatile doesn't cause them to be generated. (My most concrete experience is with Sun CC on a Sparc, where volatile doesn't ensure that memory mapped I/O works correctly.)

> Question 1:
> My first question is with regard to using volatile instead of
> memory barriers in some restricted multi-threaded cases. If my
> above statements are correct, is it possible to use _only_
> volatile with no memory barriers to signal between threads in
> a reliable way if only a single word (perhaps a single byte)
> is written by one thread and read by another?

No. Storing a byte (at the machine code level) on one processor or core doesn't mean that the results of the store will be seen on another processor. Modern processors reorder memory writes in hardware, so given the sequence:

    volatile int a = 0, b = 0;  // suppose int is atomic

    void f()
    {
        a = 1;
        b = 1;
    }

another thread may still see b == 1 and a == 0.

> Question 1a:
> First of all, please correct me if I am wrong, but I believe
> volatile _must_always_ work as described above on any single
> core CPU. One CPU means one cache (or one hierarchy of caches)
> meaning one view of actual memory through the cache(s) that
> the CPU sees, regardless of which thread is running. Is this
> much correct for any CPU in existence? If not please mention a
> situation where this is not true (for single core).

The standard doesn't make any guarantees, but all of the processor architectures I know do guarantee coherence within a single core. The real question here is rather: who has a single core machine anymore? The last Sparc I worked on had 32 cores, and I got it because it was deemed too slow for production work (where we had 128 cores). And even my small laptop is a dual core.
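[For reference, a minimal sketch of how the one-writer/one-reader signalling asked about in question 1 is usually done portably: with a mutex rather than volatile. POSIX threads are assumed, all names (data, ready, and so on) are invented for the example, and a condition variable would normally replace the polling loop.]

    #include <pthread.h>
    #include <cstdio>

    int data = 0;                   // the payload: no volatile needed
    bool ready = false;             // the flag, protected by the mutex
    pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

    void* producer(void*)
    {
        pthread_mutex_lock(&mtx);
        data = 42;                  // both writes happen inside the critical section,
        ready = true;               // so a reader that sees ready == true also sees data == 42
        pthread_mutex_unlock(&mtx);
        return 0;
    }

    void* consumer(void*)
    {
        for (;;) {
            pthread_mutex_lock(&mtx);
            bool r = ready;
            int  d = data;
            pthread_mutex_unlock(&mtx);
            if (r) {
                std::printf("%d\n", d);
                return 0;
            }
        }
    }

    int main()
    {
        pthread_t t1, t2;
        pthread_create(&t1, 0, producer, 0);
        pthread_create(&t2, 0, consumer, 0);
        pthread_join(t1, 0);
        pthread_join(t2, 0);
    }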
> Question 1b:
> Secondly, the only way I could see this not working on a
> multi-core CPU, with individual caches for each core, is if a
> memory write performed by one CPU is allowed to never be
> updated in the caches of other CPU cores. Is this possible?
> Are there any multi-core CPUs that allow this? Doesn't the
> MESI protocol guarantee that eventually memory cached in one
> CPU core is seen by all others? I know that there may be
> delays in the propagation from one CPU cache to the others,
> but doesn't it eventually have to be propagated? Can it be
> delayed indefinitely due to activity in the cores involved?

The problem occurs upstream of the cache. Modern processors access memory through a pipeline, and optimize the accesses in hardware, reading and writing a cache line at a time. So if you read a, then b, but the hardware finds that b is already in the read pipeline (because you've recently accessed something near it), then the hardware won't issue a new bus access for b; it will simply use the value already in the pipeline, which may be older than the value of a, if the hardware does have to go to memory for a.

All processors have instructions to force ordering: fence on an Intel (and IIRC, a lock prefix creates an implicit fence), membar on a Sparc. But the compilers I know don't issue these instructions in the case of volatile access. So the hardware still remains free to do the optimizations that volatile has forbidden the compiler to do.

> Question 2:
> My second question is with regard to whether volatile is necessary
> for multi-threaded code in addition to memory barriers. I know
> that it has been stated that volatile is not necessary in this
> case, and I do believe this, but I don't completely understand
> why. The issue as I see it is that using memory barriers,
> perhaps through use of mutex OS calls, does not in itself
> prevent the compiler from generating code that caches
> non-volatile variable writes in registers.

Whether it prevents it or not is implementation defined. As soon as you start doing this, you're formally in undefined behavior as far as C or C++ are concerned. Posix and Windows, however, make additional guarantees, and if the compiler is Posix compliant or Windows compliant, you're safe with regards to code movement across any of the APIs which forbid it.

If you're using things like inline assembler, or functions written in assembler, you'll have to check your compiler documentation, but in practice, the compiler will assume that the inline code modifies all visible variables (and so ensure that they are correctly written and read with regards to it) unless it has some means to know better, and those means will also allow it to take a possible fence or membar instruction into account.

> I have heard it written in this group that posix, for example,
> supports additional guarantees that make mutex lock/unlock
> (for example) sufficient for correct inter-thread
> communication through memory without the use of volatile. I
> believe I read here once (from James Kanze I believe) that
> "volatile is neither sufficient nor necessary for proper
> multi-threaded code" (quote from memory). This seems to imply
> that posix is in cahoots with the compiler to make sure that
> this works.

Posix imposes additional constraints on C compilers, in addition to what the C standard does. Technically, Posix doesn't know that C++ exists (and vice versa); practically, C++ compilers do claim Posix compliance, and extrapolate the C guarantees in a logical fashion.
(Given that they generally concern basic types like int, this really isn't too difficult.)

I've seen less formal specification with regards to Windows (and heaven knows, I'm looking, now that I'm working in an almost exclusively Windows environment). But practically speaking, VC++ behaves under Windows like Posix compliant compilers under Posix, and you won't find any other compiler breaking things that work with VC++.

> If you add mutex locks and unlocks (I know RAII, so please
> don't derail my question) around some variable reads and
> writes, how do the mutex calls force the compiler to generate
> actual memory reads and writes in the generated code rather
> than register reads and writes?

That's the problem of the compiler implementor. Posix (explicitly) and Windows (implicitly, at least) say that it has to work, so it's up to the compiler implementor to make it work. (In practice, most won't look into a function for which they don't have the source code, and won't move code across a function whose semantics they don't know.)

> I understand that compilation optimization affects these
> issues, but if I optimize the hell out of my code, how do
> posix calls (or any other OS threading calls) force the
> compiler to do the right thing? My only conjecture is that
> this is just an accident of the fact that the compiler can't
> really know what the mutex calls do and therefore the compiler
> must make sure that all globally accessible variables are
> pushed to memory (if they are in registers) in case _any_
> called function might access them. Is this what makes it work?

In practice, in a lot of cases, yes:-). It's an easy and safe solution for the implementor, and it really doesn't affect optimization that much---critical zones which include system calls or other functions for which the compiler doesn't have the source code aren't that common.

In theory, however, a compiler could know the list of system requests which guarantee memory synchronization, and disassemble the object files of any functions for which it didn't have the sources, to see if they made any such requests. I just don't know of any compilers which do this.

> If not, then how do mutex calls guarantee the compiler doesn't
> cache data in registers, because this would surely make the
> mutexes worthless without volatile (which I know from
> experience that they are not).

The system API says that they have to work. It's up to the compiler implementor to ensure that they do. Most adopt the simple solution: I don't know what this function does, so I'll assume the worst. But at least in theory, more elaborate strategies are possible.

--
James Kanze

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
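[The "assume the worst about a call you cannot see" reasoning described above can be illustrated with a small sketch. This is one translation unit only; opaque() is a stand-in for something like pthread_mutex_lock() and would be defined in a separately compiled file.]

    extern void opaque();    // defined elsewhere; the compiler cannot see its body

    int counter = 0;         // ordinary global, no volatile

    void bump()
    {
        counter = 1;         // must really be stored to memory before the call,
        opaque();            // because opaque() might read or modify counter...
        ++counter;           // ...and must be reloaded from memory afterwards
    }

[With link-time or whole-program optimization a compiler could, in principle, see through such calls, which is why the guarantee ultimately rests on the Posix (or Windows) requirement rather than on the compiler's ignorance.]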
From: Michael Doubez on 24 Mar 2010 04:12

On 24 mar, 12:33, James Kanze <james.ka...(a)gmail.com> wrote:
> On Mar 23, 1:42 pm, Michael Doubez <michael.dou...(a)free.fr> wrote:
>
> > On 23 mar, 00:22, "Bo Persson" <b...(a)gmb.dk> wrote:
>
> [...]
>
> > Still it does say something of the semantic of the memory
> > location. In practice the compiler will cut the optimizations
> > regarding the volatile location; I don't see a compiler
> > ignoring this kind of notification.
>
> Not really. It makes some vague statements concerning "access",
> while not defining what it really means by access. And "memory
> location", without further qualifiers, has no real meaning on
> modern processors, with their five or six levels of memory---is
> the memory the core specific cache, the memory shared by all the
> cores, or the virtual backup store (which maintains its values
> even after the machine has been shut down)?
>
> And of course, what really counts is what the compilers
> implement: neither g++, nor Sun CC, nor VC++ (at least through
> 8.0) give volatile any more semantics than issuing a load or
> store instruction---which the hardware will execute when it gets
> around to it. Maybe.
>
> > Which means that the memory value will eventually (after an
> > undetermined amount of time) be flushed to the location and
> > not kept around in the stack or somewhere else for
> > optimization reasons.
>
> Sorry, but executing a store instruction (or a mov with a
> destination in memory) does NOT guarantee that there will be a
> write cycle in main memory, ever. At least not on modern Sparc
> and Intel architectures. (I'm less familiar with others, but
> from what I've heard, Sparc and Intel are among the most strict
> in this regard.)

I am surprised. I would have expected cache lines to be flushed after a given amount of time in order to avoid coherency issues, with 'volatile' making it worse by *forcing* a flush per modification (although without guaranteeing ordering with other non-volatile memory access).

[snip]

--
Michael

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
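[For completeness: the facility that does give ordinary loads and stores the cross-thread ordering semantics discussed in this exchange is the atomics library of C++0x (now C++11), not volatile. A minimal sketch, assuming a compiler with <atomic> support; the function names are invented for the example.]

    #include <atomic>

    std::atomic<int> a(0);
    std::atomic<int> b(0);

    void writer()
    {
        a.store(1);             // sequentially consistent by default
        b.store(2);
    }

    void reader()
    {
        if (b.load() == 2) {
            int x = a.load();   // guaranteed to be 1 here: the ordering binds
            (void)x;            // the compiler and the write pipeline alike
        }
    }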
From: James Kanze on 24 Mar 2010 15:06

On Mar 24, 7:12 pm, Michael Doubez <michael.dou...(a)free.fr> wrote:
> On 24 mar, 12:33, James Kanze <james.ka...(a)gmail.com> wrote:

    [...]

> > Sorry, but executing a store instruction (or a mov with a
> > destination in memory) does NOT guarantee that there will be
> > a write cycle in main memory, ever. At least not on modern
> > Sparc and Intel architectures. (I'm less familiar with
> > others, but from what I've heard, Sparc and Intel are among
> > the most strict in this regard.)

> I am surprised. I would have expected cache lines to be
> flushed after a given amount of time in order to avoid
> coherency issues, with 'volatile' making it worse by *forcing*
> a flush per modification (although without guaranteeing ordering
> with other non-volatile memory access).

Cache lines are only part of the picture, but similar concerns apply to them. All of the coherency issues are addressed by considering values, not store instructions. So if you modify the same value several times before it makes it out of the processor, some of those "writes" are lost. (This is generally not an issue for threading, but it definitely affects things like memory mapped I/O.)

And for better or for worse, volatile doesn't force any flushing on any of the compilers I know; all it does is ensure that a store instruction is executed. So that given something like:

    int volatile a;
    int volatile b;
    // ...
    a = 1;
    b = 2;

, the compiler will ensure that the store instruction to a is executed before the store instruction to b, but the hardware (write pipeline, typically) may reorder the modifications to main memory, or even in some extreme cases suppress one of them.

--
James Kanze

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
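[A sketch of what has to be added to the example above to get the ordering that the hardware does not promise, assuming GCC (or a compatible compiler) and its __sync_synchronize() builtin, which emits a full hardware barrier: typically an mfence on x86, a membar on Sparc. A reading thread needs a matching barrier between its load of b and its load of a. None of this is generated by volatile itself, which is the point being made in the post above.]

    int volatile a;
    int volatile b;

    void f()
    {
        a = 1;
        __sync_synchronize();   // full memory barrier: the store to a becomes
                                // visible to other processors before the store to b
        b = 2;
    }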
From: Andy Venikov on 24 Mar 2010 15:20

Joshua Maurice wrote:
> On Mar 21, 2:32 pm, Andy Venikov <swojchelo...(a)gmail.com> wrote:

<snip>

>> The standard places a requirement on conforming implementations that:
>>
>> 1.9.6
>> The observable behavior of the abstract machine is its sequence of reads
>> and writes to volatile data and calls to library I/O functions
>>
>> 1.9.7
>> Accessing an object designated by a volatile lvalue (3.10), modifying an
>> object, calling a library I/O function, or calling a function that does
>> any of those operations are all side effects, which are changes in the
>> state of the execution environment. Evaluation of an expression might
>> produce side effects. At certain specified points in the execution
>> sequence called sequence points, all side effects of previous
>> evaluations shall be complete and no side effects of subsequent
>> evaluations shall have taken place
>>
>> 1.9.11
>> The least requirements on a conforming implementation are:
>> - At sequence points, volatile objects are stable in the sense that
>> previous evaluations are complete and
>> subsequent evaluations have not yet occurred.
>>
>> That to me sounds like a complete enough requirement that compilers
>> don't perform optimizations that produce "surprising" results in so far
>> as observable behavior in an abstract (single-threaded) machine are
>> concerned. This requirement happens to be very useful for multi-threaded
>> programs that can augment volatile with hardware fences to produce
>> meaningful results.

> That is one interpretation. Unfortunately / fortunately (?), that
> interpretation is not the prevailing interpretation. Thus far in this
> thread, we have members of the C++ standards committee or its
> affiliates explicitly disagreeing on the committee's website with that
> interpretation (linked else-thread). The POSIX standard explicitly
> disagrees with your interpretation (see google). The
> comp.programming.threads FAQ explicitly disagrees with you several
> times (linked else-thread). We have gcc docs and implementation
> disagreeing with your interpretation (see google). We have an official
> blog from intel, the biggest maker of chips in the world, and a major
> compiler writer, explicitly disagreeing with your interpretation
> (linked else-thread). We have experts in the C++ community explicitly
> disagreeing with your interpretation.

All the sources that you listed were saying that volatile isn't sufficient. And some went on as far as to say that it's "mostly" useless. That "mostly", however, covers an area that is real and I was talking about that area. None of them disagreed with what I said.

Here's a brief example that I hope will put this issue to rest:

    volatile int n;

    n = 5;
    n = 6;

volatile guarantees (note: no interpretation here, it's just what it says) that the compiler will issue two store instructions in the correct order (5 then 6). And that is a very useful quality for multi-threaded programs that choose not to use synchronization primitives like mutexes and such. Of course it doesn't mean that the processor executes them in that order, that's why we'd use memory fences. But to stop the compiler from messing around with these sequences, the volatile is necessary.

> (Thanks Andrei, and his paper "C++ And The Perils Of Double Checked Locking".
>
> Andy, have you even read it?

Of course I have. It's no secret that I admire the works of both of the authors. I have read a lot of other papers as well.
Maged Michael (who co-authored an article on lock-free algorithms with Andrei) and Tim Harris in particular are my favorites. But it wasn't the point of the discussion, was it?

It's a great article. Among other things, it talks about the non-portability of a solution that relies solely on volatile. How is it different from what I have said in my earlier post? Quoting:

"Is volatile sufficient - absolutely not.
Portable - hardly.
Necessary in certain conditions - absolutely."

<snip>

Thanks,
    Andy.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
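[A small illustration of the distinction being argued over in the posts above and below: what a typical optimizer may do to the n = 5; n = 6 sequence with and without volatile. The exact output is compiler- and flag-dependent; this is a sketch of the usual behaviour, not a guarantee about any particular compiler.]

    int m;              // plain int
    volatile int n;     // volatile int

    void g()
    {
        m = 5;          // dead store: an optimizer is free to drop it (or to keep
        m = 6;          // m in a register) and emit only the final value
    }

    void h()
    {
        n = 5;          // with volatile, both stores must be emitted, in this
        n = 6;          // order -- though the hardware may still combine or
                        // reorder them on the way to memory
    }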
From: George Neuner on 25 Mar 2010 04:10
On Thu, 25 Mar 2010 00:20:43 CST, Andy Venikov <swojchelowek(a)gmail.com> wrote:

>
>All the sources that [Joshua Maurice] listed were saying that volatile
>isn't sufficient. And some went on as far as to say that it's "mostly"
>useless. That "mostly", however, covers an area that is real and I was
>talking about that area. None of them disagreed with what I said.
>
>Here's a brief example that I hope will put this issue to rest:
>
>
>volatile int n;
>
>n = 5;
>n = 6;
>
>
>volatile guarantees (note: no interpretation here, it's just what it
>says) that the compiler will issue two store instructions in the correct
>order (5 then 6). And that is a very useful quality for multi-threaded
>programs that choose not to use synchronization primitives like mutexes
>and such. Of course it doesn't mean that the processor executes them in
>that order, that's why we'd use memory fences. But to stop the
>compiler from messing around with these sequences, the volatile is
>necessary.

Not exactly. 'volatile' is necessary to force the compiler to actually emit store instructions, else optimization would elide the useless first assignment and simply set n = 6. Beyond that, constant propagation and/or value tracking might also eliminate the remaining assignment and the variable altogether.

As you noted, 'volatile' does not guarantee that an OoO CPU will execute the stores in program order ... for that you need to add a write fence between them.

However, neither 'volatile' nor a write fence guarantees that any written value will be flushed all the way to memory - depending on other factors - cache snooping by another CPU/core, cache write back policies and/or delays, the span to the next use of the variable, etc. - the value may only reach some level of cache before the variable is referenced again. The value may never reach memory at all.

OoO execution and cache behavior are the reasons 'volatile' doesn't work as intended for many systems even in single-threaded use with memory-mapped peripherals. A shared (atomically writable) communication channel in the case of interrupts or concurrent threads is actually a safer, more predictable use of 'volatile' because, in general, it does not require values to be written all the way to main memory.

>It's a great article. Among other things, it talks about the
>non-portability of a solution that relies solely on volatile. How is it
>different from what I have said in my earlier post? Quoting:
>
>"Is volatile sufficient - absolutely not.
>Portable - hardly.
>Necessary in certain conditions - absolutely."

I haven't seen the whole thread and I'm not sure of the post to which you are referring. I think you might not be giving enough thought to the way cache behavior can complicate the standard's simple memory model. But it's possible that you have considered this and simply have not explained yourself thoroughly enough for [me and others] to see it.

'volatile' is necessary for certain uses but is not sufficient for (al)most (all) uses. I would say that for expert uses, some are portable and some are not. For non-expert uses ... I would say that most uses contemplated by non-experts will be neither portable nor sound.

> Andy.

George

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
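[To illustrate the memory-mapped peripheral case in the last two paragraphs of George's post, here is a sketch of the classic idiom. Everything in it is invented for the example (the register block, its address, the status bit); on a real system the layout comes from the hardware manual, and the page would normally be mapped as uncached device memory, which is what actually makes the volatile accesses reach the device.]

    #include <stdint.h>

    // Hypothetical UART register block at a made-up address.
    struct UartRegs {
        volatile uint32_t data;     // write: byte to transmit
        volatile uint32_t status;   // read:  bit 0 set when ready to transmit
    };

    UartRegs* const uart = reinterpret_cast<UartRegs*>(0x4000A000);

    void send(uint8_t byte)
    {
        while ((uart->status & 0x01) == 0) {
            // volatile forces the load to be re-issued on every iteration
        }
        uart->data = byte;          // volatile forces a store instruction, but only
                                    // an uncached mapping (or an explicit flush/fence)
                                    // ensures the write actually reaches the device
    }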