From: gast128 on 29 Oct 2009 09:55

Hello all,

When I wrote a little test application, I noticed that the memory access was completely removed in release builds (both Visual Studio 2003 and 2008):

    void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
    {
        while (*pContinue)
        { }
    }

In disassembly:

    void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
    {
    00981270  mov         eax,dword ptr [esp+4]
    00981274  mov         eax,dword ptr [eax]
        while (*pContinue)
    00981276  test        eax,eax
    00981278  jne         TestIntlIppiImplFlood+6 (981276h)
        {
        }
    }

One can notice that the value of *pContinue is loaded into eax only once, and only eax is tested in the loop. Even if *pContinue is modified in another thread, the function never ends.

This is quite an optimization, but I think it is too aggressive. Of course I can use volatile for the address, but in effect this means that every variable shared between threads must get the volatile keyword. I am aware that one should use boost::mutex or other means to prevent data races, but this was just a simple test in which the variable was atomically changed (through InterlockedIncrement) in one thread and read in another thread.

Can anyone shed light on this? Thanks.
From: John Keenan on 29 Oct 2009 12:09

<gast128(a)hotmail.com> wrote:
> This is quite an optimization, but I think it is too aggressive.

Sometimes you can use a do-nothing function to stop this optimization (you must test with each compiler). For example:

    void doNothing( long* pContinue )
    {
        return;
    }

Then add a call to doNothing to your original function:

    void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
    {
        while( *pContinue ){
            doNothing( pContinue );
        }
    }

While a compiler could optimize this to your original assembly code, my experience is that today's compilers do not.

John
From: Igor Tandetnik on 29 Oct 2009 11:24

gast128(a)hotmail.com wrote:
> When I wrote a little test application, I noticed that the memory
> access was completely removed in release builds (both Visual Studio
> 2003 and 2008):
>
> void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
> {
>     while (*pContinue)
>     { }
> }
>
> In disassembly:
> void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
> {
> 00981270  mov         eax,dword ptr [esp+4]
> 00981274  mov         eax,dword ptr [eax]
>     while (*pContinue)
> 00981276  test        eax,eax
> 00981278  jne         TestIntlIppiImplFlood+6 (981276h)
>     {
>     }
> }
>
> One can notice that only the value of *pContinue is stored in eax, and
> this is tested. Even if pContinue is modified in another thread, the
> function never ends.

You should use the Interlocked* family of functions to access variables shared between threads. Alternatively, use proper synchronization primitives such as critical sections.

> This is quite an optimization, but I think it is too aggressive. Ofc I
> can use volatile for the address, but in effect this means that every
> shared variable over threads must get the volatile keyword. I am aware
> that one should use boost::mutex or other stuff to prevent data race
> conditions, but this was just a simple test in which the variable was
> atomically changed (thru InterlockedIncrement) in one thread and read in
> another thread.

Synchronizing access to shared data only works when all threads do it. It's pointless to do it in some places but not in others. Use InterlockedCompareExchange to atomically read your variable, like this:

    while (InterlockedCompareExchange(pContinue, 0, 0)) {...}

--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925
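Editor's sketch: for readers on a newer toolchain, the "every thread must use the synchronized access" point can be illustrated with C++11's std::atomic, which postdates this thread. This is only a sketch under stated assumptions: spinUntilCleared and runSpinDemo are hypothetical names that do not appear in the posts, and std::atomic<long> stands in for the Interlocked* calls rather than reproducing them.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the thread): spins until the flag reads 0
// and returns how many times the loop body ran.
// Both the reader and the writer go through std::atomic, so the read is
// not hoisted out of the loop the way the plain long read was.
long spinUntilCleared(std::atomic<long>& keepGoing) {
    long spins = 0;
    while (keepGoing.load() != 0) {   // atomic read on every iteration
        ++spins;
        std::this_thread::yield();    // be polite while busy-waiting
    }
    return spins;
}

// Hypothetical driver: starts a spinner thread, clears the flag from the
// calling thread, and reports whether the spinner left its loop.
bool runSpinDemo() {
    std::atomic<long> keepGoing(1);
    std::atomic<bool> finished(false);

    std::thread spinner([&keepGoing, &finished] {
        spinUntilCleared(keepGoing);
        finished.store(true);
    });

    keepGoing.store(0);               // atomic write seen by the spinner
    spinner.join();                   // returns only once the loop has ended
    return finished.load();
}
```

Calling runSpinDemo() should return true once the spinner thread has observed the cleared flag. The default (sequentially consistent) ordering is used here for simplicity; whether a weaker ordering would do is a separate question this sketch does not address.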
From: gast128 on 29 Oct 2009 13:06

Thanks. Yes, we can fool the optimizer by using dummy functions.

I am aware of the threading issues and that one should lock or atomically exchange values. Even if the read isn't atomic, one might expect that the atomic write at least flushes the value to memory. I am not sure if this guarantees a correct read (I am not sure whether a processor updates the caches of all other processors after a memory write; maybe multicore machines behave differently here than multiprocessor machines).

Still, the compiler has completely optimized away the read, so I was wondering if this is always correct. If I put any dummy object in the call, the compiler already produces code in which the memory gets accessed, so I was wondering why in this simple case the compiler decided to completely optimize the memory access away, and whether this is correct in all cases.
From: Igor Tandetnik on 29 Oct 2009 13:25

gast128(a)hotmail.com wrote:
> I am aware of the threading issues and that one should lock or
> atomically exchange values. Even if the read isn't atomic, one might
> expect that the atomic write at least flushes it to memory.

... but that doesn't mean that a different CPU reads it from memory and not, say, from its own cache.

> I am not
> sure if this guarantees a correct read

On many modern multicore architectures, it doesn't. See also http://en.wikipedia.org/wiki/Memory_barrier

> Still the compiler has completely optimized away the read, so I was
> wondering if this is always correct.

Yes. It's your responsibility to be careful with shared data, and use appropriate access patterns. You don't want the compiler to automatically penalize access to all variables in the program, just in case some of them are shared. That would effectively disable most optimizations.

> If I put any dummy object in the
> call, the compiler already produces code in which the memory gets
> accessed

I'm not sure what you mean by "dummy object". My guess is, you are putting a call into the loop whose source code the compiler doesn't see at this point. Now, even in a single-threaded program, it's possible to do this:

    void TestIntlIppiImplFlood(/*volatile*/ long* pContinue)
    {
        while (*pContinue)
        {
            DoSomething();
        }
    }

    // in a different source file
    long global_continue = 1;
    void DoSomething() { global_continue = 0; }

    TestIntlIppiImplFlood(&global_continue);

This effect is called "aliasing" ( http://en.wikipedia.org/wiki/Aliasing_(computing) ). The compiler has to assume the presence of aliasing unless proven otherwise (e.g. local variables whose address is never given out can't be aliased), and optimize accordingly.

--
With best wishes,
Igor Tandetnik
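Editor's sketch: the aliasing scenario described in this post can be tried out in a single thread. The names g_continue, g_callsLeft, doSomething and loopUntilCleared are hypothetical and chosen only for illustration. The callee is passed as a function pointer to mimic a call whose body is not visible at the call site; that is an assumption of this sketch, not something the post specifies.

```cpp
#include <cassert>

// Hypothetical globals for illustration (not from the thread).
long g_continue = 1;   // the flag the loop polls through a pointer
int  g_callsLeft = 3;  // number of calls before the flag is cleared

// Clears the flag on the third call. Nothing in this function's
// signature tells a caller that it writes to g_continue.
void doSomething() {
    if (--g_callsLeft == 0) {
        g_continue = 0;
    }
}

// Polls *pContinue and invokes the callee on every iteration.
// Because the callee may write to whatever pContinue points at,
// the value has to be re-read after each call for the program to
// behave correctly, even with no second thread involved.
long loopUntilCleared(long* pContinue, void (*callee)()) {
    long iterations = 0;
    while (*pContinue) {
        callee();
        ++iterations;
    }
    return iterations;
}
```

With these definitions, loopUntilCleared(&g_continue, doSomething) runs the body three times and then returns, because the third call clears the flag that pContinue aliases. With an empty loop body there is no such call, which is why the compiler was free to test a single cached value.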