From: David Schwartz on 24 Dec 2009 14:38

On Dec 24, 7:54 am, "novicki...(a)gmail.com" <novicki...(a)gmail.com> wrote:

> Ok, I feel convinced, that it is NOT safe to read and write to a
> variable from multiple threads without locks even if you are not
> concerned with missing the latest updates to the values. At the risk
> of starting a whole new thread, I guess the best performance would be
> to use atomic memory access functions like these:
> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html or
> the equivalent.

Sadly, atomic operations on x86 CPUs tend to be ridiculously
expensive, on the order of 200 clock cycles or so. This is because
even a simple atomic increment implies a full CPU barrier. If that's
not a problem, then that's probably your best choice.

But in my experience, it's almost always possible to architect these
kinds of things out of all performance-critical paths. If your threads
need to exchange information so frequently that it's causing a
performance problem, you're probably doing something wrong.

DS
From: Chris M. Thomasson on 24 Dec 2009 17:37

"David Schwartz" <davids(a)webmaster.com> wrote in message
news:5053e4d8-d400-4570-92da-581e57438392(a)v15g2000prn.googlegroups.com...

On Dec 24, 7:54 am, "novicki...(a)gmail.com" <novicki...(a)gmail.com> wrote:

> > Ok, I feel convinced, that it is NOT safe to read and write to a
> > variable from multiple threads without locks even if you are not
> > concerned with missing the latest updates to the values. At the risk
> > of starting a whole new thread, I guess the best performance would be
> > to use atomic memory access functions like these:
> > http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html or
> > the equivalent.

> Sadly, atomic operations on x86 CPUs tend to be ridiculously
> expensive, on the order of 200 clock cycles or so. This is because
> even a simple atomic increment implements a full CPU barrier. If
> that's not a problem, then that's probably your best choice.

> But in my experience, it's almost always possible to architect these
> kinds of things out of all performance-critical paths. If your threads
> need to exchange information so frequently that it's causing a
> performance problem, you're probably doing something wrong.

FWIW, there are distributed counting algorithms in which each thread
has its own counter. A thread simply mutates its own counter, and a
reader thread sums up all the counters from each thread in order to
get the actual value. This scales far better than using a single
global counter. There are places where these types of algorithms can
fit in fairly well.
From: David Schwartz on 25 Dec 2009 02:35

On Dec 23, 10:49 pm, David Schwartz <dav...(a)webmaster.com> wrote:

> Sadly, there are similar known cases on x86. If I had the time, I'd
> track down the GCC bug report.

I found the example I was thinking of. If GCC sees code like this:

    if (mutex != NULL)
        acquire_lock(mutex);
    for (int i = 0; i < 100; i++) {
        some_stuff(i);
        if (mutex != NULL)
            shared_variable += i;
    }
    if (mutex != NULL)
        release_lock(mutex);

GCC assumes that 'mutex' is most likely not NULL, so it can optimize
the code to:

    saved_copy = shared_variable;
    if (mutex != NULL)
        acquire_lock(mutex);
    for (int i = 0; i < 100; i++) {
        some_stuff(i);
        shared_variable += i;
    }
    if (mutex != NULL)
        release_lock(mutex);
    else
        shared_variable = saved_copy;

See how this gets a conditional out of the loop? That can actually be
an optimization, especially since GCC assumes that a variable is
rarely NULL. However, this means that the shared variable is modified
(and put back) even if mutex is NULL. Ack. GCC does not promise that
it won't write to variables your code flow doesn't write to. :(

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862

DS
From: Moi on 25 Dec 2009 08:05

On Wed, 23 Dec 2009 20:43:10 -0800, David Schwartz wrote:

> On Dec 23, 8:34 pm, "novicki...(a)gmail.com" <novicki...(a)gmail.com> wrote:
>
>> However, I am still curious if due to the design of modern hardware in
>> general, it is possible to read a value for the variable that was never
>> actually set. The code is trying to set values between 1 and 1
>> million. But could you read a value of 8 million if 8 million is never
>> set. The reason would be because you read an incompletely written
>> value at some point. Or is the computer such that you can only read
>> complete values and never a partially written value.
>
> Most likely, word tearing would be the only way. I don't know of any
> POSIX platform where integers are subject to word tearing. The problem
> is that you would be producing software that might break when a new CPU,
> operating system, compiler, or something else comes out. On the bright
> side, at least on x86, a lot of software makes the assumption that reads
> and writes to and from aligned integers are atomic (I think it may even
> be guaranteed by Intel in the x86 specification) so you certainly
> wouldn't be the only thing that would break.

In short: memory reads or writes are atomic for word-sized variables,
but "read-modify-write"s are not. E.g., you cannot use

    while (i++) { i--; }
    ... critical section here ...
    i--;

as a spinlock. I would take a look at the Linux kernel source code to
see how locks and spinlocks are implemented on various platforms.

HTH,
AvK
From: David Schwartz on 25 Dec 2009 14:26
On Dec 25, 5:05 am, Moi <r...(a)invalid.address.org> wrote:

> In short: memory reads or writes are atomic for word-sized variables.
> But "read-modify-write"s are not.

Inside the CPU, yes. But there's no guarantee the compiler will issue
a read just because your code calls for one, and similarly no
guarantee the compiler will issue a write just because your code calls
for one. So the fact that read and write assembly operations are
atomic on the CPU doesn't help a C programmer much unless he has some
way to get the compiler to issue the right assembly instructions.
(Which, of course, there are ways of doing.)

DS