From: Chris Thomasson on 14 Nov 2007 12:29

"Ben Voigt [C++ MVP]" <rbv(a)nospam.nospam> wrote in message news:uEfnFpsJIHA.4044(a)TK2MSFTNGP02.phx.gbl...
>
> "Pops" <dude(a)nospamnospamnospam.com> wrote in message
> news:%23tndYNsJIHA.4712(a)TK2MSFTNGP04.phx.gbl...
>> Ben (C++ MVP) recommended using the Interlocked Singly Linked Lists API:
>>
>> http://msdn2.microsoft.com/en-us/library/ms686962.aspx
>>
>> as a more efficient synchronization method for a recent thread discussion
>> regarding a FIFO queue design.
>>
>> If Ben is reading, or if anyone else is familiar with this API, I have a
>> few questions:

I have a question as well. Do you know whether Microsoft uses a pointer compression algorithm to implement their SList API on 64-bit systems? I think I remember Neill Clift, who works on the Windows kernel, mentioning something about that, but I can't seem to find the message on Google right now.
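I don't know what Microsoft actually does internally, but purely as a hypothetical sketch, here is one way a 64-bit SList anchor could "compress" a 16-byte-aligned entry pointer together with a depth and an ABA sequence counter into 128 bits, so the whole anchor can still be swapped with a single CMPXCHG16B. All field names and widths below are made up for illustration; this is NOT a claim about the real SLIST_HEADER layout:
______________________________________

#include <stdint.h>

/* Hypothetical sketch only -- not Microsoft's actual SLIST_HEADER.
   Because entries are 16-byte aligned, the low 4 bits of a node address
   are always zero, so the pointer can be stored shifted right by 4 and
   the saved bits reused for bookkeeping. The whole 128-bit anchor is
   then a single CMPXCHG16B target (it must be 16-byte aligned). */
typedef struct packed_anchor_s {
    uint64_t depth    : 16;  /* approximate list length                   */
    uint64_t sequence : 48;  /* monotonic counter to defeat ABA           */
    uint64_t reserved : 4;
    uint64_t next     : 60;  /* node address >> 4 (nodes 16-byte aligned) */
} packed_anchor;

/* Recover the real head pointer from the compressed field. */
static inline void* packed_anchor_head(const packed_anchor* a) {
    return (void*)((uintptr_t)a->next << 4);
}
______________________________________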
From: Chris Thomasson on 14 Nov 2007 12:55

"Chris Thomasson" <cristom(a)comcast.net> wrote in message news:Q9adnRAzYrses6banZ2dnUVZ_oWdnZ2d(a)comcast.com...
>
> "Pops" <dude(a)nospamnospamnospam.com> wrote in message
> news:%23tndYNsJIHA.4712(a)TK2MSFTNGP04.phx.gbl...
>> Ben (C++ MVP) recommended using the Interlocked Singly Linked Lists API:
>>
>> http://msdn2.microsoft.com/en-us/library/ms686962.aspx
>>
>> as a more efficient synchronization method for a recent thread discussion
>> regarding a FIFO queue design.
>>
>> If Ben is reading, or if anyone else is familiar with this API, I have a
>> few questions:
>>
>> First, based on what I see, this is for LIFO designs (i.e., stacks), not
>> FIFO. Am I missing something? I only see Push/Pop functions, no
>> head/tail functions.
>>
>> Second, can you tell how it works with IPC? Can it work for IPC?
>
> FWIW, if you want to create a blocking lock-less queue (e.g., the consumer
> waits while the queue is empty), here is an algorithm I invented a while
> back:
>
> http://groups.google.de/group/comp.programming.threads/msg/632b6bdc2a137459
> [...]

I should point out that the DWCAS used in that algorithm uses an IBM-style implementation: it automatically updates the comparand on failure. Here is my implementation for x86:

int np_ac_i686_atomic_dwcas_fence(void*, void*, void*);

.globl np_ac_i686_atomic_dwcas_fence
np_ac_i686_atomic_dwcas_fence:
  pushl %esi
  pushl %ebx
  movl 16(%esp), %esi        # esi = comparand (2nd arg)
  movl (%esi), %eax          # edx:eax = expected value
  movl 4(%esi), %edx
  movl 20(%esp), %esi        # esi = exchange (3rd arg)
  movl (%esi), %ebx          # ecx:ebx = new value
  movl 4(%esi), %ecx
  movl 12(%esp), %esi        # esi = destination (1st arg)
  lock cmpxchg8b (%esi)      # atomic 64-bit compare-and-swap
  jne np_ac_i686_atomic_dwcas_fence_fail
  xorl %eax, %eax            # success: return 0
  popl %ebx
  popl %esi
  ret

np_ac_i686_atomic_dwcas_fence_fail:
  movl 16(%esp), %esi        # failure: write the value actually observed
  movl %eax, (%esi)          # back into the comparand (IBM-style)
  movl %edx, 4(%esi)
  movl $1, %eax              # return 1
  popl %ebx
  popl %esi
  ret
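To make the calling convention concrete, here is a minimal sketch of how a caller might drive an IBM-style DWCAS like the one above in a push loop. The anchor/node types are illustrative, and it assumes the return convention of the routine above: 0 on success, nonzero on failure, with the comparand refreshed automatically on failure, so the loop never has to re-read the destination itself:
______________________________________

#include <stdint.h>

typedef struct node_s {
    struct node_s* next;
} node;

/* 8 bytes on a 32-bit build: 4-byte head pointer + 4-byte ABA counter,
   compared-and-swapped as one unit by the assembly routine above. */
typedef struct anchor_s {
    node*    head;
    uint32_t aba;
} anchor;

int np_ac_i686_atomic_dwcas_fence(void* dest, void* cmp, void* xchg);

/* Sketch: push a node with the IBM-style DWCAS. Because the comparand is
   reloaded automatically on failure, the loop only rebuilds the desired
   new value before retrying. */
void anchor_push(anchor* a, node* n) {
    anchor cmp = *a;   /* initial snapshot; DWCAS refreshes it on failure */
    anchor xchg;
    do {
        n->next   = cmp.head;   /* link new node in front of current head */
        xchg.head = n;
        xchg.aba  = cmp.aba + 1;
    } while (np_ac_i686_atomic_dwcas_fence(a, &cmp, &xchg));
}
______________________________________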
From: Pops on 14 Nov 2007 13:00

Ben Voigt [C++ MVP] wrote:
> We were discussing a multiple writer, single consumer queue. For this the
> writers each use the InterlockedPushEntrySList function, while the consumer
> uses the InterlockedFlushSList function to retrieve the entire queue as an
> atomic operation. The consumer is then responsible for iterating through
> the list of items retrieved in FIFO order.

Ben, just a few more questions. Sorry if I seem dense about this, but I would like to get some clarification.

When you bring up the multiple-writer, single-consumer queue premise above, I start to wonder whether you introduced this Interlocked Singly Linked List API to me for that particular scenario only - one consumer.

In other words, is this API thread-safe? The docs say it uses a "nonblocking algorithm to provide atomic synchronization", which I presume means some internal lock-free shared-memory method. But is it protected from multiple threads? I don't see how it can guard a flush, once the initial pointer has been read, without some lock.

TIA

--
HLS
From: Chris Thomasson on 14 Nov 2007 14:16

[comp.programming.threads included because the discussion is on lock-free algorithms, and there's no place like cpt wrt that subject... ;^]

"Pops" <dude(a)nospamnospamnospam.com> wrote in message news:%23uMo9huJIHA.6108(a)TK2MSFTNGP03.phx.gbl...
> Ben Voigt [C++ MVP] wrote:
>
>> We were discussing a multiple writer, single consumer queue. For this
>> the writers each use the InterlockedPushEntrySList function, while the
>> consumer uses the InterlockedFlushSList function to retrieve the entire
>> queue as an atomic operation. The consumer is then responsible for
>> iterating through the list of items retrieved in FIFO order.

You can use multiple consumers with that particular technique:

http://groups.google.com/group/comp.arch/msg/6ac06e17e5138820
http://groups.google.ca/group/comp.programming.threads/msg/a359bfb41c68b98b

> Ben, just a few more questions.
>
> Sorry if I seem dense about this, but I would like to get some
> clarification.
>
> When you bring up the multiple-writer, single-consumer queue premise
> above, I start to wonder whether you introduced this Interlocked Singly
> Linked List API to me for that particular scenario only - one consumer.
>
> In other words, is this API thread-safe? The docs say it uses a
> "nonblocking algorithm to provide atomic synchronization", which I
> presume means some internal lock-free shared-memory method.
>
> But is it protected from multiple threads?

The API is 100% atomically thread-safe. You can hit the following functions with any number of threads concurrently:
______________________________________
InterlockedFlushSList
InterlockedPopEntrySList
InterlockedPushEntrySList
QueryDepthSList
______________________________________

BTW, this algorithm has been around for ages, as you probably know. In fact, the IBM Principles of Operation gives a more detailed description of why it is atomically thread-safe:

http://publibz.boulder.ibm.com/epubs/pdf/dz9zr002.pdf
(Appendix A / Multi-Processing Examples / Free-Pool Manipulation)

Did that help clear some things up for you?

> I don't see how it can guard a flush, once the initial pointer has been
> read, without some lock.

The flush is implemented with a double-width compare-and-swap that can atomically mutate the entire SLIST_HEADER data structure:

http://groups.google.co.uk/group/comp.programming.threads/browse_frm/thread/b5bedbe04e939f55
(I would advise reading the entire thread...)

On x86-32 the instruction is CMPXCHG8B. On some x86-64 processors the instruction is CMPXCHG16B. On the Itanium there is a weird instruction called CMP8XCHG16B. This type of instruction is tricky to rely on because you cannot count on it being available on most 64-bit systems. For instance, I do a lot of work on SPARC, which has the CASX instruction that works on 64-bit datum in 32-bit mode; however, on SPARC64 the CASX instruction still only works on 64-bit datum. This can be a royal pain in the neck if you're into inventing new lock-free algorithms:

http://groups.google.com/group/comp.arch/browse_frm/thread/71f8e0094e353e5
(I would advise reading the entire thread...)

There are some nifty tricks you can do to get around this nightmare. I outline some of them here:

http://groups.google.com/group/comp.arch/msg/a3ebfe80363a4399
(examples toward the end of the message...)

Any thoughts on this?
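For reference, here is a minimal sketch of the multiple-writer, single-consumer pattern Ben describes, using the documented SList functions (InitializeSListHead, InterlockedPushEntrySList, InterlockedFlushSList). The work_item type, the process callback and the reversal pass are illustrative, not part of the API; the hard requirements are that each SLIST_ENTRY is aligned on MEMORY_ALLOCATION_ALIGNMENT and stays valid until the consumer frees it:
______________________________________

#include <windows.h>
#include <malloc.h>   /* _aligned_malloc / _aligned_free */

/* Hypothetical work item; SLIST_ENTRY is the first member so a
   PSLIST_ENTRY can be cast straight back to a work_item. */
typedef struct work_item_s {
    SLIST_ENTRY entry;   /* must be MEMORY_ALLOCATION_ALIGNMENT aligned */
    int payload;
} work_item;

static SLIST_HEADER g_queue;   /* call InitializeSListHead(&g_queue) once */

/* Producer side: any number of threads may push concurrently. */
void producer_push(int payload) {
    work_item* w = _aligned_malloc(sizeof(*w), MEMORY_ALLOCATION_ALIGNMENT);
    if (!w) return;
    w->payload = payload;
    InterlockedPushEntrySList(&g_queue, &w->entry);
}

/* Consumer side: detach the whole LIFO list atomically, then reverse it
   so items come out in FIFO order, as Ben describes. */
void consumer_drain(void (*process)(work_item*)) {
    PSLIST_ENTRY lifo = InterlockedFlushSList(&g_queue);
    PSLIST_ENTRY fifo = NULL;
    while (lifo) {                       /* reverse the detached list */
        PSLIST_ENTRY next = lifo->Next;
        lifo->Next = fifo;
        fifo = lifo;
        lifo = next;
    }
    while (fifo) {                       /* process oldest-first */
        PSLIST_ENTRY next = fifo->Next;
        work_item* w = (work_item*)fifo;
        process(w);
        _aligned_free(w);
        fifo = next;
    }
}
______________________________________

Note that the flush hands the consumer the entries newest-first (LIFO), which is why the reversal pass is needed to recover FIFO order.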
From: Chris Thomasson on 14 Nov 2007 14:50
"Chris Thomasson" <cristom(a)comcast.net> wrote in message news:uqednbJACNZ0sKbanZ2dnUVZ_ualnZ2d(a)comcast.com... > "Pops" <dude(a)nospamnospamnospam.com> wrote in message > news:%23tndYNsJIHA.4712(a)TK2MSFTNGP04.phx.gbl... [...] >> Seconds, can you tell how it works with IPC, can it work for IPC? > > Yes. You can create a node cache in shared memory and allocate/deallocate > your SLIST_ENTRY from there. Something like: > > > (quick and dirty pseudo-code sketch) > ____________________________________________________________ > [...] > ____________________________________________________________ > > > > > Also, for what its worth, here is how Microsoft implements the SList API > under 32-bit systems: > > > > typedef struct slist_anchor_s slist_anchor; > typedef struct slist_node_s slist_node; > > struct slist_node_s { > slist_node* nx; > }; > > struct slist_anchor_s { > slist_node* head; > /* monotonic counter to avoid the infamous aba problem */ > __int32 aba; > }; > > > void > slist_anchor_init( > slist_anchor* const _this > ) { > _this->head = 0; > _this->aba = 0; > } > > > void > slist_anchor_push( > slist_anchor* const _this, > slist_node* node > ) { > slist_node* cmp; > do { > cmp = _this->head; > node->nx = cmp; > } while (! CAS(&_this->anchor, cmp, node)); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ should read as: } while (! CAS(&_this->head, cmp, node)); > } I forgot to show the flush. Here is how to do that: slist_node* slist_anchor_flush( slist_anchor* const _this ) { return SWAP(&_this->head, 0); } The reason I don't need to bump the monotonic aba counter for the flush is because its immune to the problem in the first place, here is why: LIFO producers are inherently not affected by aba. On the other hand, LIFO consumers are affected, but only if there is a "compare" type method being used for synchronization. Using an atomic swap to set the stack anchor to zero can get around aba wrt LIFO lock-free stacks. Was that similar to your initial solution? > slist_node* > slist_anchor_pop( > slist_anchor* const _this > ) { > slist_anchor cmp, xchg; > do { > cmp = *_this; > if (! cmp.head) { return 0; } > > /* hazard! refer to PDR algorihtm for more details... */ > xchg.head = cmp.head->nx; > > /* increment ABA */ > xchg.aba = cmp.aba + 1; > > } while (! DWCAS(_this, &cmp, &xchg)); > > return cmp.head; > } > > > > > Any thoughts? BTW, would anybody like to see an actual implementation of this in shared memory? If so, I will stick it on my to-do list. ;^) |