From: Chris Thomasson on 14 Nov 2007 22:44

"Chris Thomasson" <cristom(a)comcast.net> wrote in message news:T7Kdnd74edSNNabanZ2dnUVZ_sqinZ2d(a)comcast.com...
> "m" <m(a)b.c> wrote in message news:eFgEsjyJIHA.4228(a)TK2MSFTNGP02.phx.gbl...
>> BTW: This forum is not a blog - you should really try to consolidate your posts ;)
>>
>> As you say: for LIFO or unordered lists, SCAS is sufficient in the general case.
>
> Single CAS wrt LIFO is only good for the method that Ben uses:

Well, you can also use single CAS if you provide some other solution to the ABA problem. The method that Ben's code uses is one of many.
From: Joe Seigh on 15 Nov 2007 06:16

m wrote:
> BTW: This forum is not a blog - you should really try to consolidate your posts ;)
>
> As you say: for LIFO or unordered lists, SCAS is sufficient in the general case. For FIFO or binary data structures, an MCAS is required (DCAS or TCAS). Under certain conditions a control variable can be used with CAS to generate FIFO - the assumption being that fewer than 2^32 (or 2^64) operations are performed on the list between the time that the control variable is checked and when the CAS is executed.
>
> Additionally, on Windows, FIFO & LIFO are mostly useless for multiple producer / multiple consumer modes because the strict ordering that they imply is lost when thread pre-emption occurs. So other than the vague benefit of LIFO on memory residence (recently used pages are more likely to be in the working set), lock-free ordered lists should be avoided.

Lock-free is a forward progress guarantee. It has no effect on semantics. You shouldn't be able to tell the difference between lock-free and locked queues. The observed order of dequeues will be the same as the order of any corresponding enqueues that were observed. This is for queues, which is what is inferred when you use the term producer/consumer. If you're talking about performing arbitrary functions on an arbitrary collection, e.g. an ordered list, then they're really just the same as any arbitrary data structure where the operations consist of multiple accesses on the data. You need some form of synchronization to coordinate those accesses. This could be lock based, MCAS, or STM.

Thread preemption was only a significant distinction when stuff ran on single-processor systems and only one thread could run at a time. That's no longer true on multi-cores.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
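[Editor's illustration] The "control variable" technique in the quoted paragraph is essentially the classic version-counter answer to the ABA problem: pack a counter next to the head so that a stale CAS comparand is detected even when the same node reappears at the top. Below is a minimal sketch, assuming a fixed pool of nodes whose indices fit in 32 bits so index and counter share one 64-bit word; the pool, names, and layout are my own assumptions, not code posted in this thread.

// Sketch: ABA avoidance with a version ("control") counter packed beside a
// 32-bit node index in one 64-bit word, swapped with a single-width CAS.
#include <windows.h>

struct Node { LONG next; int value; };        // next = pool index, -1 = end of list

static Node g_pool[1024];                     // nodes live in a pool, never freed
static volatile LONGLONG g_top = 0xFFFFFFFF;  // low 32 bits: top index (all ones = empty),
                                              // high 32 bits: version counter

LONG Pop()
{
    for (;;) {
        LONGLONG old = g_top;                 // a torn read only makes the CAS fail
        LONG idx = (LONG)(old & 0xFFFFFFFF);
        if (idx == -1)
            return -1;                        // list is empty
        LONG next = g_pool[idx].next;         // safe: pool nodes are never freed
        LONGLONG ver = ((old >> 32) + 1) & 0xFFFFFFFF;    // bump the control counter
        LONGLONG newTop = (ver << 32) | (ULONGLONG)(ULONG)next;
        // Even if 'idx' was popped and pushed back by other threads in the
        // meantime, their pops bumped the counter, so this CAS fails and we
        // simply retry. (Push CASes the same packed word; it does not need
        // to bump the counter itself.)
        if (InterlockedCompareExchange64(&g_top, newTop, old) == old)
            return idx;
    }
}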
From: Pops on 15 Nov 2007 07:27

Joe Seigh wrote:
> m wrote:
>> BTW: This forum is not a blog - you should really try to consolidate your posts ;)
>>
>> As you say: for LIFO or unordered lists, SCAS is sufficient in the general case. For FIFO or binary data structures, an MCAS is required (DCAS or TCAS). Under certain conditions a control variable can be used with CAS to generate FIFO - the assumption being that fewer than 2^32 (or 2^64) operations are performed on the list between the time that the control variable is checked and when the CAS is executed.
>>
>> Additionally, on Windows, FIFO & LIFO are mostly useless for multiple producer / multiple consumer modes because the strict ordering that they imply is lost when thread pre-emption occurs. So other than the vague benefit of LIFO on memory residence (recently used pages are more likely to be in the working set), lock-free ordered lists should be avoided.
>
> Lock-free is a forward progress guarantee. It has no effect on semantics. You shouldn't be able to tell the difference between lock-free and locked queues. The observed order of dequeues will be the same as the order of any corresponding enqueues that were observed. This is for queues, which is what is inferred when you use the term producer/consumer. If you're talking about performing arbitrary functions on an arbitrary collection, e.g. an ordered list, then they're really just the same as any arbitrary data structure where the operations consist of multiple accesses on the data. You need some form of synchronization to coordinate those accesses. This could be lock based, MCAS, or STM.
>
> Thread preemption was only a significant distinction when stuff ran on single-processor systems and only one thread could run at a time. That's no longer true on multi-cores.

Thanks for bringing back the focus and a level of talking I can relate to. :-)

(This is a side note: I believe that, at least under WIN32, it is possible to programmatically delegate your threads to a specific processor - see the sketch below. I never tried it myself, but I recall a reference to it, in this case under the guise of potential security conflicts. Maybe that was for hyper-threaded processors only.)

I should make it clear, especially for Chris, that all our work is 100% for WIN32 only: very little dependency on sub-system specifics, NT-based-only methods, staying clear of APIs with hidden logic, experimental ideas and/or fuzzy considerations related to HW. With our current charter to improve performance/scalability, we are looking at taking advantage of some of the NT-based-only offerings. Part of the issue is the design decision made in 1996 (so the OS/HW realities of the day were important) to assure quality, maybe at the expense of performance. I can broadly say that our RPC client/server framework uses critical sections, rw-locks, semaphores, etc. - in short, for much of our multi-context, multi-threaded, data queuing, circular buffer, FIFO/LIFO, etc., synchronization needs. So there is much room for improvement in many areas.

While I look for those NT-based improvements, I also have to measure disk vs memory (or both) queuing, local vs remote IPC, context switching issues, single vs multi vs hyper-threaded, affinity considerations, etc. Most of the time, we look for and/or design generalized, reusable, "practical" solutions and methods.
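[Editor's illustration] On the processor-delegation side note above: Win32 does expose this through SetThreadAffinityMask. A minimal sketch follows, assuming you only want to pin the calling thread to one logical CPU; the helper name and the CPU index in the usage line are illustrative, not anything from this thread.

// Sketch: pinning the calling thread to a single logical processor on Win32.
#include <windows.h>

bool PinCurrentThreadToCpu(DWORD cpu)
{
    DWORD_PTR mask = (DWORD_PTR)1 << cpu;    // one bit per logical processor
    // SetThreadAffinityMask returns the previous affinity mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
}

// Hypothetical usage: PinCurrentThreadToCpu(0);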
That said, this Interlocked Singly Linked List API seems to be useful, but it still seems to require a lock-sync when using its FLUSH function - not to be confused with a PURGE consideration. To me, a FLUSH is a purge if the data is not available or saved during the flush. In this case, it provides a pointer to a list. To me, in order to be thread safe, the function has to make a COPY of the data and provide a new list pointer; then it can safely purge the internal list it is maintaining. That is basically all I was looking for here: a confirmation of the workings of this flush.

The order of the list is only important depending on how and where it is implemented. As suggested by Ben, it would be easier to transform a LIFO into a FIFO. But it can not be applied in areas where FIFO is required if the LIFO order can not be guaranteed.

Thanks

--
HLS
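[Editor's illustration] For what it's worth, a minimal sketch of how the SList flush behaves: InterlockedFlushSList atomically detaches the entire chain and returns its old head, so the caller owns the detached nodes without any copy or extra lock, and the shared header is left empty. The Item layout, the reversal step (turning the LIFO chain into FIFO order, along the lines Ben suggested), and the allocation scheme are assumptions for the example.

// Sketch: flush the whole Win32 SList in one shot, then reverse the detached
// chain locally to consume it in FIFO order.
#include <windows.h>
#include <malloc.h>
#include <cstdio>

struct Item {
    SLIST_ENTRY entry;   // must be the first member; allocation must meet
                         // MEMORY_ALLOCATION_ALIGNMENT (16 bytes on x64)
    int         value;
};

int main()
{
    SLIST_HEADER head;
    InitializeSListHead(&head);

    // Producer side: push a few items; the SList is a LIFO.
    for (int i = 0; i < 3; ++i) {
        Item* it = (Item*)_aligned_malloc(sizeof(Item), MEMORY_ALLOCATION_ALIGNMENT);
        it->value = i;
        InterlockedPushEntrySList(&head, &it->entry);
    }

    // Consumer side: one call atomically detaches the entire chain and empties
    // the shared header - no copy, no separate lock.
    SLIST_ENTRY* chain = InterlockedFlushSList(&head);

    // The detached chain is newest-first; reverse it locally for FIFO order.
    SLIST_ENTRY* fifo = NULL;
    while (chain) {
        SLIST_ENTRY* next = chain->Next;
        chain->Next = fifo;
        fifo = chain;
        chain = next;
    }

    for (SLIST_ENTRY* e = fifo; e != NULL; ) {
        Item* it = (Item*)e;             // valid because 'entry' is the first member
        std::printf("%d\n", it->value);  // prints 0, 1, 2
        e = e->Next;
        _aligned_free(it);
    }
    return 0;
}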
From: Ben Voigt [C++ MVP] on 15 Nov 2007 10:28

"Chris Thomasson" <cristom(a)comcast.net> wrote in message news:I5-dnZi55r10FabanZ2dnUVZ_judnZ2d(a)comcast.com...
> "Ben Voigt [C++ MVP]" <rbv(a)nospam.nospam> wrote in message news:O2Xn5DxJIHA.3400(a)TK2MSFTNGP03.phx.gbl...
>>> This explains the ABA race-condition in detail. After reading that, you will understand why it's important to have some sort of solution to the ABA. DWCAS is just one way to do it.
>>
>> After thinking about it for a bit, I do not think ABA is actually a problem.
>>
>> The only way that InterlockedCompareExchange will succeed is if the old head pointer is what I have stored in the next-node pointer of the new head, therefore I have correctly inserted a new node regardless of what manipulations took place in the meantime.
>>
>> Please take a look at my queue implementation and let me know if you see any problem (I realize it isn't truly lock-free until I replace the allocator):
>>
> [...]
>
> Good news! ;^)
>
> From what I could gather at first glance, you have solved the ABA problem by using the pattern described here:
>
> http://groups.google.ca/group/comp.programming.threads/msg/49a5db082507cb87
>
> http://groups.google.ca/group/comp.programming.threads/msg/a359bfb41c68b98b
>
> http://groups.google.ca/group/comp.programming.threads/browse_frm/thread/fafe5f528700407
>
> If you stick to that, you will be able to deallocate nodes and not have to worry about it. BTW, when you do it this way, the Get function is not only lock-free... It's wait-free. That's a nice property to strive for sometimes, as long as your method tries to cut down on overheads.

Thanks for the review. I was pretty sure it was correct, but it's very nice to have concurrence from an expert.

As far as I'm concerned, the only (tenable) way to write multi-threaded code is to have a couple of lock-free thread-safe components (like my squeue) that fit entirely on a single page, and elsewhere avoid accessing the same variable from more than one thread. In any non-trivial project, trying to protect shared data structures any other way is just too complex to reason about, with too many opportunities for deadlocks and/or race conditions - and even if those are solved, there is contention to a degree that removes the benefits of multithreading.

> Humm... Well, FWIW, if you're into low-overhead queuing, I suggest you skim through the following message and follow the links:
>
> http://groups.google.com/group/comp.arch/msg/a7501814f19731cd

Thanks, I'll take a look.

> What do you think about using distributed message passing algorithms w/ a work-stealing scheme for natural auto-balance? I have a library I can license... Actually, never mind that.
>
> ;^)
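[Editor's illustration] Ben's point about the push-side CAS can be shown with a small sketch. This is not his attached queue code; the node type and names are made up for the example. Push only writes the observed head into the new node's next field, so a successful compare-exchange leaves the chain consistent no matter what happened in between; it is pop, which dereferences the observed head, where ABA and reclamation issues actually bite.

// Sketch: a pointer-only CAS is enough for push onto a shared LIFO.
#include <windows.h>

struct Node { Node* next; int value; };

static Node* volatile g_top = NULL;

void Push(Node* n)
{
    for (;;) {
        Node* old = g_top;
        n->next = old;   // link the new node to the head we just observed
        // Succeeds only if g_top is still 'old' - exactly the node that
        // n->next now points to - so the list stays intact regardless of
        // any intervening pushes and pops.
        if (InterlockedCompareExchangePointer((PVOID volatile*)&g_top, n, old) == old)
            return;
    }
}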
From: Ben Voigt [C++ MVP] on 15 Nov 2007 10:29
"Chris Thomasson" <cristom(a)comcast.net> wrote in message news:ReednVXXxKKhHabanZ2dnUVZ_uGknZ2d(a)comcast.com... > "Ben Voigt [C++ MVP]" <rbv(a)nospam.nospam> wrote in message > news:O2Xn5DxJIHA.3400(a)TK2MSFTNGP03.phx.gbl... >>> This explains the ABA race-condition in detail. After reading that, you >>> will >>> understand why its important to have some sort of solution to the ABA. >>> DWCAS >>> is just one way to do it. >> >> After thinking about it for a bit, I do not think ABA is actually a >> problem. > > It is. There are numerous papers that deal with the subject. If you have a > soultion for it thats great. I look forward to looking at the code you > attached. I meant "a problem for me". Sorry about the overly general statement. |