From: Chris M. Thomasson on 21 Dec 2009 14:18 "Hugo gleaves(a)hotmail.com>" <hugh<underbar> wrote in message news:D8509436-6341-44FE-BB85-385327E8A41B(a)microsoft.com... > The current code does not cater for that scenario. I have designed a way > to > do it, on paper and I have a very solid "IsProcessRunning" function (that > wasn't as trivial to write as it may first seem). > > The idea is that after a number of failed tries (e.g. 100 spins) call > IsProcessRunning (the lock struct currently records the PID and > DateTimeStarted of the locker, these two items uniquely identify a > process) > and if it has died, execute an "unlock on behalf of" type operation. Are you packing all that information into a 32-bit or 64-bit word? I guess you could use the low-part of the typedef struct _FILETIME { DWORD dwLowDateTime; DWORD dwHighDateTime; }FILETIME, *PFILETIME; And the PID in a lock anchor like: struct anchor { DWORD time; DWORD pid; }; And use DWCAS to atomically mutate it on a 32-bit system, and normal CAS to mutate it on a 64-bit system. Is that what you are doing? [...]
From: tanix on 21 Dec 2009 15:53 In article <59F509C6-D5A5-4049-A075-53DF0E2656DE(a)microsoft.com>, =?Utf-8?B?SHVnbw==?= gleaves(a)hotmail.com> <hugh<underbar> wrote: >Since you have drifted off into insults Not really, if you comprehend what was being said. > and did not bother to read my orignal >post Not sure how original was the post I was following up on. > (I have said three times here that I do not in an way suggest user mode >code should use kernel mode mechanisms, Spinlocks ARE a kernel mode mechanism. There is a reason Microsoft suggests to hold them for extremely short periods of time. These guys are not exactly dumb. Anyway, I really have not much time or interest to deal with this. Just one point: realize that by top posting (writing your response BEFORE the stuff you are following up, is one of the most despised ways to post. You are presenting your position out of context and readers do not know what in your response relates to what in the article you are following up. That single thing by itself indicates to me you have a problem with slopiness, and I would not be surprised it will eventually translate into how you design and code things. It is best to reply to some specific statement in place. Beyond that, do as thou wilt. Cya. > but you simply didn't bother to read >that) I will not bother commenting on what you wrote, except for this: >You seem to think that a spinlock is is something that only the kernel can >do? is that your understanding of sofware engineering? Yes, the kernel has a >spinlock API designed for use within the kernel. > >But anyone can write an equivalent API for user mode, all you need to do is >use the Win32 InterlockXXX functions and you can easily create a real, >working spinlock API. > >This is what I have been discussing. > >What if my shared memory contains thousands of distinct data items? What if >we have a need to read/update these in real-time from multiple processes each >running many threads? perhaps tens of thousands of times per second? > >Do you naively think a Mutex or two or a hundred is going to help? > >Is it not better to simply better to define a tny struct typedef "SpinLock" >and have a tiny API that uses InterlockXXX to "lock" and the struct? spinning >if the item is already locked? > >Is this not a very good solution especially if we KNOW that the updates are >very short indeed? > >Once again I am not in way suggesting access too the kernel, you have failed >to read my posts. > >Hugh > > > > >Hugh > > >"tanix" wrote: > >> In article <6B8D1539-D8C5-44AF-9646-261E8DDBEEBB(a)microsoft.com>, > =?Utf-8?B?SHVnbw==?= gleaves(a)hotmail.com> <hugh<underbar> wrote: >> >Well let me be perfectly clear I am not suggesting or advocating any >> >interaction with any kernel mode mechanisms. >> > >> >If you work with shared mapped memory a great deal as we do, then you need >> >very fast means of locking shared data structures, especially of this is >> >happening very often. >> >> Fine. I had to deal with that kind of thing. >> And then? >> >> >For example we have a shared heap library that we designed, it obvioulsy >> >must handle safe updates to heap control metadata by many threads and many >> >processes at the same time, perhaps tens of thousands of times a second. >> >> Understood. I had to deal with a similar situation on a global >> voice messaging system, just like many people do with their products. >> >> >A mutex simply can't deliver the speed and efficiency. >> >> How do you know where is your REAL bottlenecks? >> >> Are you sure your system architecture is up to snuff? >> >> May be the whole thing is not correctly archtected? >> >> You can not just go for any critical point without seeing the whole >> picture. ESPECIALLY, once you start suggesting things that FUNDAMENTALLY >> violate the generations old principles of separation of kernel and >> user mode? >> >> Do you think people that created these concepts are stoopid? >> Infantile? >> >> You know better? >> >> Well, can YOU propose the solution then? >> Do you think the idea of separating the user mode and kernel mode >> is flawed? How so? >> >> >Getting/releaseing the Mutex costs far far more CPU cycles then updating >> >four fields in a little structure and a pointer or two. >> >> Yep, that is what ALL of them say. >> It is called looking for the trees and not seeing the forest. >> OBVIOUSLY. >> >> Do you have an estimate on what kinds of problems your approach may >> cause the the whole LIVELIHOOD of your system? >> >> Do you have an estimate on how many light years is it going to take >> to deal with problems are you going to create as a result of your >> great "improvement"? >> >> How many man years others would have to spend, forever fixing the >> problems are you going to create, pretty much inevitably? >> >> Do you have a mathematical proof that your concept is going to be >> stable? >> >> What DO you have beyond these small greedy things, tring to >> squeeze some CPU cycles even from the places that clearly have >> a label: >> >> DO NOT ENTER. KERNEL LEVEL. NO MORONS ALLOWED. >> >> Enough. >> >> >Most people perhaps use shared memory for the odd neat trick or some useful >> >little mechanism, but if you rely on it as a vehicle for sharing/storing > many >> >gigs of data then you really do need speedy locks. >> > >> >Hugo >> > >> > >> >"tanix" wrote: >> > >> >> In article <OFiPAJ$fKHA.1112(a)TK2MSFTNGP04.phx.gbl>, "Alexander Grigoriev" >> > <alegr(a)earthlink.net> wrote: >> >> >By definition, spinlock is a busy loop. It only makes sense to do if the >> >> >spinlock owner is guaranteed to never get preempted by another thread. In > >> >> >kernel mode, this is done by setting IRQL to DISPATCH_LEVEL. In user > mode, >> >> >this is not possible. >> >> >> >> Good point. >> >> >> >> Actually, when I was doing a contract with Intel on a multi-gigabit >> >> network card, there was one guy that was suggesting to do something >> >> totally out of the wall in my view. Pretty much the same idea as in >> >> this thread. >> >> >> >> They were trying to squeze out as much performance as possible. >> >> So, this "smart", fat blob, came up with the idea to manipulate >> >> the kernel level synchronization objects from the app level, such >> >> as resetting the count of kernel level semaphores under some contitions. >> >> >> >> On a weekly project meeting I told him: you will never prove this >> >> thing is going to work. That is the whole point with semaphores. >> >> They have been mathematically proven to work. Once you start resetting >> >> the counts at will, outside of the normal semaphore mechanics, >> >> you'll be fixing this thing for years to come. >> >> >> >> The result? >> >> >> >> They fired me. Not him. >> >> I wonder if they were ever able to make that network card to work. >> >> >> >> When they gave me this project to create a hack to manipulate the >> >> kernel level structures from the user mode, and gave me a couple >> >> of weeks to do that, I could not believe these smart donkeys. >> >> >> >> It takes YEARS to work on things like these. These are some of >> >> the central concepts in O/S design and the very idea of separation >> >> of kernel and user mode. I nearly flipped out when manager told >> >> me my "project". >> >> >> >> Some people really need their brains examined. >> >> Where did they get their estimates on "improving performance" >> >> by slaughtering some of the most fundamental concepts in o/s >> >> design and architecture just escapes me. >> >> >> >> >CRITICAL_SECTION provide an option of limited spinning in case of >> >> >contention, if you expect that the CS will mostly be held for very short >> >> >time. >> >> > >> >> > >> >> >"Hugo gleaves(a)hotmail.com>" <hugh<underbar> wrote in message >> >> >news:574400EE-AFAF-4620-BDDD-7E49608392E0(a)microsoft.com... >> >> >> The idea of a spinlock is to ensure secure access/updates to some > datum, >> >> >> usually for operations that are very very brief. So although some other >> >> >> thread/process may have to "spin" for a time, the time is deemed to > brief >> >> >> as >> >> >> to be not a serious performance issue. >> >> >> >> >> >> The kind of operations that might be suited to a spinlock protection > are: >> >> >> >> >> >> 1) Updating several fields atomically in some shared structure. >> >> >> 2) Adding/removing an entry from a shared list. >> >> >> >> >> >> As Don says a Mutex is much more sophisticated and expensive and > although >> >> >> very uself in Win32 programming is no match for a spinlock when it > comes >> >> >> to >> >> >> raw speed (this is why device drivers use them). >> >> >> >> >> >> The real question here is why has MS not provided developers with a >> >> >> spinlock >> >> >> API for user mode code? >> >> >> >> >> >> I had to hand craft such a beast in C, it has to support: >> >> >> >> >> >> 1) Nested locking (a spinlock can be acquired 'n' times and must then > be >> >> >> released 'n' times). >> >> >> 2) Read/Write locking modes. >> >> >> 3) Not end up dragging its heels because of these needs. >> >> >> >> >> >> If you do a lot of work with shared/mapped memory as we do, then such > an >> >> >> API >> >> >> is very important, if you rely on Mutexes it will cost dearly, also the >> >> >> interlock functions alone wont let you easily do the above (but they do >> >> >> provide the foundation). >> >> >> >> >> >> I've been interested for some time in seeing open source or public > domain >> >> >> implementations of a user-mode spinlock API (one that is truly tested > on >> >> >> multicore H/W) but there seems to be absolutely nothing, what we have >> >> >> seems >> >> >> to be pretty good but it is pretty complex and I worry sometimes... >> >> >> >> >> >> ;) >> >> >> >> >> >> >> >> >> "o0Zz" wrote: >> >> >> >> >> >>> Thank you very much ! >> >> >>> >> >> >>> "Daniel Terhell" <daniel(a)resplendence.com> a ecrit dans le message de >> >> >>> groupe >> >> >>> de discussion : emRp7$LdKHA.2188(a)TK2MSFTNGP04.phx.gbl... >> >> >>> > "m" <m(a)b.c> wrote in message >> >> >>> > news:eP9kilIdKHA.6096(a)TK2MSFTNGP02.phx.gbl... >> >> >>> >> The loop will consume all 100% of a single CPU until the lock is >> >> >>> >> acquired, the thread is interrupted by a higher priority thread or > the >> >> >>> >> thread quantum ends (and the OS schedules another thread). >> >> >>> >> >> >> >>> > >> >> >>> > That information is incorrect, on Windows a spinlock raises to >> >> >>> > DISPATCH_LEVEL (or higher) so it is not possible to get interrupted > by >> >> >>> > any >> >> >>> > thread. It's not even possible for the thread to finish its quantum >> >> >>> > while >> >> >>> > it spins or for the dispatcher to schedule another thread on the > same >> >> >>> > CPU. >> >> >>> > It can get interrupted by an ISR but it will return (to the spinning >> >> >>> > state) as soon as the ISR has executed and returns control. >> >> >>> > >> >> >>> > //Daniel >> >> >>> > >> >> >>> > >> >> >>> . >> >> >> >> -- >> >> Programmer's Goldmine collections: >> >> >> >> http://preciseinfo.org >> >> >> >> Tens of thousands of code examples and expert discussions on >> >> C++, MFC, VC, ATL, STL, templates, Java, Python, Javascript, >> >> organized by major topics of language, tools, methods, techniques. >> >> >> >> . >> >> >> >> -- >> Programmer's Goldmine collections: >> >> http://preciseinfo.org >> >> Tens of thousands of code examples and expert discussions on >> C++, MFC, VC, ATL, STL, templates, Java, Python, Javascript, >> organized by major topics of language, tools, methods, techniques. >> >> . >> -- Programmer's Goldmine collections: http://preciseinfo.org Tens of thousands of code examples and expert discussions on C++, MFC, VC, ATL, STL, templates, Java, Python, Javascript, organized by major topics of language, tools, methods, techniques.
From: m on 21 Dec 2009 19:37 Most likely the OP is assuming that the PID & start time (FILETIME) can be write guarded by the lock itself and then read as volatiles. The problem with this is that multiple readers can contend in the unlock path and a partial update (ie PID but not start time) is possible - leading to the conclusion that the lock holder is dead even though they aren't. This is the kind of bug that will hardly ever show up in the field, especially if processes don't die and the call pattern is well diversified, but will cause a crash or corruption if it does occur. IMHO, it is not possible to implement a reliable handler for process termination in this kind of algorithm without either KM support (slow) or a watchdog (fragile). The better way is usually to implement lock timeout / ABEND in the other tasks when data corruption might otherwise occur; but that depends on your particular operating environment. "Chris M. Thomasson" <no(a)spam.invalid> wrote in message news:aQPXm.72902$de6.39506(a)newsfe21.iad... > "Hugo gleaves(a)hotmail.com>" <hugh<underbar> wrote in message > news:D8509436-6341-44FE-BB85-385327E8A41B(a)microsoft.com... >> The current code does not cater for that scenario. I have designed a way >> to >> do it, on paper and I have a very solid "IsProcessRunning" function (that >> wasn't as trivial to write as it may first seem). >> >> The idea is that after a number of failed tries (e.g. 100 spins) call >> IsProcessRunning (the lock struct currently records the PID and >> DateTimeStarted of the locker, these two items uniquely identify a >> process) >> and if it has died, execute an "unlock on behalf of" type operation. > > Are you packing all that information into a 32-bit or 64-bit word? I guess > you could use the low-part of the > > typedef struct _FILETIME { > DWORD dwLowDateTime; > DWORD dwHighDateTime; > }FILETIME, *PFILETIME; > > > And the PID in a lock anchor like: > > > struct anchor > { > DWORD time; > DWORD pid; > }; > > > > And use DWCAS to atomically mutate it on a 32-bit system, and normal CAS > to mutate it on a 64-bit system. Is that what you are doing? > > [...]
From: m on 21 Dec 2009 19:49 I have no experience with a lock of that type. I does assume that either the set of all threads that will access the lock is known beforehand, or that a new thread must register itself, and wait for a sequence point to inject its new lock, before accessing the object for the first time. At a high level, the algorithm relies on signals (writer pending, reader pending) + a status code to maintain the lock. The only concern in my mind about this approach is the poor performance on NUMA. But as I said, it is a minimal implementation! The best attribute is that reader acquisition when uncontended by writers (99%+ of actual cases) requires only a single atomic increment. "Chris M. Thomasson" <no(a)spam.invalid> wrote in message news:qLFXm.1313$8e4.843(a)newsfe03.iad... > "m" <m(a)b.c> wrote in message news:uKbFWtcgKHA.2104(a)TK2MSFTNGP05.phx.gbl... >> "Chris M. Thomasson" <no(a)spam.invalid> wrote in message >> news:Z0_Wm.9220$ft1.2525(a)newsfe10.iad... >>> "m" <m(a)b.c> wrote in message >>> news:u1ahCxEgKHA.2780(a)TK2MSFTNGP05.phx.gbl... >>>> FYI reference implementations of shared reader, single writer locks are >>>> widely available on usernet - the archives of this NG have at least two >>>> threads on the topic and I am sure there is more out there. >>>> >>>> IMHO, a minimal implementation can be created with three volatile long >>>> variable in less than 50 lines of code so the APIs provided by MS >>>> (AcquireSRWLockExclusive and friends) are almost redundant >>>> >>>> http://msdn.microsoft.com/en-us/library/aa904937(VS.85).aspx >>> >>> What implementation are you referring to? > >> It is a custom proprietary algorithm and I am not at liberty to reveal >> the source or owner. > > Okay. > > > > >> It is not a fair algorithm and guarantees only forward progress in a >> non-deterministic way. In practice, it works well for the specific job >> it was designed to do - guarding access by _many_ fast readers and _few_ >> slow writers to in memory indices for a data cache with remote coherency. > > Have you tried out a distributed rw-mutex? The idea is simple in that each > thread has a lock and read-access is comprised of a thread taking it's own > lock. Write access is achieved when a thread takes all the locks. > > [...]
From: Alexander Grigoriev on 21 Dec 2009 22:17
As implemented, yes. Mostly because it keeps a single wakeup event handle. But if you roll your own implementation, with an event handle duplicated for each client process, you can make it work. Moreover, if you create it in your "original" process as a inheritable handle, you'll be able to use the same handle in all descendants of the process, without need to duplicate. "Hugo gleaves(a)hotmail.com>" <hugh<underbar> wrote in message news:697530AB-81F0-4B1B-9B4D-8E39B29236B9(a)microsoft.com... >A critical section is useless if the data to be locked is in shared memory, > shared by multiple processes. > > Hugh > > > "Alexander Grigoriev" wrote: > >> CRITICAL_SECTION is VERY fast when there is no contention. >> IF ther is a contention, it gives you _limited_ busy spin wait, and then, >> if >> it doesn't work out, you have to go to kernel anyway. >> |