From: Pavel A. on 19 Dec 2009 17:08 "tanix" <tanix(a)mongo.net> wrote in message news:hghveu$cj5$2(a)news.eternal-september.org... > And how would you describe someone that weighs about 3 times the > size of a "normal" human being? > > That thing can hardly walk! Hmm. How about "a wide-profiled professional" ? Apologies, my previous advice was wrong... you probably need to see a doctor. --pa
From: m on 20 Dec 2009 18:22 It is a custom proprietary algorithm and I am not at liberty to reveal the source or owner. It is not a fair algorithm and guarantees only forward progress in a non-deterministic way. In practice, it works well for the specific job it was designed to do - guarding access by _many_ fast readers and _few_ slow writers to in memory indices for a data cache with remote coherency. "Chris M. Thomasson" <no(a)spam.invalid> wrote in message news:Z0_Wm.9220$ft1.2525(a)newsfe10.iad... > "m" <m(a)b.c> wrote in message news:u1ahCxEgKHA.2780(a)TK2MSFTNGP05.phx.gbl... >> FYI reference implementations of shared reader, single writer locks are >> widely available on usernet - the archives of this NG have at least two >> threads on the topic and I am sure there is more out there. >> >> IMHO, a minimal implementation can be created with three volatile long >> variable in less than 50 lines of code so the APIs provided by MS >> (AcquireSRWLockExclusive and friends) are almost redundant >> >> http://msdn.microsoft.com/en-us/library/aa904937(VS.85).aspx > > What implementation are you referring to? > > > > FWIW, here is the best I could do so far: > > > http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/65822 > (please read all...) > > > This algorithm is 100% fair and starvation free, and has wait-free > fast-paths for readers. It even has wait-free starvation freedom for > writers as well in the case in which I removed the CRITICAL_SECTION: > > > http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/65822/reply/87849 > > > Also, this performs better than a lot of native POSIX rw-mutex > implementations out there. Unfortunately, the Windows implementation is > completely non-deterministic wrt access patterns...
From: Chris M. Thomasson on 21 Dec 2009 02:02 "m" <m(a)b.c> wrote in message news:uKbFWtcgKHA.2104(a)TK2MSFTNGP05.phx.gbl... > "Chris M. Thomasson" <no(a)spam.invalid> wrote in message > news:Z0_Wm.9220$ft1.2525(a)newsfe10.iad... >> "m" <m(a)b.c> wrote in message >> news:u1ahCxEgKHA.2780(a)TK2MSFTNGP05.phx.gbl... >>> FYI reference implementations of shared reader, single writer locks are >>> widely available on usernet - the archives of this NG have at least two >>> threads on the topic and I am sure there is more out there. >>> >>> IMHO, a minimal implementation can be created with three volatile long >>> variable in less than 50 lines of code so the APIs provided by MS >>> (AcquireSRWLockExclusive and friends) are almost redundant >>> >>> http://msdn.microsoft.com/en-us/library/aa904937(VS.85).aspx >> >> What implementation are you referring to? > It is a custom proprietary algorithm and I am not at liberty to reveal the > source or owner. Okay. > It is not a fair algorithm and guarantees only forward progress in a > non-deterministic way. In practice, it works well for the specific job it > was designed to do - guarding access by _many_ fast readers and _few_ slow > writers to in memory indices for a data cache with remote coherency. Have you tried out a distributed rw-mutex? The idea is simple in that each thread has a lock and read-access is comprised of a thread taking it's own lock. Write access is achieved when a thread takes all the locks. [...]
From: Hugo gleaves on 21 Dec 2009 06:55 Since you have drifted off into insults and did not bother to read my orignal post (I have said three times here that I do not in an way suggest user mode code should use kernel mode mechanisms, but you simply didn't bother to read that) I will not bother commenting on what you wrote, except for this: You seem to think that a spinlock is is something that only the kernel can do? is that your understanding of sofware engineering? Yes, the kernel has a spinlock API designed for use within the kernel. But anyone can write an equivalent API for user mode, all you need to do is use the Win32 InterlockXXX functions and you can easily create a real, working spinlock API. This is what I have been discussing. What if my shared memory contains thousands of distinct data items? What if we have a need to read/update these in real-time from multiple processes each running many threads? perhaps tens of thousands of times per second? Do you naively think a Mutex or two or a hundred is going to help? Is it not better to simply better to define a tny struct typedef "SpinLock" and have a tiny API that uses InterlockXXX to "lock" and the struct? spinning if the item is already locked? Is this not a very good solution especially if we KNOW that the updates are very short indeed? Once again I am not in way suggesting access too the kernel, you have failed to read my posts. Hugh Hugh "tanix" wrote: > In article <6B8D1539-D8C5-44AF-9646-261E8DDBEEBB(a)microsoft.com>, =?Utf-8?B?SHVnbw==?= gleaves(a)hotmail.com> <hugh<underbar> wrote: > >Well let me be perfectly clear I am not suggesting or advocating any > >interaction with any kernel mode mechanisms. > > > >If you work with shared mapped memory a great deal as we do, then you need > >very fast means of locking shared data structures, especially of this is > >happening very often. > > Fine. I had to deal with that kind of thing. > And then? > > >For example we have a shared heap library that we designed, it obvioulsy > >must handle safe updates to heap control metadata by many threads and many > >processes at the same time, perhaps tens of thousands of times a second. > > Understood. I had to deal with a similar situation on a global > voice messaging system, just like many people do with their products. > > >A mutex simply can't deliver the speed and efficiency. > > How do you know where is your REAL bottlenecks? > > Are you sure your system architecture is up to snuff? > > May be the whole thing is not correctly archtected? > > You can not just go for any critical point without seeing the whole > picture. ESPECIALLY, once you start suggesting things that FUNDAMENTALLY > violate the generations old principles of separation of kernel and > user mode? > > Do you think people that created these concepts are stoopid? > Infantile? > > You know better? > > Well, can YOU propose the solution then? > Do you think the idea of separating the user mode and kernel mode > is flawed? How so? > > >Getting/releaseing the Mutex costs far far more CPU cycles then updating > >four fields in a little structure and a pointer or two. > > Yep, that is what ALL of them say. > It is called looking for the trees and not seeing the forest. > OBVIOUSLY. > > Do you have an estimate on what kinds of problems your approach may > cause the the whole LIVELIHOOD of your system? > > Do you have an estimate on how many light years is it going to take > to deal with problems are you going to create as a result of your > great "improvement"? > > How many man years others would have to spend, forever fixing the > problems are you going to create, pretty much inevitably? > > Do you have a mathematical proof that your concept is going to be > stable? > > What DO you have beyond these small greedy things, tring to > squeeze some CPU cycles even from the places that clearly have > a label: > > DO NOT ENTER. KERNEL LEVEL. NO MORONS ALLOWED. > > Enough. > > >Most people perhaps use shared memory for the odd neat trick or some useful > >little mechanism, but if you rely on it as a vehicle for sharing/storing many > >gigs of data then you really do need speedy locks. > > > >Hugo > > > > > >"tanix" wrote: > > > >> In article <OFiPAJ$fKHA.1112(a)TK2MSFTNGP04.phx.gbl>, "Alexander Grigoriev" > > <alegr(a)earthlink.net> wrote: > >> >By definition, spinlock is a busy loop. It only makes sense to do if the > >> >spinlock owner is guaranteed to never get preempted by another thread. In > >> >kernel mode, this is done by setting IRQL to DISPATCH_LEVEL. In user mode, > >> >this is not possible. > >> > >> Good point. > >> > >> Actually, when I was doing a contract with Intel on a multi-gigabit > >> network card, there was one guy that was suggesting to do something > >> totally out of the wall in my view. Pretty much the same idea as in > >> this thread. > >> > >> They were trying to squeze out as much performance as possible. > >> So, this "smart", fat blob, came up with the idea to manipulate > >> the kernel level synchronization objects from the app level, such > >> as resetting the count of kernel level semaphores under some contitions. > >> > >> On a weekly project meeting I told him: you will never prove this > >> thing is going to work. That is the whole point with semaphores. > >> They have been mathematically proven to work. Once you start resetting > >> the counts at will, outside of the normal semaphore mechanics, > >> you'll be fixing this thing for years to come. > >> > >> The result? > >> > >> They fired me. Not him. > >> I wonder if they were ever able to make that network card to work. > >> > >> When they gave me this project to create a hack to manipulate the > >> kernel level structures from the user mode, and gave me a couple > >> of weeks to do that, I could not believe these smart donkeys. > >> > >> It takes YEARS to work on things like these. These are some of > >> the central concepts in O/S design and the very idea of separation > >> of kernel and user mode. I nearly flipped out when manager told > >> me my "project". > >> > >> Some people really need their brains examined. > >> Where did they get their estimates on "improving performance" > >> by slaughtering some of the most fundamental concepts in o/s > >> design and architecture just escapes me. > >> > >> >CRITICAL_SECTION provide an option of limited spinning in case of > >> >contention, if you expect that the CS will mostly be held for very short > >> >time. > >> > > >> > > >> >"Hugo gleaves(a)hotmail.com>" <hugh<underbar> wrote in message > >> >news:574400EE-AFAF-4620-BDDD-7E49608392E0(a)microsoft.com... > >> >> The idea of a spinlock is to ensure secure access/updates to some datum, > >> >> usually for operations that are very very brief. So although some other > >> >> thread/process may have to "spin" for a time, the time is deemed to brief > >> >> as > >> >> to be not a serious performance issue. > >> >> > >> >> The kind of operations that might be suited to a spinlock protection are: > >> >> > >> >> 1) Updating several fields atomically in some shared structure. > >> >> 2) Adding/removing an entry from a shared list. > >> >> > >> >> As Don says a Mutex is much more sophisticated and expensive and although > >> >> very uself in Win32 programming is no match for a spinlock when it comes > >> >> to > >> >> raw speed (this is why device drivers use them). > >> >> > >> >> The real question here is why has MS not provided developers with a > >> >> spinlock > >> >> API for user mode code? > >> >> > >> >> I had to hand craft such a beast in C, it has to support: > >> >> > >> >> 1) Nested locking (a spinlock can be acquired 'n' times and must then be > >> >> released 'n' times). > >> >> 2) Read/Write locking modes. > >> >> 3) Not end up dragging its heels because of these needs. > >> >> > >> >> If you do a lot of work with shared/mapped memory as we do, then such an > >> >> API > >> >> is very important, if you rely on Mutexes it will cost dearly, also the > >> >> interlock functions alone wont let you easily do the above (but they do > >> >> provide the foundation). > >> >> > >> >> I've been interested for some time in seeing open source or public domain > >> >> implementations of a user-mode spinlock API (one that is truly tested on > >> >> multicore H/W) but there seems to be absolutely nothing, what we have > >> >> seems > >> >> to be pretty good but it is pretty complex and I worry sometimes... > >> >> > >> >> ;) > >> >> > >> >> > >> >> "o0Zz" wrote: > >> >> > >> >>> Thank you very much ! > >> >>> > >> >>> "Daniel Terhell" <daniel(a)resplendence.com> a ecrit dans le message de > >> >>> groupe > >> >>> de discussion : emRp7$LdKHA.2188(a)TK2MSFTNGP04.phx.gbl... > >> >>> > "m" <m(a)b.c> wrote in message > >> >>> > news:eP9kilIdKHA.6096(a)TK2MSFTNGP02.phx.gbl... > >> >>> >> The loop will consume all 100% of a single CPU until the lock is > >> >>> >> acquired, the thread is interrupted by a higher priority thread or the > >> >>> >> thread quantum ends (and the OS schedules another thread). > >> >>> >> > >> >>> > > >> >>> > That information is incorrect, on Windows a spinlock raises to > >> >>> > DISPATCH_LEVEL (or higher) so it is not possible to get interrupted by > >> >>> > any > >> >>> > thread. It's not even possible for the thread to finish its quantum > >> >>> > while > >> >>> > it spins or for the dispatcher to schedule another thread on the same > >> >>> > CPU. > >> >>> > It can get interrupted by an ISR but it will return (to the spinning > >> >>> > state) as soon as the ISR has executed and returns control. > >> >>> > > >> >>> > //Daniel > >> >>> > > >> >>> > > >> >>> . > >> > >> -- > >> Programmer's Goldmine collections: > >> > >> http://preciseinfo.org > >> > >> Tens of thousands of code examples and expert discussions on > >> C++, MFC, VC, ATL, STL, templates, Java, Python, Javascript, > >> organized by major topics of language, tools, methods, techniques. > >> > >> . > >> > > -- > Programmer's Goldmine collections: > > http://preciseinfo.org > > Tens of thousands of code examples and expert discussions on > C++, MFC, VC, ATL, STL, templates, Java, Python, Javascript, > organized by major topics of language, tools, methods, techniques. > > . >
From: Hugo gleaves on 21 Dec 2009 06:56
A critical section is useless if the data to be locked is in shared memory, shared by multiple processes. Hugh "Alexander Grigoriev" wrote: > CRITICAL_SECTION is VERY fast when there is no contention. > IF ther is a contention, it gives you _limited_ busy spin wait, and then, if > it doesn't work out, you have to go to kernel anyway. > > "Hugo gleaves(a)hotmail.com>" <hugh<underbar> wrote in message > news:D3ADE2F2-4CC9-474A-BCCF-DD560849C259(a)microsoft.com... > > Man, what a story, I have to say I've seen that kind of thing too. In my > > view > > anyone who suggests anything like that has no appreciation of the > > engineering > > nor any professional maturity. > > > > However, I have to ask how is that fool's suggestion "pretty much the same > > idea as in this thread"? > > > > I am totally opposed to the kind of thing he was suggesting. > > > > All I said is that user-mode code sometimes has a real need for a very > > fast > > locking mechanism, I never once said that user mode code should in any way > > access or use or interfere with kernel mode mechanisms, I would never > > suggest > > such a thing. > > > > But it is naive to believe that non-kernel mode developers never need very > > fast read/write locks. > > > > Even the OS kernel does not support read/write spinlocks, but would it not > > be good if it did? Surely there are scenarios within driver design where a > > read spinlock would be useful? > > > > Hugo > > > > > > "Pavel A." wrote: > > > >> For working in huge companies like that, pure technical merits too often > >> are not enough to stay in. Need to have strong "communication skills" > >> in order to sell your idea. Good luck. > >> --pa > >> > >> > >> "tanix" <tanix(a)mongo.net> wrote in message > >> news:hggafe$ot7$2(a)news.eternal-september.org... > >> > In article <OFiPAJ$fKHA.1112(a)TK2MSFTNGP04.phx.gbl>, "Alexander > >> > Grigoriev" > >> > <alegr(a)earthlink.net> wrote: > >> >>By definition, spinlock is a busy loop. It only makes sense to do if > >> >>the > >> >>spinlock owner is guaranteed to never get preempted by another thread. > >> >>In > >> >>kernel mode, this is done by setting IRQL to DISPATCH_LEVEL. In user > >> >>mode, > >> >>this is not possible. > >> > > >> > Good point. > >> > > >> > Actually, when I was doing a contract with Intel on a multi-gigabit > >> > network card, there was one guy that was suggesting to do something > >> > totally out of the wall in my view. Pretty much the same idea as in > >> > this thread. > >> > > >> > They were trying to squeze out as much performance as possible. > >> > So, this "smart", fat blob, came up with the idea to manipulate > >> > the kernel level synchronization objects from the app level, such > >> > as resetting the count of kernel level semaphores under some > >> > contitions. > >> > > >> > On a weekly project meeting I told him: you will never prove this > >> > thing is going to work. That is the whole point with semaphores. > >> > They have been mathematically proven to work. Once you start resetting > >> > the counts at will, outside of the normal semaphore mechanics, > >> > you'll be fixing this thing for years to come. > >> > > >> > The result? > >> > > >> > They fired me. Not him. > >> > I wonder if they were ever able to make that network card to work. > >> > > >> > When they gave me this project to create a hack to manipulate the > >> > kernel level structures from the user mode, and gave me a couple > >> > of weeks to do that, I could not believe these smart donkeys. > >> > > >> > It takes YEARS to work on things like these. These are some of > >> > the central concepts in O/S design and the very idea of separation > >> > of kernel and user mode. I nearly flipped out when manager told > >> > me my "project". > >> > > >> > Some people really need their brains examined. > >> > Where did they get their estimates on "improving performance" > >> > by slaughtering some of the most fundamental concepts in o/s > >> > design and architecture just escapes me. > >> > > >> >>CRITICAL_SECTION provide an option of limited spinning in case of > >> >>contention, if you expect that the CS will mostly be held for very > >> >>short > >> >>time. > >> >> > >> >> > >> >>"Hugo gleaves(a)hotmail.com>" <hugh<underbar> wrote in message > >> >>news:574400EE-AFAF-4620-BDDD-7E49608392E0(a)microsoft.com... > >> >>> The idea of a spinlock is to ensure secure access/updates to some > >> >>> datum, > >> >>> usually for operations that are very very brief. So although some > >> >>> other > >> >>> thread/process may have to "spin" for a time, the time is deemed to > >> >>> brief > >> >>> as > >> >>> to be not a serious performance issue. > >> >>> > >> >>> The kind of operations that might be suited to a spinlock protection > >> >>> are: > >> >>> > >> >>> 1) Updating several fields atomically in some shared structure. > >> >>> 2) Adding/removing an entry from a shared list. > >> >>> > >> >>> As Don says a Mutex is much more sophisticated and expensive and > >> >>> although > >> >>> very uself in Win32 programming is no match for a spinlock when it > >> >>> comes > >> >>> to > >> >>> raw speed (this is why device drivers use them). > >> >>> > >> >>> The real question here is why has MS not provided developers with a > >> >>> spinlock > >> >>> API for user mode code? > >> >>> > >> >>> I had to hand craft such a beast in C, it has to support: > >> >>> > >> >>> 1) Nested locking (a spinlock can be acquired 'n' times and must then > >> >>> be > >> >>> released 'n' times). > >> >>> 2) Read/Write locking modes. > >> >>> 3) Not end up dragging its heels because of these needs. > >> >>> > >> >>> If you do a lot of work with shared/mapped memory as we do, then such > >> >>> an > >> >>> API > >> >>> is very important, if you rely on Mutexes it will cost dearly, also > >> >>> the > >> >>> interlock functions alone wont let you easily do the above (but they > >> >>> do > >> >>> provide the foundation). > >> >>> > >> >>> I've been interested for some time in seeing open source or public > >> >>> domain > >> >>> implementations of a user-mode spinlock API (one that is truly tested > >> >>> on > >> >>> multicore H/W) but there seems to be absolutely nothing, what we have > >> >>> seems > >> >>> to be pretty good but it is pretty complex and I worry sometimes... > >> >>> > >> >>> ;) > >> >>> > >> >>> > >> >>> "o0Zz" wrote: > >> >>> > >> >>>> Thank you very much ! > >> >>>> > >> >>>> "Daniel Terhell" <daniel(a)resplendence.com> a ecrit dans le message > >> >>>> de > >> >>>> groupe > >> >>>> de discussion : emRp7$LdKHA.2188(a)TK2MSFTNGP04.phx.gbl... > >> >>>> > "m" <m(a)b.c> wrote in message > >> >>>> > news:eP9kilIdKHA.6096(a)TK2MSFTNGP02.phx.gbl... > >> >>>> >> The loop will consume all 100% of a single CPU until the lock is > >> >>>> >> acquired, the thread is interrupted by a higher priority thread > >> >>>> >> or > >> >>>> >> the > >> >>>> >> thread quantum ends (and the OS schedules another thread). > >> >>>> >> > >> >>>> > > >> >>>> > That information is incorrect, on Windows a spinlock raises to > >> >>>> > DISPATCH_LEVEL (or higher) so it is not possible to get > >> >>>> > interrupted > >> >>>> > by > >> >>>> > any > >> >>>> > thread. It's not even possible for the thread to finish its > >> >>>> > quantum > >> >>>> > while > >> >>>> > it spins or for the dispatcher to schedule another thread on the > >> >>>> > same > >> >>>> > CPU. > >> >>>> > It can get interrupted by an ISR but it will return (to the > >> >>>> > spinning > >> >>>> > state) as soon as the ISR has executed and returns control. > >> >>>> > > >> >>>> > //Daniel > >> >>>> > > >> >>>> > > >> >>>> . > >> > > >> > -- > >> > Programmer's Goldmine collections: > >> > > >> > http://preciseinfo.org > >> > > >> > Tens of thousands of code examples and expert discussions on > >> > C++, MFC, VC, ATL, STL, templates, Java, Python, Javascript, > >> > organized by major topics of language, tools, methods, techniques. > >> > > > > . > |