Prev: vlynq: make whole Kconfig-menu dependant on architecture
Next: arch/arm/mach-davinci: Use kzalloc
From: Linus Torvalds on 14 May 2010 11:20 On Fri, 14 May 2010, Michel Lespinasse wrote: > > I would like to sollicit comments regarding the following changes > against 2.6.34-rc7 + 91af708 (from V1 proposal) already applied. > > The motivation for this change was some cluster monitoring software we > use at google Quite frankly, I hate it the way it reads now. I think "down_read_unfair()" is a really dangerous model, and the reason I say that is we used to have _all_ mutexes work that way, and it was a disaster from a unfairness perspective. HOWEVER. I do see where you are coming from, and I do think that unfair readers are likely to be ok AS LONG AS THEY CANNOT BLOCK. And I think that _that_ is likely the much more important issue than the unfairness. IOW, I suspect that I would personally at least be perfectly ok with something like this, with the following fairly trivial changes: - Make it actually do a "preempt_disable()" _after_ getting the rwsem, so that we get a warning if something tries to sleep inside the region (see the whole "__might_sleep()" thing). This also implies that you need a separate unlock routine to pair with it, that undoes that. So you can't unlock it with a regular "up_read()" - rename the thing to be about the fact that you promise that the code that runs under the thing is nonblocking. IOW, rather than talk about "unfair", you talk about "nonpreemptible" or "critical" or something. So you'd have something like down_read_critical(); .. atomic region with no allocation, no preemption .. up_read_critical(); ratehr than talk about "unfairness". So it would have "spinlock" semantics when held (the taking of the lock itself can obviously block - but you couldn't block while _holding_ the lock). In fact, for the generic lib/rwsem-spinlock.c version, it's quite possible you should just _hold_ the spinlock over the critical region. That would potentially speed up the locking quite a lot. The reason I think the above would be acceptable is exactly because it consciously _limits_ that unfair spinlock to only ever work in cases where a certain amount of unfairness would be ok. IOW, you can't just use the unfair version in random places that you think are "more important" and are worthy of unfairness. They have to be places where you can guarantee that you release the lock with no delay. And we'd disable preemption not just to get the warning, but also to make sure that "timely release" really happens. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Michel Lespinasse on 17 May 2010 17:30 On Fri, May 14, 2010 at 8:13 AM, Linus Torvalds <torvalds(a)linux-foundation.org> wrote: > I think "down_read_unfair()" is a really dangerous model, and the reason I > say that is we used to have _all_ mutexes work that way, and it was a > disaster from a unfairness perspective. > > HOWEVER. > > I do see where you are coming from, and I do think that unfair readers are > likely to be ok AS LONG AS THEY CANNOT BLOCK. Thanks for your comments. Seems reasonable to me - and as you pointed out, not very difficult to implement either. > And I think that _that_ is likely the much more important issue than the > unfairness. IOW, I suspect that I would personally at least be perfectly > ok with something like this, with the following fairly trivial changes: > > �- Make it actually do a "preempt_disable()" _after_ getting the rwsem, so > � that we get a warning if something tries to sleep inside the region > � (see the whole "__might_sleep()" thing). > > � This also implies that you need a separate unlock routine to pair with > � it, that undoes that. So you can't unlock it with a regular "up_read()" Done in V3 (will send out shortly). > �- rename the thing to be about the fact that you promise that the code > � that runs under the thing is nonblocking. IOW, rather than talk about > � "unfair", you talk about "nonpreemptible" or "critical" or something. > > � So you'd have something like > > � � � �down_read_critical(); > � � � �.. atomic region with no allocation, no preemption .. > � � � �up_read_critical(); > > � rather than talk about "unfairness". Changed the high level API to work that way. I did not change __down_read_unfair though, because at that low level it's still about unfairness - the tying with non-preemptible is done higher up. (I had actually started changing it to __down_read_critical, but then I realized that there was no __up_read_critical to pair it with). > So it would have "spinlock" semantics when held (the taking of the lock > itself can obviously block - but you couldn't block while _holding_ the > lock). > > In fact, for the generic lib/rwsem-spinlock.c version, it's quite possible > you should just _hold_ the spinlock over the critical region. That would > potentially speed up the locking quite a lot. I did not pursue that one - one issue would have been that I would have to expose the spin_lock_irqsave flags to the down_read_critical caller. It can be done but I'm not sure it makes sense given that the non-generic implementation wouldn't use them. > The reason I think the above would be acceptable is exactly because it > consciously _limits_ that unfair spinlock to only ever work in cases where > a certain amount of unfairness would be ok. > > IOW, you can't just use the unfair version in random places that you think > are "more important" and are worthy of unfairness. They have to be places > where you can guarantee that you release the lock with no delay. And we'd > disable preemption not just to get the warning, but also to make sure that > "timely release" really happens. I like the strategic thinking here :) Thanks, -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
|
Pages: 1 Prev: vlynq: make whole Kconfig-menu dependant on architecture Next: arch/arm/mach-davinci: Use kzalloc |