From: Mike Galbraith on 12 Jul 2010 08:20

On Mon, 2010-07-12 at 07:45 -0400, Steven Rostedt wrote:
> On Sun, 2010-07-11 at 15:33 +0200, Mike Galbraith wrote:
> > On Sat, 2010-07-10 at 21:41 +0200, Mike Galbraith wrote:
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index a6cec32..ef489f3 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -2255,7 +2255,14 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
> >  	/* Queue the futex_q, drop the hb lock, wait for wakeup. */
> >  	futex_wait_queue_me(hb, &q, to);
> >
> > -	spin_lock(&hb->lock);
> > +	/*
> > +	 * Non-blocking synchronization point with futex_requeue().
> > +	 *
> > +	 * We dare not block here because this will alter PI state, possibly
> > +	 * before our waker finishes modifying same in wakeup_next_waiter().
> > +	 */
> > +	while (!spin_trylock(&hb->lock))
> > +		cpu_relax();
>
> I agree that this would work. But I wonder if this should have an:
>
> #ifdef PREEMPT_RT
> [...]
> #else
> 	spin_lock(&hb->lock);
> #endif
>
> around it. Or encapsulate this lock in a macro that does the same thing
> (just to keep the actual code cleaner).

Yeah, it should. I'll wait to see what Darren/others say about holding
the wakee's pi_lock across wakeup to plug it. If he submits something
along that line, I can bin this.

	-Mike
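Steven's suggestion amounts to a small wrapper that spins only on -rt,
where hb->lock is a sleeping lock, and takes the lock normally
everywhere else. A minimal sketch of such a wrapper (the macro name is
invented here for illustration; no such helper existed in the tree):

	#ifdef CONFIG_PREEMPT_RT
	/*
	 * On -rt, hb->lock is a sleeping lock; blocking on it here could
	 * clobber current->pi_blocked_on, so poll with trylock instead.
	 */
	# define hb_lock_nonblocking(hb)			\
		do {						\
			while (!spin_trylock(&(hb)->lock))	\
				cpu_relax();			\
		} while (0)
	#else
	/* Mainline spinlocks cannot sleep, so a plain lock is fine. */
	# define hb_lock_nonblocking(hb)	spin_lock(&(hb)->lock)
	#endif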
From: Thomas Gleixner on 12 Jul 2010 09:10

On Fri, 9 Jul 2010, Darren Hart wrote:
> The requeue_pi mechanism introduced proxy locking of the rtmutex. This
> creates a scenario where a task can wake up, not knowing it has been
> enqueued on an rtmutex. In order to detect this, the task would have to
> be able to take either task->pi_blocked_on->lock->wait_lock and/or the
> hb->lock. Unfortunately, without already holding one of these, the
> pi_blocked_on variable can change from NULL to valid or from valid to
> NULL. Therefore, the task cannot be allowed to take a sleeping lock
> after wakeup, or it could end up trying to block on two locks, the
> second overwriting a valid pi_blocked_on value. This obviously breaks
> the PI mechanism.
>
> This patch increases latency. While running the LTP pthread_cond_many
> test with which Michal reported the bug, I see double-digit microsecond
> hrtimer latencies (typically only on the first run after boot):
>
> kernel: hrtimer: interrupt took 75911 ns

Eewwww. There must be some more intelligent and less intrusive way to
detect this.

Thanks,

	tglx
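The clobbering the changelog describes comes from the fact that
blocking on any rtmutex records the waiter in the blocking task's
pi_blocked_on. A heavily simplified model of that store, loosely
patterned on task_blocks_on_rt_mutex() (the real function takes more
arguments and walks the PI chain; only the relevant effect is shown):

	/*
	 * Every rtmutex block stores its waiter in pi_blocked_on under
	 * task->pi_lock.  If the woken task blocks on hb->lock (a
	 * sleeping lock on -rt) while still enqueued on the proxy-locked
	 * rtmutex, this store runs a second time and silently replaces
	 * the first waiter pointer -- the breakage the changelog means.
	 */
	static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
					   struct rt_mutex_waiter *waiter,
					   struct task_struct *task)
	{
		raw_spin_lock(&task->pi_lock);
		waiter->task = task;
		waiter->lock = lock;
		task->pi_blocked_on = waiter;	/* second block clobbers this */
		raw_spin_unlock(&task->pi_lock);
		return 0;
	}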
From: Darren Hart on 12 Jul 2010 15:20

On 07/10/2010 12:41 PM, Mike Galbraith wrote:
> On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote:
>> The requeue_pi mechanism introduced proxy locking of the rtmutex. This
>> creates a scenario where a task can wake up, not knowing it has been
>> enqueued on an rtmutex. In order to detect this, the task would have to
>> be able to take either task->pi_blocked_on->lock->wait_lock and/or the
>> hb->lock. Unfortunately, without already holding one of these, the
>> pi_blocked_on variable can change from NULL to valid or from valid to
>> NULL. Therefore, the task cannot be allowed to take a sleeping lock
>> after wakeup, or it could end up trying to block on two locks, the
>> second overwriting a valid pi_blocked_on value. This obviously breaks
>> the PI mechanism.
>
> copy/paste offline query/reply at Darren's request..
>
> On Sat, 2010-07-10 at 10:26 -0700, Darren Hart wrote:
> On 07/09/2010 09:32 PM, Mike Galbraith wrote:
>>> On Fri, 2010-07-09 at 13:05 -0700, Darren Hart wrote:
>>>
>>>> The core of the problem is that the proxy_lock blocks a task on a lock
>>>> the task knows nothing about. So when it wakes up inside of
>>>> futex_wait_requeue_pi, it immediately tries to block on hb->lock to
>>>> check why it woke up. This has the potential to block the task on two
>>>> locks (thus overwriting the pi_blocked_on). Any attempt at preventing
>>>> this involves a lock, and ultimately the hb->lock. The only solution I
>>>> see is to make the hb->locks raw locks (thanks to Steven Rostedt for
>>>> the original idea and batting this around with me in IRC).
>>>
>>> Hm, so wakee _was_ munging his own state after all.
>>>
>>> Out of curiosity, what's wrong with holding his pi_lock across the
>>> wakeup? He can _try_ to block, but can't until pi state is stable.
>>>
>>> I presume there's a big fat gotcha that's just not obvious to futex
>>> locking newbie :)

Nor to some of us that have been engrossed in futexes for the last
couple of years! I discussed the pi_lock-across-the-wakeup issue with
Thomas. While this fixes the problem for this particular failure case,
it doesn't protect against:

<tglx> assume the following:
<tglx> t1 is on the condvar
<tglx> t2 does the requeue dance and t1 is now blocked on the outer futex
<tglx> t3 takes hb->lock for a futex in the same bucket
<tglx> t2 wakes due to signal/timeout
<tglx> t2 blocks on hb->lock

You are likely to have not hit the above scenario because you only had
one condvar, so the hash buckets were not heavily shared and you weren't
likely to hit:

<tglx> t3 takes hb->lock for a futex in the same bucket

I'm going to roll up a patchset with your (Mike) spin_trylock patch and
run it through some tests. I'd still prefer a way to detect early wakeup
without having to grab the hb->lock, but I haven't found it yet.

+	while (!spin_trylock(&hb->lock))
+		cpu_relax();
 	ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
 	spin_unlock(&hb->lock);

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
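The bucket sharing that tglx's t3 relies on comes straight from the
futex hash: distinct futexes hash into a fixed-size array of buckets and
can therefore end up contending on the very same hb->lock. Roughly what
bucket selection looked like in kernels of that era (reconstructed from
memory, so treat the details as a sketch rather than the exact code):

	/*
	 * Sketch of futex bucket selection, after the in-tree
	 * hash_futex().  Two unrelated futexes whose keys hash to the
	 * same slot share one hb->lock, which is how an uninvolved t3
	 * can hold the lock that a waking t2 then blocks on.
	 */
	static struct futex_hash_bucket *hash_futex(union futex_key *key)
	{
		u32 hash = jhash2((u32 *)&key->both.word,
				  (sizeof(key->both.word) +
				   sizeof(key->both.ptr)) / 4,
				  key->both.offset);

		return &futex_queues[hash & ((1 << FUTEX_HASHBITS) - 1)];
	}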
From: Thomas Gleixner on 12 Jul 2010 16:50

On Mon, 12 Jul 2010, Thomas Gleixner wrote:
> On Mon, 12 Jul 2010, Darren Hart wrote:
> > On 07/10/2010 12:41 PM, Mike Galbraith wrote:
> > > On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote:
> > > > > Out of curiosity, what's wrong with holding his pi_lock across the
> > > > > wakeup? He can _try_ to block, but can't until pi state is stable.
> > > > >
> > > > > I presume there's a big fat gotcha that's just not obvious to futex
> > > > > locking newbie :)
> >
> > Nor to some of us that have been engrossed in futexes for the last
> > couple of years! I discussed the pi_lock-across-the-wakeup issue with
> > Thomas. While this fixes the problem for this particular failure case,
> > it doesn't protect against:
> >
> > <tglx> assume the following:
> > <tglx> t1 is on the condvar
> > <tglx> t2 does the requeue dance and t1 is now blocked on the outer futex
> > <tglx> t3 takes hb->lock for a futex in the same bucket
> > <tglx> t2 wakes due to signal/timeout
> > <tglx> t2 blocks on hb->lock
> >
> > You are likely to have not hit the above scenario because you only had
> > one condvar, so the hash buckets were not heavily shared and you
> > weren't likely to hit:
> >
> > <tglx> t3 takes hb->lock for a futex in the same bucket
> >
> > I'm going to roll up a patchset with your (Mike) spin_trylock patch and
> > run it through some tests. I'd still prefer a way to detect early
> > wakeup without having to grab the hb->lock, but I haven't found it yet.
> >
> > +	while (!spin_trylock(&hb->lock))
> > +		cpu_relax();
> > 	ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
> > 	spin_unlock(&hb->lock);
>
> And this is nasty as it will create unbounded priority inversion :(
>
> We discussed another solution on IRC in the meantime:
>
> in futex_wait_requeue_pi()
>
>	futex_wait_queue_me(hb, &q, to);
>
>	raw_spin_lock(&current->pi_lock);
>	if (current->pi_blocked_on) {
>		/*
>		 * We know that we can only be blocked on the outer futex,
>		 * so we can skip the early wakeup check.
>		 */
>		raw_spin_unlock(&current->pi_lock);
>		ret = 0;
>	} else {
>		current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
>		raw_spin_unlock(&current->pi_lock);
>
>		spin_lock(&hb->lock);
>		ret = handle_early_requeue_pi_wakeup();
>		....
>		spin_unlock(&hb->lock);
>	}
>
> Now in the rtmutex magic we need in task_blocks_on_rt_mutex():
>
>	raw_spin_lock(&task->pi_lock);
>
>	/*
>	 * Add big fat comment why this is only relevant to futex
>	 * requeue_pi
>	 */
>
>	if (task != current && task->pi_blocked_on == PI_WAKEUP_INPROGRESS) {
>		raw_spin_unlock(&task->pi_lock);
>
>		/*
>		 * Returning 0 here is fine. The requeue code is just going to
>		 * move the futex_q to the other bucket, but that'll be fixed
>		 * up in handle_early_requeue_pi_wakeup()
>		 */
>
>		return 0;
>	}

We might also return a sensible error code here and just remove the
waiter from all queues, which then needs to be handled in
handle_early_requeue_pi_wakeup() after acquiring hb->lock.

Thanks,

	tglx
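PI_WAKEUP_INPROGRESS in the sketch above only needs to be a value that
can never alias a real waiter pointer, so that "not blocked" (NULL),
"blocked on a real lock" (a valid pointer), and "wakeup in progress"
stay distinguishable under task->pi_lock. One illustrative definition,
not taken from this thread:

	/*
	 * Illustrative sentinel: a small non-NULL constant that cannot
	 * be a real struct rt_mutex_waiter *.  Code inspecting
	 * pi_blocked_on must now compare against this value as well as
	 * NULL before dereferencing.
	 */
	#define PI_WAKEUP_INPROGRESS	((struct rt_mutex_waiter *) 1)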
From: Mike Galbraith on 12 Jul 2010 23:10

On Mon, 2010-07-12 at 22:40 +0200, Thomas Gleixner wrote:
> On Mon, 12 Jul 2010, Darren Hart wrote:
> > On 07/10/2010 12:41 PM, Mike Galbraith wrote:
> > > On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote:
> > > > > Out of curiosity, what's wrong with holding his pi_lock across the
> > > > > wakeup? He can _try_ to block, but can't until pi state is stable.
> > > > >
> > > > > I presume there's a big fat gotcha that's just not obvious to futex
> > > > > locking newbie :)
> >
> > Nor to some of us that have been engrossed in futexes for the last
> > couple of years! I discussed the pi_lock-across-the-wakeup issue with
> > Thomas. While this fixes the problem for this particular failure case,
> > it doesn't protect against:
> >
> > <tglx> assume the following:
> > <tglx> t1 is on the condvar
> > <tglx> t2 does the requeue dance and t1 is now blocked on the outer futex
> > <tglx> t3 takes hb->lock for a futex in the same bucket
> > <tglx> t2 wakes due to signal/timeout
> > <tglx> t2 blocks on hb->lock
> >
> > You are likely to have not hit the above scenario because you only had
> > one condvar, so the hash buckets were not heavily shared and you
> > weren't likely to hit:
> >
> > <tglx> t3 takes hb->lock for a futex in the same bucket
> >
> > I'm going to roll up a patchset with your (Mike) spin_trylock patch and
> > run it through some tests. I'd still prefer a way to detect early
> > wakeup without having to grab the hb->lock, but I haven't found it yet.
> >
> > +	while (!spin_trylock(&hb->lock))
> > +		cpu_relax();
> > 	ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
> > 	spin_unlock(&hb->lock);
>
> And this is nasty as it will create unbounded priority inversion :(

Oh ma gawd, _it's a train_ :>

> We discussed another solution on IRC in the meantime:
>
> in futex_wait_requeue_pi()
>
>	futex_wait_queue_me(hb, &q, to);
>
>	raw_spin_lock(&current->pi_lock);
>	if (current->pi_blocked_on) {
>		/*
>		 * We know that we can only be blocked on the outer futex,
>		 * so we can skip the early wakeup check.
>		 */
>		raw_spin_unlock(&current->pi_lock);
>		ret = 0;
>	} else {
>		current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
>		raw_spin_unlock(&current->pi_lock);
>
>		spin_lock(&hb->lock);
>		ret = handle_early_requeue_pi_wakeup();
>		....
>		spin_unlock(&hb->lock);
>	}
>
> Now in the rtmutex magic we need in task_blocks_on_rt_mutex():
>
>	raw_spin_lock(&task->pi_lock);
>
>	/*
>	 * Add big fat comment why this is only relevant to futex
>	 * requeue_pi
>	 */
>
>	if (task != current && task->pi_blocked_on == PI_WAKEUP_INPROGRESS) {
>		raw_spin_unlock(&task->pi_lock);
>
>		/*
>		 * Returning 0 here is fine. The requeue code is just going to
>		 * move the futex_q to the other bucket, but that'll be fixed
>		 * up in handle_early_requeue_pi_wakeup()
>		 */
>
>		return 0;
>	}
>
> Thanks,
>
>	tglx