Prev: [PATCH 5/6] perf symbols: Avoid unnecessary symbol loading when dso list is specified
Next: proc: turn signal_struct->count into "int nr_threads"
From: Oleg Nesterov on 24 Mar 2010 14:20 On 03/24, Peter Zijlstra wrote: > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote: > > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task() > > which finally does s/WAKING/RUNNING. > > > > But. Nobody can take rq->lock in between. This means a signal > > from irq (quite possible with CLONE_THREAD) or another rt > > thread which preempts us can lockup. > > Hmm, the signal case might indeed be a problem, however I cannot see how > the RT thread can be a problem because until we do wake_up_new_task() > the child will not be runnable and can thus not be preempted. Indeed, but I meant the _parent_ can be preempted ;) In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can be called from any context, it should be irq-safe: any owner must disable inerrupts and preemption. > The reason we have that TASK_WAKING stuff for fork is because > wake_up_new_task() needs p->cpus_allowed to be stable Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for a moment, but afaics that is all? > So the below patch makes select_task_rq_fair unlock the rq when needed, > and then puts all ->select_task_rq() calls under rq->lock. This should > allow us to remove the TASK_WAKING thing from fork which in turn allows > us to remove the PF_STARTING check in task_is_waking. > > How does that look? I'll try to read this patch tomorrow. But could you please consider the suggestion above? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 25 Mar 2010 06:30 On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote: > On 03/24, Peter Zijlstra wrote: > > > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote: > > > > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task() > > > which finally does s/WAKING/RUNNING. > > > > > > But. Nobody can take rq->lock in between. This means a signal > > > from irq (quite possible with CLONE_THREAD) or another rt > > > thread which preempts us can lockup. > > > > Hmm, the signal case might indeed be a problem, however I cannot see how > > the RT thread can be a problem because until we do wake_up_new_task() > > the child will not be runnable and can thus not be preempted. > > Indeed, but I meant the _parent_ can be preempted ;) I still can't see how that would be a problem.. > In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can > be called from any context, it should be irq-safe: any owner must disable > inerrupts and preemption. Agreed, and I think that's corrected with my patch. > > The reason we have that TASK_WAKING stuff for fork is because > > wake_up_new_task() needs p->cpus_allowed to be stable > > Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING > like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for > a moment, but afaics that is all? My patch does that. > > So the below patch makes select_task_rq_fair unlock the rq when needed, > > and then puts all ->select_task_rq() calls under rq->lock. This should > > allow us to remove the TASK_WAKING thing from fork which in turn allows > > us to remove the PF_STARTING check in task_is_waking. > > > > How does that look? > > I'll try to read this patch tomorrow. But could you please consider > the suggestion above? I think I do all those :-) I was still looking at removing the TASK_WAKING check from task_rq_lock() since now we do set_task_cpu() under rq->lock again so it should be good again. Hmm, except for sched_fork() that still does set_task_cpu() without holding rq->lock, but that is before the child gets exposed so there should not be any concurrency. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on 25 Mar 2010 11:50 On 03/25, Peter Zijlstra wrote: > > On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote: > > On 03/24, Peter Zijlstra wrote: > > > > > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote: > > > > > > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task() > > > > which finally does s/WAKING/RUNNING. > > > > > > > > But. Nobody can take rq->lock in between. This means a signal > > > > from irq (quite possible with CLONE_THREAD) or another rt > > > > thread which preempts us can lockup. > > > > > > Hmm, the signal case might indeed be a problem, however I cannot see how > > > the RT thread can be a problem because until we do wake_up_new_task() > > > the child will not be runnable and can thus not be preempted. > > > > Indeed, but I meant the _parent_ can be preempted ;) > > I still can't see how that would be a problem.. The parent P does do_fork(), copy_process returns the new child C with TASK_WAKING at PF_STARTING set. do_fork() clears PF_STARTING, but TASK_WAKING is still set, and C is already visible to the user-space An rt-thread X preempts P and calls ttwu() (say, it sends a signal to C) ttwu() loops in task_rq_lock() "forever", because TASK_WAKING is still set. > > > The reason we have that TASK_WAKING stuff for fork is because > > > wake_up_new_task() needs p->cpus_allowed to be stable > > > > Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING > > like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for > > a moment, but afaics that is all? > > My patch does that. OK, that is what I meant. Now, why sched_fork() can't just set TASK_RUNNING ? This way the "spurious" wakeup can do nothing with the new child, and we do not have problems with cpuset which needs rq->lock for set_cpus_allowed(). > > > So the below patch makes select_task_rq_fair unlock the rq when needed, > > > and then puts all ->select_task_rq() calls under rq->lock. Yes, I thought about this too. I tried to preserve the current "->select_task_rq() is called without rq->lock" logic. > This should > > > allow us to remove the TASK_WAKING thing from fork Confused. Why can't we simply remove TASK_WAKING from fork without any changes except in wake_up_new_task() ? > which in turn allows > > > us to remove the PF_STARTING check in task_is_waking. Even if I do not think I understand sched.c above, I'd say we must do this in any case ;) > I was still looking at removing the TASK_WAKING check from > task_rq_lock() Peter, I can't apply your patch due to rejects (will try again later) but at first glance, it makes TASK_WAKING unneeded? Since we do not drop the lock after we set TASK_WAKING, why do we need this state at all ? Aha... select_task_rq_fair() can drop the lock, yes? Well, in this case probably select_task_rq_fair() can set TASK_WAKING before unlock? I like the current idea to call select_task_rq() without rq->lock, but of course this is up to you. However, once again, can't we make a simpler patch? - remove PF_STARTING from task_waking() - change sched_fork() to set RUNNING instead of WAKING - change wake_up_new_task() to set WAKING under rq->lock This looks simpler to me, and allows to drop rq->lock in ttwu() right after it sets WAKING. Another change which seems reasonable is to allow ttwu() to take rq->lock even if WAKING is set, it can do nothing but check task->state in this case. What do you think? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on 25 Mar 2010 12:10 On 03/25, Oleg Nesterov wrote: > > I like the current idea to call select_task_rq() without rq->lock, but > of course this is up to you. > > However, once again, can't we make a simpler patch? > > - remove PF_STARTING from task_waking() > > - change sched_fork() to set RUNNING instead of WAKING > > - change wake_up_new_task() to set WAKING under rq->lock > > This looks simpler to me, and allows to drop rq->lock in ttwu() right > after it sets WAKING. IOW, something like the (unchecked, uncompiled) patch below. What do you think? Oleg. --- x/kernel/sched.c +++ x/kernel/sched.c @@ -912,7 +912,7 @@ static inline void finish_lock_switch(st */ static inline int task_is_waking(struct task_struct *p) { - return unlikely((p->state == TASK_WAKING) && !(p->flags & PF_STARTING)); + return unlikely(p->state == TASK_WAKING); } /* @@ -2568,11 +2568,10 @@ void sched_fork(struct task_struct *p, i __sched_fork(p); /* - * We mark the process as waking here. This guarantees that - * nobody will actually run it, and a signal or other external - * event cannot wake it up and insert it on the runqueue either. + * We mark the process as running here. This guarantees that + * nobody will actually wake it up until wake_up_new_task(). */ - p->state = TASK_WAKING; + p->state = TASK_RUNNING; /* * Revert to default priority/policy on fork if requested. @@ -2638,15 +2637,18 @@ void wake_up_new_task(struct task_struct struct rq *rq; int cpu = get_cpu(); + p->state = TASK_WAKING; + smp_mb(); + raw_spin_unlock_wait(&rq->lock); + #ifdef CONFIG_SMP /* * Fork balancing, do it here and not earlier because: * - cpus_allowed can change in the fork path * - any previously selected cpu might disappear through hotplug * - * We still have TASK_WAKING but PF_STARTING is gone now, meaning - * ->cpus_allowed is stable, we have preemption disabled, meaning - * cpu_online_mask is stable. + * TASK_WAKING means ->cpus_allowed is stable, we have preemption + * disabled, meaning cpu_online_mask is stable. */ cpu = select_task_rq(p, SD_BALANCE_FORK, 0); set_task_cpu(p, cpu); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on 25 Mar 2010 12:20
Argh, sorry for noise... On 03/25, Oleg Nesterov wrote: > > On 03/25, Oleg Nesterov wrote: > > > > I like the current idea to call select_task_rq() without rq->lock, but > > of course this is up to you. > > > > However, once again, can't we make a simpler patch? > > > > - remove PF_STARTING from task_waking() > > > > - change sched_fork() to set RUNNING instead of WAKING When I reread this thread, suddenly finally I noticed you mentioned _twice_ your patch does this too ;) Not to mention the patch itself which I misread. Sorry. > IOW, something like the (unchecked, uncompiled) patch below. Still, what do you think? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |