Prev: [PATCH] btrfs: check alloc return value before use handle and struct
Next: [PATCH] POWER (BAT) of DA9052 Linux device drivers (4/9)
From: Srivatsa Vaddagiri on 24 May 2010 11:20 On Mon, May 24, 2010 at 03:28:45PM +0200, Peter Zijlstra wrote: > On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote: > > since _cpu_up() and _cpu_down() can never run in > > parallel, because of cpu_add_remove_lock. > > Ah indeed. I guess your initial patch works then. One thing I found surprising was that a cpu's rt-bandwidth renewal could be dependant on another cpu's (rt-bandwidth) timer firing ontime. In this case, we had migration/23 pulled over to CPU0 and we hung later waiting for migration/23 to exit. migration/23 was not exiting because it could not run on CPU0 (as CPU0's rt-bandwidth had expired). This situation remained forever. I would have expected CPU0's bandwidth to have been renewed independent of some timer on CPU23 to fire - maybe I am missing something not obvious in the code? - vatsa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 24 May 2010 12:00 On Mon, 2010-05-24 at 20:46 +0530, Srivatsa Vaddagiri wrote: > On Mon, May 24, 2010 at 03:28:45PM +0200, Peter Zijlstra wrote: > > On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote: > > > since _cpu_up() and _cpu_down() can never run in > > > parallel, because of cpu_add_remove_lock. > > > > Ah indeed. I guess your initial patch works then. > > One thing I found surprising was that a cpu's rt-bandwidth renewal could be > dependant on another cpu's (rt-bandwidth) timer firing ontime. In this case, we > had migration/23 pulled over to CPU0 and we hung later waiting for migration/23 > to exit. migration/23 was not exiting because it could not run on CPU0 (as > CPU0's rt-bandwidth had expired). This situation remained forever. I would have > expected CPU0's bandwidth to have been renewed independent of some timer on > CPU23 to fire - maybe I am missing something not obvious in the code? The bandwidth constraint is per cgroup, and cgroups span cpus. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 25 May 2010 07:40 On Mon, 2010-05-24 at 15:29 +0530, Amit K. Arora wrote: > > Thus, since above race can never happen, is there any other issue with > this patch ? It doesn't seem to apply nicely... > > Signed-off-by: Amit Arora <aarora(a)in.ibm.com> > Signed-off-by: Gautham R Shenoy <ego(a)in.ibm.com> > --- > diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c > --- linux-2.6.34.org/kernel/sched.c 2010-05-18 22:56:21.000000000 -0700 > +++ linux-2.6.34/kernel/sched.c 2010-05-18 22:58:31.000000000 -0700 > @@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf > cpu_rq(cpu)->migration_thread = NULL; > break; > > + case CPU_POST_DEAD: > + /* > + * Bring the migration thread down in CPU_POST_DEAD event, > + * since the timers should have got migrated by now and thus > + * we should not see a deadlock between trying to kill the > + * migration thread and the sched_rt_period_timer. > + */ > + cpuset_lock(); > + rq = cpu_rq(cpu); > + kthread_stop(rq->migration_thread); > + put_task_struct(rq->migration_thread); > + rq->migration_thread = NULL; > + cpuset_unlock(); > + break; > + > case CPU_DEAD: > case CPU_DEAD_FROZEN: > cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */ > migrate_live_tasks(cpu); > rq = cpu_rq(cpu); > - kthread_stop(rq->migration_thread); > - put_task_struct(rq->migration_thread); > - rq->migration_thread = NULL; > /* Idle task back to normal (off runqueue, low prio) */ > raw_spin_lock_irq(&rq->lock); > update_rq_clock(rq); > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 25 May 2010 10:30 On Tue, 2010-05-25 at 18:53 +0530, Amit K. Arora wrote: nice :-) Thanks! > Signed-off-by: Amit Arora <aarora(a)in.ibm.com> > Signed-off-by: Gautham R Shenoy <ego(a)in.ibm.com> > Cc: Ingo Molnar <mingo(a)elte.hu> > Cc: Peter Zijlstra <peterz(a)infradead.org> > Cc: Tejun Heo <tj(a)kernel.org> > --- > diff -Nuarp linux-2.6-next-20100525.org/kernel/stop_machine.c linux-2.6-next-20100525/kernel/stop_machine.c > --- linux-2.6-next-20100525.org/kernel/stop_machine.c 2010-05-25 14:27:51.000000000 -0400 > +++ linux-2.6-next-20100525/kernel/stop_machine.c 2010-05-25 14:30:09.000000000 -0400 > @@ -321,7 +321,7 @@ static int __cpuinit cpu_stop_cpu_callba > > #ifdef CONFIG_HOTPLUG_CPU > case CPU_UP_CANCELED: > - case CPU_DEAD: > + case CPU_POST_DEAD: > { > struct cpu_stop_work *work; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Thomas Gleixner on 25 May 2010 16:30
On Thu, 20 May 2010, Peter Zijlstra wrote: > On Wed, 2010-05-19 at 17:43 +0530, Amit K. Arora wrote: > > Alternate Solution considered : Another option considered was to > > increase the priority of the hrtimer cpu offline notifier, such that it > > gets to run before scheduler's migration cpu offline notifier. In this > > way we are sure that the timers will get migrated before migration_call > > tries to kill migration_thread. But, this can have some non-obvious > > implications, suggested Srivatsa. > > > > On Wed, May 19, 2010 at 11:31:55AM +0200, Peter Zijlstra wrote: > > > The other problem is more urgent though, CPU_POST_DEAD runs outside of > > > the hotplug lock and thus the above becomes a race where we could > > > possible kill off the migration thread of a newly brought up cpu: > > > > > > cpu0 - down 2 > > > cpu1 - up 2 (allocs a new migration thread, and leaks the old one) > > > cpu0 - post_down 2 - frees the migration thread -- oops! > > > > Ok. So, how about adding a check in CPU_UP_PREPARE event handling too ? > > The cpuset_lock will synchronize, and thus avoid race between killing of > > migration_thread in up_prepare and post_dead events. > > > > Here is the updated patch. If you don't like this one too, do you mind > > suggesting an alternate approach to tackle the problem ? Thanks ! > > Right, so this isn't pretty at all.. > > Ingo, the comment near the migration_notifier says that migration_call > should happen before all else, but can you see anything that would break > if we let the timer migration happen first? > > Thomas? That should work, though what is killing the scheduler per rq hrtimers _before_ we migrate stuff ? We don't want to migrate them, right ? Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |