From: Sachin Sant on 12 Nov 2009 07:00

Stephen Rothwell wrote:
> Hi all,
>
> Changes since 20091111:

I came across the following bug while executing cpu hotplug tests
on an x86_64 box. This is with next version 2.6.32-rc6-20091112
(20280eab85704dcd05a20903f0de80be1c761c6e).

This is a 4-way box. The problem is not always reproducible and
can be recreated only after some amount of activity.

------------[ cut here ]------------
kernel BUG at kernel/sched.c:7359!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu1/online
CPU 0
Modules linked in: ipv6 fuse loop dm_mod sg mptctl bnx2 rtc_cmos rtc_core
rtc_lib i2c_piix4 tpm_tis serio_raw button shpchp pcspkr tpm i2c_core
pci_hotplug k8temp tpm_bios ohci_hcd ehci_hcd sd_mod crc_t10dif usbcore edd fan
thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas
scsi_mod
Pid: 11504, comm: hotplug04.sh Not tainted 2.6.32-rc6-autotest-next-20091112 #1
BladeCenter LS21 -[79716AA]-
RIP: 0010:[<ffffffff8134a744>]  [<ffffffff8134a744>] migration_call+0x381/0x51a
RSP: 0018:ffff8801159fdd48  EFLAGS: 00010046
RAX: 0000000000000001 RBX: ffff88011e2de180 RCX: ffffffffff8d8f20
RDX: ffff880028280000 RSI: ffff880028293f88 RDI: ffff880127a3e708
RBP: ffff8801159fdd98 R08: 0000000000000000 R09: 000000046c250cb4
R10: dead000000100100 R11: 7fffffffffffffff R12: ffffffff816d7020
R13: ffff880028293f00 R14: ffff880127a3e6c0 R15: ffff880028293f00
FS:  00007f782aef66f0(0000) GS:ffff880028200000(0000) knlGS:0000000055731b00
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000061f4f0 CR3: 00000001271a0000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process hotplug04.sh (pid: 11504, threadinfo ffff8801159fc000, task ffff8801293e2600)
Stack:
 0000000000000001 0000000000013f00 0000000100000000 0000000000000001
<0> ffff8801159fddb8 0000000000000000 00000000fffffffe ffffffff8176c800
<0> 0000000000000001 0000000000000007 ffff8801159fddd8 ffffffff81351b16
Call Trace:
 [<ffffffff81351b16>] notifier_call_chain+0x33/0x5b
 [<ffffffff8105db28>] raw_notifier_call_chain+0xf/0x11
 [<ffffffff8133db32>] _cpu_down+0x1f7/0x2f1
 [<ffffffff8134cb6a>] ? wait_for_completion+0x18/0x1a
 [<ffffffff8133dc74>] cpu_down+0x48/0x80
 [<ffffffff8133f89a>] store_online+0x2c/0x6f
 [<ffffffff8128c44b>] sysdev_store+0x1b/0x1d
 [<ffffffff811382f8>] sysfs_write_file+0xdf/0x114
 [<ffffffff810e4401>] vfs_write+0xb4/0x186
 [<ffffffff810e4597>] sys_write+0x47/0x6e
 [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
Code: c6 75 05 48 8b 1b eb ed 49 8b 46 30 4c 89 f6 4c 89 ff ff 50 30 41 83 be
78 04 00 00 00 48 8b 45 b0 48 8b 14 c5 70 4d 77 81 75 04 <0f> 0b eb fe 49 8b 06
48 83 f8 40 75 04 0f 0b eb fe 48 8b 5d b8
RIP  [<ffffffff8134a744>] migration_call+0x381/0x51a

kernel/sched.c:7359 corresponds to

/* called under rq->lock with disabled interrupts */
static void migrate_dead(unsigned int dead_cpu, struct task_struct *p)
{
        struct rq *rq = cpu_rq(dead_cpu);

        /* Must be exiting, otherwise would be on tasklist. */
        BUG_ON(!p->exit_state);   <<====

Thanks
-Sachin

--
---------------------------------
Sachin Sant
IBM Linux Technology Center India
Systems and Technology Labs
Bangalore, India
---------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 12 Nov 2009 07:20

On Thu, 2009-11-12 at 17:23 +0530, Sachin Sant wrote:
> Stephen Rothwell wrote:
> > Hi all,
> >
> > Changes since 20091111:
> >
> I came across the following bug while executing cpu hotplug tests
> on an x86_64 box. This is with next version 2.6.32-rc6-20091112
> (20280eab85704dcd05a20903f0de80be1c761c6e).
>
> This is a 4-way box. The problem is not always reproducible and
> can be recreated only after some amount of activity.
>
> ------------[ cut here ]------------
> kernel BUG at kernel/sched.c:7359!
> invalid opcode: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu1/online
> CPU 0
> Modules linked in: ipv6 fuse loop dm_mod sg mptctl bnx2 rtc_cmos rtc_core
> rtc_lib i2c_piix4 tpm_tis serio_raw button shpchp pcspkr tpm i2c_core
> pci_hotplug k8temp tpm_bios ohci_hcd ehci_hcd sd_mod crc_t10dif usbcore edd fan
> thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod
> Pid: 11504, comm: hotplug04.sh Not tainted 2.6.32-rc6-autotest-next-20091112 #1
> BladeCenter LS21 -[79716AA]-
> RIP: 0010:[<ffffffff8134a744>]  [<ffffffff8134a744>] migration_call+0x381/0x51a
> RSP: 0018:ffff8801159fdd48  EFLAGS: 00010046
> RAX: 0000000000000001 RBX: ffff88011e2de180 RCX: ffffffffff8d8f20
> RDX: ffff880028280000 RSI: ffff880028293f88 RDI: ffff880127a3e708
> RBP: ffff8801159fdd98 R08: 0000000000000000 R09: 000000046c250cb4
> R10: dead000000100100 R11: 7fffffffffffffff R12: ffffffff816d7020
> R13: ffff880028293f00 R14: ffff880127a3e6c0 R15: ffff880028293f00
> FS:  00007f782aef66f0(0000) GS:ffff880028200000(0000) knlGS:0000000055731b00
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000061f4f0 CR3: 00000001271a0000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process hotplug04.sh (pid: 11504, threadinfo ffff8801159fc000, task
> ffff8801293e2600)
> Stack:
>  0000000000000001 0000000000013f00 0000000100000000 0000000000000001
> <0> ffff8801159fddb8 0000000000000000 00000000fffffffe ffffffff8176c800
> <0> 0000000000000001 0000000000000007 ffff8801159fddd8 ffffffff81351b16
> Call Trace:
>  [<ffffffff81351b16>] notifier_call_chain+0x33/0x5b
>  [<ffffffff8105db28>] raw_notifier_call_chain+0xf/0x11
>  [<ffffffff8133db32>] _cpu_down+0x1f7/0x2f1
>  [<ffffffff8134cb6a>] ? wait_for_completion+0x18/0x1a
>  [<ffffffff8133dc74>] cpu_down+0x48/0x80
>  [<ffffffff8133f89a>] store_online+0x2c/0x6f
>  [<ffffffff8128c44b>] sysdev_store+0x1b/0x1d
>  [<ffffffff811382f8>] sysfs_write_file+0xdf/0x114
>  [<ffffffff810e4401>] vfs_write+0xb4/0x186
>  [<ffffffff810e4597>] sys_write+0x47/0x6e
>  [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
> Code: c6 75 05 48 8b 1b eb ed 49 8b 46 30 4c 89 f6 4c 89 ff ff 50 30 41 83 be
> 78 04 00 00 00 48 8b 45 b0 48 8b 14 c5 70 4d 77 81 75 04 <0f> 0b eb fe 49 8b 06
> 48 83 f8 40 75 04 0f 0b eb fe 48 8b 5d b8
> RIP  [<ffffffff8134a744>] migration_call+0x381/0x51a
>
> kernel/sched.c:7359 corresponds to
>
> /* called under rq->lock with disabled interrupts */
> static void migrate_dead(unsigned int dead_cpu, struct task_struct *p)
> {
>         struct rq *rq = cpu_rq(dead_cpu);
>
>         /* Must be exiting, otherwise would be on tasklist. */
>         BUG_ON(!p->exit_state);   <<====

I'm pretty sure we stumbled on a TASK_WAKING task there; trying to sort
out the locking there, it's a bit of a maze :/

How reproducible is this?
From: Peter Zijlstra on 12 Nov 2009 07:30

On Thu, 2009-11-12 at 17:53 +0530, Sachin Sant wrote:
> > How reproducible is this?
> >
> I was able to recreate this once out of three tries.
>
> When I was able to recreate this bug, the box had been
> running for a while and I had executed a series of tests
> (kernbench, hackbench, hugetlbfs) before cpu_hotplug.

OK, good: it's easier to test patches when the thing is relatively easy
to reproduce. I'll send you something to test once I've got a handle on
it.

Thanks!
From: Sachin Sant on 12 Nov 2009 07:30

Peter Zijlstra wrote:
>> ------------[ cut here ]------------
>> kernel BUG at kernel/sched.c:7359!
>> invalid opcode: 0000 [#1] SMP
>> last sysfs file: /sys/devices/system/cpu/cpu1/online
>> CPU 0
>> Modules linked in: ipv6 fuse loop dm_mod sg mptctl bnx2 rtc_cmos rtc_core
>> rtc_lib i2c_piix4 tpm_tis serio_raw button shpchp pcspkr tpm i2c_core
>> pci_hotplug k8temp tpm_bios ohci_hcd ehci_hcd sd_mod crc_t10dif usbcore edd fan
>> thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas
>> scsi_mod
>> Pid: 11504, comm: hotplug04.sh Not tainted 2.6.32-rc6-autotest-next-20091112 #1
>> BladeCenter LS21 -[79716AA]-
>> RIP: 0010:[<ffffffff8134a744>]  [<ffffffff8134a744>] migration_call+0x381/0x51a
>> RSP: 0018:ffff8801159fdd48  EFLAGS: 00010046
>> RAX: 0000000000000001 RBX: ffff88011e2de180 RCX: ffffffffff8d8f20
>> RDX: ffff880028280000 RSI: ffff880028293f88 RDI: ffff880127a3e708
>> RBP: ffff8801159fdd98 R08: 0000000000000000 R09: 000000046c250cb4
>> R10: dead000000100100 R11: 7fffffffffffffff R12: ffffffff816d7020
>> R13: ffff880028293f00 R14: ffff880127a3e6c0 R15: ffff880028293f00
>> FS:  00007f782aef66f0(0000) GS:ffff880028200000(0000) knlGS:0000000055731b00
>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 000000000061f4f0 CR3: 00000001271a0000 CR4: 00000000000006f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process hotplug04.sh (pid: 11504, threadinfo ffff8801159fc000, task
>> ffff8801293e2600)
>> Stack:
>>  0000000000000001 0000000000013f00 0000000100000000 0000000000000001
>> <0> ffff8801159fddb8 0000000000000000 00000000fffffffe ffffffff8176c800
>> <0> 0000000000000001 0000000000000007 ffff8801159fddd8 ffffffff81351b16
>> Call Trace:
>>  [<ffffffff81351b16>] notifier_call_chain+0x33/0x5b
>>  [<ffffffff8105db28>] raw_notifier_call_chain+0xf/0x11
>>  [<ffffffff8133db32>] _cpu_down+0x1f7/0x2f1
>>  [<ffffffff8134cb6a>] ? wait_for_completion+0x18/0x1a
>>  [<ffffffff8133dc74>] cpu_down+0x48/0x80
>>  [<ffffffff8133f89a>] store_online+0x2c/0x6f
>>  [<ffffffff8128c44b>] sysdev_store+0x1b/0x1d
>>  [<ffffffff811382f8>] sysfs_write_file+0xdf/0x114
>>  [<ffffffff810e4401>] vfs_write+0xb4/0x186
>>  [<ffffffff810e4597>] sys_write+0x47/0x6e
>>  [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
>> Code: c6 75 05 48 8b 1b eb ed 49 8b 46 30 4c 89 f6 4c 89 ff ff 50 30 41 83 be
>> 78 04 00 00 00 48 8b 45 b0 48 8b 14 c5 70 4d 77 81 75 04 <0f> 0b eb fe 49 8b 06
>> 48 83 f8 40 75 04 0f 0b eb fe 48 8b 5d b8
>> RIP  [<ffffffff8134a744>] migration_call+0x381/0x51a
>
> I'm pretty sure we stumbled on a TASK_WAKING task there; trying to sort
> out the locking there, it's a bit of a maze :/
>
> How reproducible is this?
>
I was able to recreate this once out of three tries.

When I was able to recreate this bug, the box had been
running for a while and I had executed a series of tests
(kernbench, hackbench, hugetlbfs) before cpu_hotplug.

Thanks
-Sachin

--
---------------------------------
Sachin Sant
IBM Linux Technology Center India
Systems and Technology Labs
Bangalore, India
---------------------------------
From: Peter Zijlstra on 12 Nov 2009 12:20

On Thu, 2009-11-12 at 13:27 +0100, Peter Zijlstra wrote:
> On Thu, 2009-11-12 at 17:53 +0530, Sachin Sant wrote:
> > > How reproducible is this?
> > >
> > I was able to recreate this once out of three tries.
> >
> > When I was able to recreate this bug, the box had been
> > running for a while and I had executed a series of tests
> > (kernbench, hackbench, hugetlbfs) before cpu_hotplug.
>
> OK, good: it's easier to test patches when the thing is relatively easy
> to reproduce. I'll send you something to test once I've got a handle on
> it.

OK, so on hotplug we do:

  cpu_down
    set_cpu_active(false)
    _cpu_down
      notify(CPU_DOWN_PREPARE)
      stop_machine(take_cpu_down)
        __cpu_disable()
        set_cpu_online(false);
        notify(CPU_DYING)
      __cpu_die() /* note no more stop_machine */
      notify(CPU_DEAD)

Then in the scheduler hotplug notifier (migration_call) we mostly deal
with CPU_DEAD, where we do:

	case CPU_DEAD:
	case CPU_DEAD_FROZEN:
		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
		migrate_live_tasks(cpu);
		rq = cpu_rq(cpu);
		kthread_stop(rq->migration_thread);
		put_task_struct(rq->migration_thread);
		rq->migration_thread = NULL;
		/* Idle task back to normal (off runqueue, low prio) */
		spin_lock_irq(&rq->lock);
		update_rq_clock(rq);
		deactivate_task(rq, rq->idle, 0);
		rq->idle->static_prio = MAX_PRIO;
		__setscheduler(rq, rq->idle, SCHED_NORMAL, 0);
		rq->idle->sched_class = &idle_sched_class;
		migrate_dead_tasks(cpu);
		spin_unlock_irq(&rq->lock);
		cpuset_unlock();

where migrate_live_tasks() basically iterates the full task list and,
for each task where task_cpu() == dead_cpu, invokes __migrate_task() to
move it to an online cpu.

Furthermore, the sched_domain notifier (update_sched_domains) will on
CPU_DEAD rebuild the sched_domain tree.

Now, I think this all can race against try_to_wake_up() when
select_task_rq_fair() hits the old sched_domain tree, because it only
overlays the sched_domain masks against p->cpus_allowed, without regard
for cpu_online_mask. It could therefore return an offline cpu and move a
task onto it after migrate_live_tasks() and before migrate_dead_tasks().

The trivial solution that comes to mind is something like this:

diff --git a/kernel/sched.c b/kernel/sched.c
index 1f2e99d..15dcb41 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2376,7 +2376,11 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	p->state = TASK_WAKING;
 	task_rq_unlock(rq, &flags);
 
+again:
 	cpu = p->sched_class->select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
+	if (!cpu_active(cpu))
+		goto again;
+
 	if (cpu != orig_cpu) {
 		local_irq_save(flags);
 		rq = cpu_rq(cpu);

However, Mike ran into a similar problem; we tried something similar and
it deadlocked for him -- something which I can see happen when we do
this from an interrupt on the machine running the CPU_DEAD notifier, and
the update_sched_domains() notifier is run after the migration_call()
notifier.

So what we need to do is make the whole of select_task_rq_fair()
cpu_online/active_mask aware, or give up and simply punt:

diff --git a/kernel/sched.c b/kernel/sched.c
index 1f2e99d..62df61c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2377,6 +2377,9 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	task_rq_unlock(rq, &flags);
 
 	cpu = p->sched_class->select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
+	if (!cpu_active(cpu))
+		cpu = cpumask_any_and(&p->cpus_allowed, cpu_active_mask);
+
 	if (cpu != orig_cpu) {
 		local_irq_save(flags);
 		rq = cpu_rq(cpu);

Something I think Mike also tried, and which didn't deadlock for him...

Sachin, Mike, could you try the above snippet and verify whether it does
indeed solve your respective issues?

/me prays it does, because otherwise I'm fresh out of clue...