From: Peter Zijlstra on 3 Aug 2010 05:10

On Thu, 2010-07-15 at 23:52 +0300, Pekka Enberg wrote:
> On Thu, Jul 15, 2010 at 11:00 PM, Damien Wyart <damien.wyart(a)free.fr> wrote:
> >> > For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
> >> > with the option and rc5 the problem was happening quite quickly after
> >> > boot and normal use of the machine. So it seems I can confirm what Zeno
> >> > has seen and I hope this will give a hint to debug the problem. I guess
> >> > this has not been reported that much because many testers might not have
> >> > enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
> >> > benchmark with a kernel having this option enabled?
> >
> > * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 22:50]:
> >> To be honest, the bug is a bit odd. It's related to boot-time memory
> >> allocator changes but yet it seems to manifest itself as a scheduling
> >> problem. So if you have some spare time and want to speed up the
> >> debugging process, please test v2.6.34 and v2.6.35-rc1 with
> >> CONFIG_NO_BOOTMEM and if former is good and latter is bad, try to see
> >> if you can identify the offending commit with "git bisect."
> >
> > Not sure I will have enough time in the coming days (doing that remotely
> > is fishy since ssh access is almost stuck when the problem occurs); if
> > Zeno can and would like to do it, maybe this could be done faster.
> >
> > As the scheduler is now very well instrumented (many debugging features
> > are available), reproducing the bug on a test platform (it happens quite
> > quickly for me) might also give some hints. So testers, if you have
> > time, please test 2.6.35-rc5 with CONFIG_NO_BOOTMEM on a Core i7 and see
> > if you can reproduce the problem!
>
> Yeah, there's "perf sched" tool available for that:
>
>   http://lwn.net/Articles/353295/
>
> The only problem is that we'd need a scheduler hacker to decipher the
> report and all of them seem to be missing at the moment (probably at
> OLS). Anyway, like I said, git bisect will probably speed up the
> debugging process, that's all.

Vacation.. but now I'm back ;-)

Even something simple as: perf top -r 1 (make sure you're root in order
to run with real-time prios) could give a clue as to what is consuming
all your cpu-time.

Or did the issue get sorted already?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
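For readers unfamiliar with the bisection Pekka suggests above, the idea can be sketched roughly as a binary search between a known-good and a known-bad kernel. This is only an illustration under simplifying assumptions: real kernel history is a graph rather than a straight line (git bisect copes with that itself), the bug is assumed to stay present in every commit after the one that introduced it, and the function name, the commit labels and the "culprit" below are all made up for the example.

```python
def first_bad_index(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first
    bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression came after mid
    return bad


# Hypothetical linear history between the two releases named in the thread.
history = ["v2.6.34"] + [f"commit-{i}" for i in range(1, 8)] + ["v2.6.35-rc1"]
culprit = "commit-5"  # made-up offending commit, for illustration only


def is_bad(commit):
    # Stand-in for "build this kernel, boot it, see whether the problem shows up".
    return history.index(commit) >= history.index(culprit)


print(history[first_bad_index(history, is_bad)])
```

Each test halves the remaining range, which is why bisecting even a large merge window needs only a modest number of build-and-boot cycles.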
From: Zeno Davatz on 3 Aug 2010 05:20

On Tue, Aug 3, 2010 at 11:15 AM, <damien.wyart(a)free.fr> wrote:
>> > Vacation.. but now I'm back ;-)
>> >
>> > Even something simple as: perf top -r 1 (make sure you're root in order
>> > to run with real-time prios) could give a clue as to what is consuming
>> > all your cpu-time.
>> >
>> > Or did the issue get sorted already?
>>
>> Thank you for the hint.
>>
>> I am on 2.6.35 now and all seems to be fine again.
>
> Are you 100% sure you compiled it with CONFIG_NO_BOOTMEM enabled?
>
> I did not test 2.6.35 yet but I did not see anything related to this bug
> committed since the discussion so I am very surprised the problem disappeared by
> itself...
>
> Will be on vacation very soon, so not sure I will have time to test 2.6.35
> before leaving.

Yes: I got:

# CONFIG_PARAVIRT_SPINLOCKS is not set
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_NO_BOOTMEM=y
# CONFIG_MEMTEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set

in my .config.

Linux zenogentoo 2.6.35 #122 SMP Mon Aug 2 10:26:05 CEST 2010 i686 Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux

Best
Zeno
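The check Zeno did by eye on his .config can be sketched as a few lines of script. This is a minimal sketch, assuming the usual .config conventions visible in the excerpt above ("CONFIG_FOO=y" for an enabled option, "# CONFIG_FOO is not set" for a disabled one); the function name and the sample variable are hypothetical, and the sample text is simply the excerpt quoted in the message.

```python
def kconfig_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...), 'not set', or None if absent."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return "not set"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# The .config excerpt quoted in the message above.
sample_config = """\
# CONFIG_PARAVIRT_SPINLOCKS is not set
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_NO_BOOTMEM=y
# CONFIG_MEMTEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
"""

print(kconfig_option_state(sample_config, "CONFIG_NO_BOOTMEM"))
```

In practice the text would be read from the .config of the build tree that produced the running kernel, so that the answer matches what was actually booted.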
From: damien.wyart on 3 Aug 2010 05:20

> > Vacation.. but now I'm back ;-)
> >
> > Even something simple as: perf top -r 1 (make sure you're root in order
> > to run with real-time prios) could give a clue as to what is consuming
> > all your cpu-time.
> >
> > Or did the issue get sorted already?
>
> Thank you for the hint.
>
> I am on 2.6.35 now and all seems to be fine again.

Are you 100% sure you compiled it with CONFIG_NO_BOOTMEM enabled?

I did not test 2.6.35 yet but I did not see anything related to this bug
committed since the discussion so I am very surprised the problem disappeared by
itself...

Will be on vacation very soon, so not sure I will have time to test 2.6.35
before leaving.

Damien
From: Zeno Davatz on 3 Aug 2010 05:20
On Tue, Aug 3, 2010 at 11:05 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> On Thu, 2010-07-15 at 23:52 +0300, Pekka Enberg wrote:
>> On Thu, Jul 15, 2010 at 11:00 PM, Damien Wyart <damien.wyart(a)free.fr> wrote:
>> >> > For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
>> >> > with the option and rc5 the problem was happening quite quickly after
>> >> > boot and normal use of the machine. So it seems I can confirm what Zeno
>> >> > has seen and I hope this will give a hint to debug the problem. I guess
>> >> > this has not been reported that much because many testers might not have
>> >> > enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
>> >> > benchmark with a kernel having this option enabled?
>> >
>> > * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 22:50]:
>> >> To be honest, the bug is a bit odd. It's related to boot-time memory
>> >> allocator changes but yet it seems to manifest itself as a scheduling
>> >> problem. So if you have some spare time and want to speed up the
>> >> debugging process, please test v2.6.34 and v2.6.35-rc1 with
>> >> CONFIG_NO_BOOTMEM and if former is good and latter is bad, try to see
>> >> if you can identify the offending commit with "git bisect."
>> >
>> > Not sure I will have enough time in the coming days (doing that remotely
>> > is fishy since ssh access is almost stuck when the problem occurs); if
>> > Zeno can and would like to do it, maybe this could be done faster.
>> >
>> > As the scheduler is now very well instrumented (many debugging features
>> > are available), reproducing the bug on a test platform (it happens quite
>> > quickly for me) might also give some hints. So testers, if you have
>> > time, please test 2.6.35-rc5 with CONFIG_NO_BOOTMEM on a Core i7 and see
>> > if you can reproduce the problem!
>>
>> Yeah, there's "perf sched" tool available for that:
>>
>>   http://lwn.net/Articles/353295/
>>
>> The only problem is that we'd need a scheduler hacker to decipher the
>> report and all of them seem to be missing at the moment (probably at
>> OLS). Anyway, like I said, git bisect will probably speed up the
>> debugging process, that's all.
>
> Vacation.. but now I'm back ;-)
>
> Even something simple as: perf top -r 1 (make sure you're root in order
> to run with real-time prios) could give a clue as to what is consuming
> all your cpu-time.
>
> Or did the issue get sorted already?

Thank you for the hint.

I am on 2.6.35 now and all seems to be fine again.

Best
Zeno