From: Zeno Davatz on 16 Jul 2010 03:20

On Thu, Jul 15, 2010 at 10:50 PM, Pekka Enberg <penberg(a)cs.helsinki.fi> wrote:
> On Thu, Jul 15, 2010 at 11:38 PM, Zeno Davatz <zdavatz(a)gmail.com> wrote:
>> On 15.07.2010 at 22:00, Damien Wyart <damien.wyart(a)free.fr> wrote:
>>
>>>>> For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
>>>>> with the option and rc5 the problem was happening quite quickly after
>>>>> boot and normal use of the machine. So it seems I can confirm what Zeno
>>>>> has seen and I hope this will give a hint to debug the problem. I guess
>>>>> this has not been reported that much because many testers might not have
>>>>> enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
>>>>> benchmark with a kernel having this option enabled?
>>>
>>> * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 22:50]:
>>>> To be honest, the bug is a bit odd. It's related to boot-time memory
>>>> allocator changes, but it seems to manifest itself as a scheduling
>>>> problem. So if you have some spare time and want to speed up the
>>>> debugging process, please test v2.6.34 and v2.6.35-rc1 with
>>>> CONFIG_NO_BOOTMEM, and if the former is good and the latter is bad, try
>>>> to see if you can identify the offending commit with "git bisect."
>>>
>>> Not sure I will have enough time in the coming days (doing that remotely
>>> is fishy since ssh access is almost stuck when the problem occurs); if
>>> Zeno can and would like to do it, maybe this could be done faster.
>>>
>>> As the scheduler is now very well instrumented (many debugging features
>>> are available), reproducing the bug on a test platform (it happens quite
>>> quickly for me) might also give some hints. So testers, if you have
>>> time, please test 2.6.35-rc5 with CONFIG_NO_BOOTMEM on a Core i7 and see
>>> if you can reproduce the problem!
>>
>> Will try to do so. Can you point me to the git bisect howto with the
>> versions you want?
>
> Cool.
> So like I said, you first want to test 2.6.34 to find a known
> good version. Please remember to make sure you have CONFIG_NO_BOOTMEM
> enabled. You can also try to speed up the process by testing
> 2.6.35-rc1, which is likely to include the offending commit. That's not
> strictly necessary as long as you are sure that you have some
> 2.6.35-rc kernel that's bad.
>
> After that, bisecting is as simple as:
>
>   git bisect start
>   git bisect good v2.6.34
>   git bisect bad v2.6.35-rc1 # or some other kernel you know to be bad
>   <compile, boot, and try to trigger the problem>
>
> then
>
>   git bisect bad # if you were able to trigger the problem
>
> or
>
>   git bisect good # if the problem doesn't exist
>
> git will then find the next revision to test, after which you do
>
>   <compile, boot, and try to trigger the problem>
>
> and repeat the "git bisect good/bad" step until git tells you it has
> found the offending commit.
>
> There's more information on the git bisect man pages:
>
> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
>
> Let me know if you need more help with this.

OK, something is definitely wrong with 2.6.34-rc8: I could not boot it
after doing the normal git bisect, copying bzImage and then running lilo -v.

http://www.flickr.com/photos/zrr/4798077725/

I am going to continue bisecting. 2.6.34-rc7 is fine. No CPU eaters around.

Best
Zeno

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
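Since every bisection step has to be built with CONFIG_NO_BOOTMEM enabled (Pekka's reminder above), a quick check before each compile can catch a step where the option is missing from the configuration actually used for the build. This is a minimal sketch, assuming the configuration is a `.config` file at the top of the kernel source tree; the function name `check_no_bootmem` is hypothetical and not part of any kernel tooling.

```shell
#!/bin/sh
# Hypothetical helper (not kernel tooling): succeed only if the given
# kernel config file contains CONFIG_NO_BOOTMEM=y, so that a bisection
# step is not accidentally built without the option under test.
check_no_bootmem() {
    config=${1:-.config}
    if [ ! -f "$config" ]; then
        echo "no such config file: $config" >&2
        return 2
    fi
    if grep -q '^CONFIG_NO_BOOTMEM=y$' "$config"; then
        echo "CONFIG_NO_BOOTMEM is enabled in $config"
        return 0
    fi
    echo "CONFIG_NO_BOOTMEM is NOT enabled in $config" >&2
    return 1
}
```

A typical use would be `check_no_bootmem .config && make`, so the build only starts when the option is set.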
From: Zeno Davatz on 16 Jul 2010 03:40

On Thu, Jul 15, 2010 at 10:50 PM, Pekka Enberg <penberg(a)cs.helsinki.fi> wrote:
> Let me know if you need more help with this.

This one also causes a panic:

http://www.flickr.com/photos/zrr/4798092747/in/photostream/

but this version boots just fine again:

Linux zenogentoo 2.6.34-05459-gac3ee84 #102 SMP Fri Jul 16 09:22:25 CEST 2010
i686 Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux

Best
Zeno
From: Pekka Enberg on 16 Jul 2010 04:00

On Fri, Jul 16, 2010 at 10:37 AM, Zeno Davatz <zdavatz(a)gmail.com> wrote:
>> Let me know if you need more help with this.
>
> The next RC again hangs on me:
>
> http://www.flickr.com/photos/zrr/4798744700/sizes/l/

Doesn't look like a kernel bug to me. Maybe some Gentoo person knows
better, but the 'root' parameter you pass to the kernel in your lilo
configuration looks a little strange. You should try passing
"/dev/sdXX" to it, where XX is whatever partition your root filesystem
is on.
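To follow the suggestion of passing an explicit device as `root`, a lilo.conf image section can name the root partition directly. This is a minimal sketch that only prints such a section; the helper name `lilo_stanza`, the image path `/boot/bzImage-bisect`, the label `bisect` and the device `/dev/sda3` are hypothetical placeholders, and the output would still have to be merged into /etc/lilo.conf by hand before running `lilo -v`.

```shell
#!/bin/sh
# Hypothetical helper: print a lilo.conf image section that passes an
# explicit root device (such as /dev/sda3) to the kernel.
# Arguments: kernel image path, lilo label, root partition.
lilo_stanza() {
    printf 'image=%s\n' "$1"
    printf '\tlabel=%s\n' "$2"
    printf '\troot=%s\n' "$3"
    printf '\tread-only\n'
}

# Example with placeholder values. On a running system,
# `findmnt -n -o SOURCE /` (util-linux) shows the current root device.
lilo_stanza /boot/bzImage-bisect bisect /dev/sda3
```

Printing the section instead of editing /etc/lilo.conf in place keeps the sketch harmless; the real boot loader configuration stays under manual control.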
From: Zeno Davatz on 16 Jul 2010 05:20

On Fri, Jul 16, 2010 at 9:50 AM, Pekka Enberg <penberg(a)cs.helsinki.fi> wrote:
> On Fri, Jul 16, 2010 at 10:37 AM, Zeno Davatz <zdavatz(a)gmail.com> wrote:
>>> Let me know if you need more help with this.
>>
>> The next RC again hangs on me:
>>
>> http://www.flickr.com/photos/zrr/4798744700/sizes/l/
>
> Doesn't look like a kernel bug to me. Maybe some Gentoo person knows
> better, but the 'root' parameter you pass to the kernel in your lilo
> configuration looks a little strange. You should try passing
> "/dev/sdXX" to it, where XX is whatever partition your root filesystem
> is on.

This version has some problem with the DRM but no CPU eater yet.

http://www.flickr.com/photos/zrr/4798885756/

This version boots again just fine:

Linux zenogentoo 2.6.34-rc5-00059-gc2b4127 #105 SMP Fri Jul 16 11:13:21 CEST 2010
i686 Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux

As I understand it, I am bisecting upwards. Every time it does not boot
correctly I do

git bisect bad

after the next boot.

Best
Zeno
From: Pekka Enberg on 16 Jul 2010 05:40
Hi Zeno,

Zeno Davatz wrote:
> This version has some problem with the DRM but no CPU eater yet.
>
> http://www.flickr.com/photos/zrr/4798885756/
>
> This version boots again just fine:
>
> Linux zenogentoo 2.6.34-rc5-00059-gc2b4127 #105 SMP Fri Jul 16
> 11:13:21 CEST 2010 i686 Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz
> GenuineIntel GNU/Linux

You're going in the wrong direction. If 2.6.34-rc7 works just fine,
you shouldn't be testing 2.6.34-rc5.

> As I understand it, I am bisecting upwards. Every time it does not boot
> correctly I do
>
> git bisect bad after the next boot.

No, you should only do "git bisect bad" if you find a CPU eater and
"git bisect good" if you don't. For the non-booting kernels you should
do "git bisect skip"; otherwise git gets confused, as we can see here.

Did you test v2.6.35-rc1? Does it have the CPU eater problem? If yes,
please just reset your bisection:

  git bisect reset
  git bisect start
  git bisect good v2.6.34-rc7
  git bisect bad v2.6.35-rc1

and use 'git bisect skip' for kernels that don't boot or build.

Pekka
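Pekka's rule (bad only when the CPU eater shows up, good when it doesn't, skip when the kernel doesn't build or boot) can be tried out on a throwaway repository before spending reboots on the kernel tree. The script below is a toy sketch, not the real kernel bisection: it creates a hypothetical ten-commit repository in a temporary directory in which commit 6 introduces the "bug" and commit 4 stands in for a kernel that cannot be built or booted. It uses `git bisect run`, whose check command exits 0 for good, 125 for skip and 1 (any other code from 1 to 127) for bad. The file names `bug` and `nobuild` are invented for the demo.

```shell
#!/bin/sh
# Toy demonstration of git bisect good/bad/skip on a throwaway repository.
# Everything here is hypothetical: commit 6 introduces the "bug" and
# commit 4 stands in for a kernel that does not build or boot.
set -e
repo=$(mktemp -d)
check=$(mktemp)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "$i" > version
    if [ "$i" -ge 6 ]; then echo cpu-eater > bug; fi
    if [ "$i" -eq 4 ]; then echo broken > nobuild; else rm -f nobuild; fi
    git add -A
    git commit -q -m "commit $i"
done

# Check script: 125 = skip (cannot test), 1 = bad (bug seen), 0 = good.
cat > "$check" <<'EOF'
[ -e nobuild ] && exit 125
[ -e bug ] && exit 1
exit 0
EOF

git bisect start HEAD HEAD~9 >/dev/null   # bad = commit 10, good = commit 1
git bisect run sh "$check" >/dev/null
# After a finished bisection, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
cd /
rm -rf "$repo" "$check"
echo "first bad commit: $first_bad"
```

On the real kernel tree the "check" is a person rebooting the machine and watching for the CPU eater, so the same three verdicts are given by hand with `git bisect good`, `git bisect bad` or `git bisect skip`.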