From: KAMEZAWA Hiroyuki on 29 Jul 2010 04:10

On Thu, 29 Jul 2010 15:38:06 +0800
Luming Yu <luming.yu(a)gmail.com> wrote:

> On Tue, Jul 27, 2010 at 5:03 PM, KAMEZAWA Hiroyuki

> # gdb ./foo
> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "ia64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /root/foo...done.
> (gdb) break leaf
> Breakpoint 1 at 0x40000000000005a1: file foo.c, line 2.
> (gdb) run
> Starting program: /root/foo
>
> Breakpoint 1, leaf () at foo.c:2
> 2 }
> (gdb) gcore /tmp/save
> Segmentation fault
> # cat /proc/version
> Linux version 2.6.35-rc3+ ...
>
>

Hmm. What is the EXEC_PAGESIZE installed in /usr/include/asm-generic/param.h?
And what happens when you modify it to 16k if it's 64k?

Thanks
-Kame
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Luming Yu on 29 Jul 2010 04:50

On Thu, Jul 29, 2010 at 3:58 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> Hmm. What is EXEC_PAGESIZE installed in /usr/include/asm-generic/param.h ?

I use the stock gdb shipped with RHEL 5.5.

> And what happens when modify it to 16k if it's 64k ?

Want me to rebuild gdb with this modification?
From: KAMEZAWA Hiroyuki on 29 Jul 2010 05:00

On Thu, 29 Jul 2010 16:40:50 +0800
Luming Yu <luming.yu(a)gmail.com> wrote:

> On Thu, Jul 29, 2010 at 3:58 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> > Hmm. What is EXEC_PAGESIZE installed in /usr/include/asm-generic/param.h ?
>
> I use stock gdb shipped with RHEL 5.5.

Hmm. RHEL5.5's EXEC_PAGESIZE is 64k, right? (And your kernel is 16k.)

> > And what happens when modify it to 16k if it's 64k ?
>
> Want me to rebuild a gdb with this modification?

Ahhh, yes. It will be required... but please do it when you have free time.
I don't think the difference can cause an MCA or a hang...

Thanks,
-Kame
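The header-versus-kernel page size question raised above can be checked directly. The following is a minimal sketch and was not posted in the thread: it assumes glibc-style headers where <sys/param.h> exposes EXEC_PAGESIZE, and it compares that compile-time constant with the page size the running kernel reports through sysconf(_SC_PAGESIZE). The function names are hypothetical.

```c
#include <stdio.h>
#include <sys/param.h>  /* may define EXEC_PAGESIZE (assumption: glibc-style headers) */
#include <unistd.h>

/* Page size baked into the installed headers, or -1 if the macro is absent. */
long header_exec_pagesize(void)
{
#ifdef EXEC_PAGESIZE
    return (long)EXEC_PAGESIZE;
#else
    return -1;
#endif
}

/* Page size reported by the running kernel, or -1 on error. */
long kernel_pagesize(void)
{
    return sysconf(_SC_PAGESIZE);
}

/*
 * Print both values. Returns 1 if they agree, 0 if they differ,
 * and -1 if either one could not be determined.
 */
int compare_pagesizes(FILE *out)
{
    long hdr = header_exec_pagesize();
    long run = kernel_pagesize();

    fprintf(out, "EXEC_PAGESIZE (headers): %ld\n", hdr);
    fprintf(out, "_SC_PAGESIZE (kernel):   %ld\n", run);

    if (hdr < 0 || run < 0)
        return -1;
    return hdr == run;
}
```

Calling compare_pagesizes(stdout) from a small main() would show whether the two values agree. If the installed headers say 64k while the kernel runs with 16k pages, as Kame suspects, the function would return 0.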
From: dann frazier on 29 Jul 2010 15:30

On Wed, Jul 28, 2010 at 08:50:18PM -0700, Hugh Dickins wrote:
> On Tue, 27 Jul 2010, dann frazier wrote:
> > On Tue, Jul 27, 2010 at 06:03:30PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 27 Jul 2010 01:19:15 -0600
> > > dann frazier <dannf(a)debian.org> wrote:
> > > > On Tue, Jul 20, 2010 at 09:19:50PM -0700, Hugh Dickins wrote:
> > > > > On Tue, 20 Jul 2010, dann frazier wrote:
> > > > > > On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > > > On Tue, 20 Jul 2010 11:35:12 -0600
> > > > > > > dann frazier <dannf(a)debian.org> wrote:
> > > > > > >
> > > > > > > > Debian's ia64 autobuilders have been experiencing system crashes while
> > > > > > > > trying to run the gdb test suite:
> > > > > > > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > > > > > > >
> > > > > > > > I was able to reproduce this w/ the latest git tree, and bisected it
> > > > > > > > down to this commit, introduced in 2.6.32:
> > > > > > > >
> > > > > > > > commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> > > > > > > > Author: Hugh Dickins <hugh.dickins(a)tiscali.co.uk>
> > > > > > > > Date: Mon Sep 21 17:03:34 2009 -0700
> > > > > > > >
> > > > > > > > mm: ZERO_PAGE without PTE_SPECIAL
> > > > > > > >
> > > > > > > > Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> > > > > > > > those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > > > > > > >
> > > > > > > > Contrary to how I'd imagined it, there's nothing ugly about this, just a
> > > > > > > > zero_pfn test built into one or another block of vm_normal_page().
> > > > > > > >
> > > > > > > > But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> > > > > > > > my_zero_pfn() inlines. Reinstate its mremap move_pte() shuffling of
> > > > > > > > ZERO_PAGEs we did from 2.6.17 to 2.6.19? Not unless someone shouts for
> > > > > > > > that: it would have to take vm_flags to weed out some cases.
> > > > > > > >
> > > > > > > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > > > > > > > 2.6.32-based). I compared the .configs and found that the relevant
> > > > > > > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > > > > > > > reliably fails w/ 16KB pages.
> > > > > > > >
> > > > > > >
> > > > > > > Sorry, I have no idea...
> > > > > > > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
> > > > > > >
> > > > > >
> > > > > > dannf(a)krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley
> > > > > > a0000001008784c0 d __ksymtab_empty_zero_page
> > > > > > a000000100882688 d __kcrctab_empty_zero_page
> > > > > > a000000100884ca4 r __kstrtab_empty_zero_page
> > > > > > a000000100974000 D empty_zero_page
> > > > >
> > > > > Thanks a lot for reporting this, but I too have no idea yet.
> > > > >
> > > > > It is likely that the bug is not to be found in that 62eede62, but
> > > > > rather in one of the preceding patches to mm/memory.c which 62eede62
> > > > > was extending to ia64 and other architectures without PTE_SPECIAL.
> > > > >
> > > > > I wonder, from looking at that gdb testsuite log, is it plausible
> > > > > that all these hangs/crashes occurred when writing out a coredump?
> > > > > Is that something you could check for us? or rule out the possibility.
> > > >
> > > > Yep, seems so. I've reduced it down to this test case:
> > > >
> > > > dannf(a)rx2600:~> cat > foo.c
> > > > int leaf(void) {
> > > >     return 0;
> > > > }
> > > >
> > > > int main(void) {
> > > >     leaf();
> > > > }
> > > > dannf(a)rx2600:~> gcc -g foo.c -o foo
> > > > dannf(a)rx2600:~> gdb ./foo
> > > > GNU gdb (GDB) SUSE (7.0-0.4.16)
> > > > Copyright (C) 2009 Free Software Foundation, Inc.
> > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > > This is free software: you are free to change and redistribute it.
> > > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > > > and "show warranty" for details.
> > > > This GDB was configured as "ia64-suse-linux".
> > > > For bug reporting instructions, please see:
> > > > <http://www.gnu.org/software/gdb/bugs/>...
> > > > Reading symbols from /home/dannf/foo...done.
> > > > (gdb) break leaf
> > > > Breakpoint 1 at 0x40000000000005c1: file foo.c, line 2.
> > > > (gdb) run
> > > > Starting program: /home/dannf/foo
> > > > Missing separate debuginfo for /lib/ld-linux-ia64.so.2
> > > > Try: zypper install -C "debuginfo(build-id)=d5bfb8b5940e174d54b978ca515dc0df76c7618c"
> > > > Missing separate debuginfo for /lib/libc.so.6.1
> > > > Try: zypper install -C "debuginfo(build-id)=ca78657bd9173653d95f8504a313d2b6db8cb1d6"
> > > >
> > > > Breakpoint 1, leaf () at foo.c:2
> > > > 2 return 0;
> > > > (gdb) gcore /tmp/save
> > > >
> > > > [bang]
> > > >
> > >
> > > Does this happen on 2.6.34 or 2.6.35-rc kernel ?
> >
> > I've been testing w/ a 2.6.35-rc4+, though it was originally reported
> > on a 2.6.32.
>
> Thanks a lot for narrowing down to that simple testcase, and
> thanks a lot for checking it's just as bad on recent kernels.
>
> I'm sorry to say that I'm still just as baffled.
>
> Let's note that gdb's gcore is building up its own version of a
> coredump, not going through the get_dump_page() code I was wondering
> about. If I read gcore correctly (possibly not!), it will be reading
> selected areas from /proc/<pid>/mem i.e. using access_process_vm().

This appears to be correct. I was able to collect the following
stacktrace using INIT:

[ 2535.074197] Backtrace of pid 4605 (gdb)
[ 2535.074197]
[ 2535.074197] Call Trace:
[ 2535.074197] [<a00000010000bb00>] ia64_native_leave_kernel+0x0/0x270
[ 2535.074197]     sp=e000004081c77c40 bsp=e000004081c71018
[ 2535.074197] [<a000000100334720>] __copy_user+0x160/0x960
[ 2535.074197]     sp=e000004081c77e10 bsp=e000004081c71018
[ 2535.074197] [<a000000100176b00>] access_process_vm+0x2c0/0x380
[ 2535.074197]     sp=e000004081c77e10 bsp=e000004081c70f60

> But why the (16kB but not 64kB!) zero page should make that freeze
> or reboot, I have no idea.
>
> What would I be doing if I had an Itanium? I think I'd be trying to
> narrow down exactly where it goes bad (tedious when the penalty is
> a freeze or reboot).
>
> As it is, I'm hoping that someone with an ia64 can investigate...
>
> Hugh
>

--
dann frazier
From: KAMEZAWA Hiroyuki on 29 Jul 2010 20:50
On Thu, 29 Jul 2010 13:22:16 -0600
dann frazier <dannf(a)debian.org> wrote:

> On Wed, Jul 28, 2010 at 08:50:18PM -0700, Hugh Dickins wrote:
> > Let's note that gdb's gcore is building up its own version of a
> > coredump, not going through the get_dump_page() code I was wondering
> > about. If I read gcore correctly (possibly not!), it will be reading
> > selected areas from /proc/<pid>/mem i.e. using access_process_vm().
>
> This appears to be correct. I was able to collect the following
> stacktrace using INIT:
>
> [ 2535.074197] Backtrace of pid 4605 (gdb)
> [ 2535.074197]
> [ 2535.074197] Call Trace:
> [ 2535.074197] [<a00000010000bb00>] ia64_native_leave_kernel+0x0/0x270
> [ 2535.074197]     sp=e000004081c77c40 bsp=e000004081c71018
> [ 2535.074197] [<a000000100334720>] __copy_user+0x160/0x960
> [ 2535.074197]     sp=e000004081c77e10 bsp=e000004081c71018
> [ 2535.074197] [<a000000100176b00>] access_process_vm+0x2c0/0x380
> [ 2535.074197]     sp=e000004081c77e10 bsp=e000004081c70f60
>

Could you show the full stack?

IIUC, ia64's gdb has to call both ptrace(PEEK) and /proc/pid/mem to
check the hidden register stack.

Thanks,
-Kame