From: Torsten Kaiser on 11 Jul 2010 15:00

Trying to upgrade my system from 2.6.33 to 2.6.34, I can't get it to boot.

All tries used CONFIG_SLUB=y

The gentoo version of 2.6.34 generated an OOPS during network initialization and then came to a stop. (It seemed that all processes got stuck waiting on some locks.)
As in this instance the system was able to start the syslog, I was able to capture the complete OOPS:

Jul 3 05:51:43 ariolc kernel: [ 32.674367] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
Jul 3 05:51:43 ariolc kernel: [ 32.675674] IP: [<ffffffff810aab89>] __kmalloc_track_caller+0x69/0x110
Jul 3 05:51:43 ariolc kernel: [ 32.676951] PGD 11e7e5067 PUD 11fd3d067 PMD 0
Jul 3 05:51:43 ariolc kernel: [ 32.678224] Oops: 0000 [#1] SMP
Jul 3 05:51:43 ariolc kernel: [ 32.679477] last sysfs file: /sys/devices/virtual/block/md0/md/metadata_version
Jul 3 05:51:43 ariolc kernel: [ 32.680745] CPU 1
Jul 3 05:51:43 ariolc kernel: [ 32.680761] Modules linked in: aes_x86_64(+) aes_generic sg
Jul 3 05:51:43 ariolc kernel: [ 32.682764]
Jul 3 05:51:43 ariolc kernel: [ 32.682764] Pid: 4652, comm: modprobe Not tainted 2.6.34-gentoo-r1 #1 MS-7368/MS-7368
Jul 3 05:51:43 ariolc kernel: [ 32.682764] RIP: 0010:[<ffffffff810aab89>] [<ffffffff810aab89>] __kmalloc_track_caller+0x69/0x110
Jul 3 05:51:43 ariolc kernel: [ 32.682764] RSP: 0018:ffff88011e75fe08 EFLAGS: 00010006
Jul 3 05:51:43 ariolc kernel: [ 32.687268] RAX: ffff880001b0f088 RBX: ffffffff8170d4d0 RCX: ffff88011e574b80
Jul 3 05:51:43 ariolc kernel: [ 32.688564] RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 00000000000002d0
Jul 3 05:51:43 ariolc kernel: [ 32.688564] RBP: 0000000000000296 R08: 0000000000000014 R09: ffff88011e574800
Jul 3 05:51:43 ariolc kernel: [ 32.691414] R10: 0000000000000001 R11: ffff880001a12008 R12: 00000000000000d0
Jul 3 05:51:43 ariolc kernel: [ 32.691414] R13: 0000000000000003 R14: ffffffff81064abb R15: ffffc90010729d68
Jul 3 05:51:43 ariolc kernel: [ 32.691414] FS: 00007f0a9acb8700(0000) GS:ffff880001b00000(0000) knlGS:0000000000000000
Jul 3 05:51:43 ariolc kernel: [ 32.691414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 3 05:51:43 ariolc kernel: [ 32.697212] CR2: 0000000000000003 CR3: 000000011d03e000 CR4: 00000000000006e0
Jul 3 05:51:43 ariolc kernel: [ 32.698792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 3 05:51:43 ariolc kernel: [ 32.698792] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 3 05:51:43 ariolc kernel: [ 32.698792] Process modprobe (pid: 4652, threadinfo ffff88011e75e000, task ffff88011d114150)
Jul 3 05:51:43 ariolc kernel: [ 32.698792] Stack:
Jul 3 05:51:43 ariolc kernel: [ 32.698792] 0000000000000000 ffffc90010729c97 0000000000000008 ffff88011e574800
Jul 3 05:51:43 ariolc kernel: [ 32.698792] <0> ffff88011e574aa0 ffffffff8108c27b ffffffffa0018920 ffffc900000000d0
Jul 3 05:51:43 ariolc kernel: [ 32.698792] <0> ffffffffa0018920 ffffc90010728000 ffffc90010729d68 ffffffff81064abb
Jul 3 05:51:43 ariolc kernel: [ 32.708636] Call Trace:
Jul 3 05:51:43 ariolc kernel: [ 32.708636] [<ffffffff8108c27b>] ? kstrdup+0x3b/0x70
Jul 3 05:51:43 ariolc kernel: [ 32.711488] [<ffffffff81064abb>] ? load_module+0x13eb/0x1730
Jul 3 05:51:43 ariolc kernel: [ 32.711488] [<ffffffff81064e7b>] ? sys_init_module+0x7b/0x260
Jul 3 05:51:43 ariolc kernel: [ 32.711488] [<ffffffff810024ab>] ? system_call_fastpath+0x16/0x1b
Jul 3 05:51:43 ariolc kernel: [ 32.716465] Code: 23 25 dc 47 6f 00 41 f6 c4 10 75 66 9c 5d fa 65 48 8b 14 25 a8 d1 00 00 48 8b 03 48 8d 04 02 4c 8b 28 4d 85 ed 74 55 48 63 53 18 <49> 8b 54 15 00 48 89 10 55 9d 4d 85 ed 74 06 66 45 85 e4 78 22
Jul 3 05:51:43 ariolc kernel: [ 32.718865] RIP [<ffffffff810aab89>] __kmalloc_track_caller+0x69/0x110
Jul 3 05:51:43 ariolc kernel: [ 32.718865] RSP <ffff88011e75fe08>
Jul 3 05:51:43 ariolc kernel: [ 32.718865] CR2: 0000000000000003
Jul 3 05:51:43 ariolc kernel: [ 32.718865] ---[ end trace 692101747f991cfb ]---

Two other OOPSen in __kmalloc() followed this one.

I then tried unsetting CONFIG_NO_BOOTMEM instead of using CONFIG_NO_BOOTMEM=y. This kernel froze before userspace was started; I did not see any OOPS output.

Today I tried the vanilla 2.6.34.1 (again with CONFIG_NO_BOOTMEM=y). The vanilla kernel also crashed before userspace, again in __kmalloc(), but with a visible OOPS.
I wrote the following information down:
OOPS was: BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
Callchain started with:
ffffffff810aab39 : __kmalloc_track_caller+0x69/0x110
ffffffff8108c23b : kstrdup+0x3b/0x70
called from sysfs_new_dirent
There were no modules loaded at this time; the faulting process was Pid: 1, comm: swapper

From System.map:
ffffffff810aa910 t get_slab
ffffffff810aa980 T __kmalloc_node_track_caller
ffffffff810aaad0 T __kmalloc_track_caller
ffffffff810aabe0 T __kmalloc

Dump of assembler code from 0xffffffff810aaad0 to 0xffffffff810aabe0:
0xffffffff810aaad0: sub $0x28,%rsp
0xffffffff810aaad4: cmp $0x2000,%rdi
0xffffffff810aaadb: mov %r12,0x10(%rsp)
0xffffffff810aaae0: mov %r14,0x20(%rsp)
0xffffffff810aaae5: mov %esi,%r12d
0xffffffff810aaae8: mov %rbx,(%rsp)
0xffffffff810aaaec: mov %rbp,0x8(%rsp)
0xffffffff810aaaf1: mov %rdx,%r14
0xffffffff810aaaf4: mov %r13,0x18(%rsp)
0xffffffff810aaaf9: ja 0xffffffff810aaba3
0xffffffff810aaaff: callq 0xffffffff810aa910
0xffffffff810aab04: cmp $0x10,%rax
0xffffffff810aab08: mov %rax,%rbx
0xffffffff810aab0b: jbe 0xffffffff810aab51
0xffffffff810aab0d: and 0x6f48ac(%rip),%r12d # 0xffffffff8179f3c0
0xffffffff810aab14: test $0x10,%r12b
0xffffffff810aab18: jne 0xffffffff810aab80
0xffffffff810aab1a: pushfq
0xffffffff810aab1b: pop %rbp
0xffffffff810aab1c: cli
0xffffffff810aab1d: mov %gs:0xd1a8,%rdx
0xffffffff810aab26: mov (%rbx),%rax
0xffffffff810aab29: lea (%rdx,%rax,1),%rax
0xffffffff810aab2d: mov (%rax),%r13
0xffffffff810aab30: test %r13,%r13
0xffffffff810aab33: je 0xffffffff810aab8a
0xffffffff810aab35: movslq 0x18(%rbx),%rdx
0xffffffff810aab39: mov 0x0(%r13,%rdx,1),%rdx
0xffffffff810aab3e: mov %rdx,(%rax)
0xffffffff810aab41: push %rbp
0xffffffff810aab42: popfq
0xffffffff810aab43: test %r13,%r13
0xffffffff810aab46: je 0xffffffff810aab4e
0xffffffff810aab48: test %r12w,%r12w
0xffffffff810aab4c: js 0xffffffff810aab70
0xffffffff810aab4e: mov %r13,%rax
0xffffffff810aab51: mov (%rsp),%rbx
0xffffffff810aab55: mov 0x8(%rsp),%rbp
0xffffffff810aab5a: mov 0x10(%rsp),%r12
0xffffffff810aab5f: mov 0x18(%rsp),%r13
0xffffffff810aab64: mov 0x20(%rsp),%r14
0xffffffff810aab69: add $0x28,%rsp
0xffffffff810aab6d: retq
0xffffffff810aab6e: xchg %ax,%ax
0xffffffff810aab70: movslq 0x14(%rbx),%rdx
0xffffffff810aab74: xor %esi,%esi
0xffffffff810aab76: mov %r13,%rdi
0xffffffff810aab79: callq 0xffffffff811f51e0
0xffffffff810aab7e: jmp 0xffffffff810aab4e
0xffffffff810aab80: callq 0xffffffff814cd640
0xffffffff810aab85: nopl (%rax)
0xffffffff810aab88: jmp 0xffffffff810aab1a
0xffffffff810aab8a: mov %rax,%r8
0xffffffff810aab8d: mov %r14,%rcx
0xffffffff810aab90: or $0xffffffffffffffff,%edx
0xffffffff810aab93: mov %r12d,%esi
0xffffffff810aab96: mov %rbx,%rdi
0xffffffff810aab99: callq 0xffffffff810a9ae0
0xffffffff810aab9e: mov %rax,%r13
0xffffffff810aaba1: jmp 0xffffffff810aab41
0xffffffff810aaba3: dec %rdi
0xffffffff810aaba6: or $0xffffffffffffffff,%esi
0xffffffff810aaba9: shr $0xb,%rdi
0xffffffff810aabad: inc %esi
0xffffffff810aabaf: shr %rdi
0xffffffff810aabb2: jne 0xffffffff810aabad
0xffffffff810aabb4: mov %r12d,%edi
0xffffffff810aabb7: mov (%rsp),%rbx
0xffffffff810aabbb: mov 0x8(%rsp),%rbp
0xffffffff810aabc0: mov 0x10(%rsp),%r12
0xffffffff810aabc5: mov 0x18(%rsp),%r13
0xffffffff810aabca: or $0x4000,%edi
0xffffffff810aabd0: mov 0x20(%rsp),%r14
0xffffffff810aabd5: add $0x28,%rsp
0xffffffff810aabd9: jmpq 0xffffffff81080920
0xffffffff810aabde: xchg %ax,%ax

From this assembly, I would guess it's this line in slub.c / slab_alloc():
c->freelist = get_freepointer(s, object);

A short test with 2.6.35-rc4 suggests that this problem has been fixed on master, although 2.6.35-rc4 only boots with radeon.modeset=0. With KMS enabled the display turns off and the system does not even respond to SysRq+B.
(I will report this KMS issue in another mail.)

The system is an AMD RS690 with an Athlon X2 BE-2400.
Under 2.6.33 the system is perfectly stable, KMS is working and enabled.

Any guesses what might cause this?

Thanks for looking at this,
Torsten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Pekka Enberg on 20 Jul 2010 16:20

Hi Torsten,

On Sun, Jul 11, 2010 at 9:55 PM, Torsten Kaiser <just.for.lkml(a)googlemail.com> wrote:
> Trying to upgrade my system from 2.6.33 to 2.6.34, I can't get it to boot.
>
> All tries used CONFIG_SLUB=y
>
> The gentoo version of 2.6.34 generated an OOPS during network
> initialization and then came to a stop. (It seemed that all processes
> got stuck waiting on some locks.)
> As in this instance the system was able to start the syslog, I was
> able to capture the complete OOPS:
> Jul 3 05:51:43 ariolc kernel: [ 32.674367] BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000003
> Jul 3 05:51:43 ariolc kernel: [ 32.675674] IP: [<ffffffff810aab89>]
> __kmalloc_track_caller+0x69/0x110
> Jul 3 05:51:43 ariolc kernel: [ 32.676951] PGD 11e7e5067 PUD 11fd3d067 PMD 0
> Jul 3 05:51:43 ariolc kernel: [ 32.678224] Oops: 0000 [#1] SMP
> Jul 3 05:51:43 ariolc kernel: [ 32.679477] last sysfs file:
> /sys/devices/virtual/block/md0/md/metadata_version
> Jul 3 05:51:43 ariolc kernel: [ 32.680745] CPU 1
> Jul 3 05:51:43 ariolc kernel: [ 32.680761] Modules linked in:
> aes_x86_64(+) aes_generic sg
> Jul 3 05:51:43 ariolc kernel: [ 32.682764]
> Jul 3 05:51:43 ariolc kernel: [ 32.682764] Pid: 4652, comm:
> modprobe Not tainted 2.6.34-gentoo-r1 #1 MS-7368/MS-7368
> Jul 3 05:51:43 ariolc kernel: [ 32.682764] RIP:
> 0010:[<ffffffff810aab89>] [<ffffffff810aab89>]
> __kmalloc_track_caller+0x69/0x110
> Jul 3 05:51:43 ariolc kernel: [ 32.682764] RSP:
> 0018:ffff88011e75fe08 EFLAGS: 00010006
> Jul 3 05:51:43 ariolc kernel: [ 32.687268] RAX: ffff880001b0f088
> RBX: ffffffff8170d4d0 RCX: ffff88011e574b80
> Jul 3 05:51:43 ariolc kernel: [ 32.688564] RDX: 0000000000000000
> RSI: 00000000000000d0 RDI: 00000000000002d0
> Jul 3 05:51:43 ariolc kernel: [ 32.688564] RBP: 0000000000000296
> R08: 0000000000000014 R09: ffff88011e574800
> Jul 3 05:51:43 ariolc kernel: [ 32.691414] R10: 0000000000000001
> R11: ffff880001a12008 R12: 00000000000000d0
> Jul 3 05:51:43 ariolc kernel: [ 32.691414] R13: 0000000000000003
> R14: ffffffff81064abb R15: ffffc90010729d68
> Jul 3 05:51:43 ariolc kernel: [ 32.691414] FS:
> 00007f0a9acb8700(0000) GS:ffff880001b00000(0000)
> knlGS:0000000000000000
> Jul 3 05:51:43 ariolc kernel: [ 32.691414] CS: 0010 DS: 0000 ES:
> 0000 CR0: 0000000080050033
> Jul 3 05:51:43 ariolc kernel: [ 32.697212] CR2: 0000000000000003
> CR3: 000000011d03e000 CR4: 00000000000006e0
> Jul 3 05:51:43 ariolc kernel: [ 32.698792] DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> Jul 3 05:51:43 ariolc kernel: [ 32.698792] DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jul 3 05:51:43 ariolc kernel: [ 32.698792] Process modprobe (pid:
> 4652, threadinfo ffff88011e75e000, task ffff88011d114150)
> Jul 3 05:51:43 ariolc kernel: [ 32.698792] Stack:
> Jul 3 05:51:43 ariolc kernel: [ 32.698792] 0000000000000000
> ffffc90010729c97 0000000000000008 ffff88011e574800
> Jul 3 05:51:43 ariolc kernel: [ 32.698792] <0> ffff88011e574aa0
> ffffffff8108c27b ffffffffa0018920 ffffc900000000d0
> Jul 3 05:51:43 ariolc kernel: [ 32.698792] <0> ffffffffa0018920
> ffffc90010728000 ffffc90010729d68 ffffffff81064abb
> Jul 3 05:51:43 ariolc kernel: [ 32.708636] Call Trace:
> Jul 3 05:51:43 ariolc kernel: [ 32.708636] [<ffffffff8108c27b>] ?
> kstrdup+0x3b/0x70
> Jul 3 05:51:43 ariolc kernel: [ 32.711488] [<ffffffff81064abb>] ?
> load_module+0x13eb/0x1730
> Jul 3 05:51:43 ariolc kernel: [ 32.711488] [<ffffffff81064e7b>] ?
> sys_init_module+0x7b/0x260
> Jul 3 05:51:43 ariolc kernel: [ 32.711488] [<ffffffff810024ab>] ?
> system_call_fastpath+0x16/0x1b
> Jul 3 05:51:43 ariolc kernel: [ 32.716465] Code: 23 25 dc 47 6f 00
> 41 f6 c4 10 75 66 9c 5d fa 65 48 8b 14 25 a8 d1 00 00 48 8b 03 48 8d
> 04 02 4c 8b 28 4d 85 ed 74 55 48 63 53 18 <49> 8b 54 15 00 48 89 10 55
> 9d 4d 85 ed 74 06 66 45 85 e4 78 22
> Jul 3 05:51:43 ariolc kernel: [ 32.718865] RIP
> [<ffffffff810aab89>] __kmalloc_track_caller+0x69/0x110
> Jul 3 05:51:43 ariolc kernel: [ 32.718865] RSP <ffff88011e75fe08>
> Jul 3 05:51:43 ariolc kernel: [ 32.718865] CR2: 0000000000000003
> Jul 3 05:51:43 ariolc kernel: [ 32.718865] ---[ end trace
> 692101747f991cfb ]---
>
> Two other OOPSen in __kmalloc() followed this one.
>
> I tried to switch from CONFIG_NO_BOOTMEM=y to unsetting this option.
> This kernel froze before the userspace was started, I did not see any
> OOPS output.
>
> Today I tried the vanilla 2.6.34.1 (again with CONFIG_NO_BOOTMEM=y).
> The vanilla kernel also crashed before userspace, again in
> __kmalloc(), but with a visible OOPS.
> I wrote the following information down:
> OOPS was: BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000003
> Callchain started with:
> ffffffff810aab39 : __kmalloc_track_caller+0x69/0x110
> ffffffff8108c23b : kstrdup+0x3b/0x70
> called from sysfs_new_dirent
> there were no modules loaded at this time, the faulting process was
> Pid: 1, comm: swapper

[snip]

> From this assembly, I would guess it's this line in slub.c / slab_alloc():
> c->freelist = get_freepointer(s, object);
>
> A short test with 2.6.35-rc4 suggests that this problem has been fixed
> on master, although 2.6.35-rc4 only boots with radeon.modeset=0. With
> KMS enabled the display turns off and the system does not even respond
> to SysRq+B.
> (I will report this KMS issue in another mail.)
>
> The system is an AMD RS690 with an Athlon X2 BE-2400.
> Under 2.6.33 the system is perfectly stable, KMS is working and enabled.
>
> Any guesses what might cause this?

It's slab corruption that can be caused by many things. Can you please try to reproduce with CONFIG_SLUB_DEBUG_ON=y?
From: Christoph Lameter on 20 Jul 2010 16:30

On Tue, 20 Jul 2010, Pekka Enberg wrote:

> It's slab corruption that can be caused by many things. Can you please
> try to reproduce with CONFIG_SLUB_DEBUG_ON=y?

Or simply reboot and add a parameter slub_debug to the other parameters.
From: Torsten Kaiser on 31 Jul 2010 06:10

On Tue, Jul 20, 2010 at 10:19 PM, Christoph Lameter <cl(a)linux-foundation.org> wrote:
> On Tue, 20 Jul 2010, Pekka Enberg wrote:
>
>> It's slab corruption that can be caused by many things. Can you please
>> try to reproduce with CONFIG_SLUB_DEBUG_ON=y?
>
> Or simply reboot and add a parameter slub_debug to the other parameters.

I finally had the opportunity to reboot this system again.

CONFIG_SLUB_DEBUG=y was set, so I tried adding slub_debug to the command line. With slub_debug added the system boots normally, and I could not see any errors in the syslog. When I removed slub_debug, it crashed again before reaching userspace.

After the KMS fixes from Alex Deucher, vanilla kernel 2.6.35-rc6 works for me. So I think my problems with the earlier 2.6.35-rcs were just these KMS errors, and this kmalloc problem has already been fixed in mainline.

So I have switched this system to 2.6.35-rc6 and will stay with this kernel.

Thanks,
Torsten