From: Ingo Molnar on 27 May 2010 15:10

Dave,

FYI, this boot crash in arp_error_report() started triggering in -tip testing:

[ 113.285384] BUG: unable to handle kernel paging request at 6b6b6b87
[ 113.285384] IP: [<c14c8237>] arp_error_report+0x1d/0x32
[ 113.285384] *pdpt = 00000000340fd001 *pde = 0000000000000000
[ 113.285384] Oops: 0000 [#1] SMP
[ 113.285384] last sysfs file: /sys/class/net/eth0/address
[ 113.285384] Modules linked in:
[ 113.285384]
[ 113.285384] Pid: 0, comm: swapper Not tainted 2.6.34-08129-gcc106eb-dirty #257 A8N-E/System Product Name
[ 113.285384] EIP: 0060:[<c14c8237>] EFLAGS: 00010202 CPU: 0
[ 113.285384] EIP is at arp_error_report+0x1d/0x32
[ 113.285384] EAX: 6b6b6b6b EBX: f4c820b0 ECX: 00000000 EDX: c1867a24
[ 113.285384] ESI: f40ff074 EDI: f4c820b0 EBP: c2401f34 ESP: c2401f30
[ 113.285384] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 113.285384] Process swapper (pid: 0, ti=c2401000 task=c185e3c0 task.ti=c1853000)
[ 113.285384] Stack:
[ 113.285384] f40ff000 c2401f4c c147bca4 f40ff024 f40ff000 00000000 ffffb6db c2401f64
[ 113.285384] <0> c147cfde f40ff024 00000100 f40ff0ec c2401f8c c2401f98 c10447ab f40ff000
[ 113.285384] <0> c147ce8c f40ff0a4 c22104c4 00000000 c1804b08 00000000 00000000 f40ff0a4
[ 113.285384] Call Trace:
[ 113.285384] [<c147bca4>] ? neigh_invalidate+0x6d/0x84
[ 113.285384] [<c147cfde>] ? neigh_timer_handler+0x152/0x1f9
[ 113.285384] [<c10447ab>] ? call_timer_fn+0x6f/0xed
[ 113.285384] [<c147ce8c>] ? neigh_timer_handler+0x0/0x1f9
[ 113.285384] [<c1044962>] ? run_timer_softirq+0x139/0x16f
[ 113.285384] [<c147ce8c>] ? neigh_timer_handler+0x0/0x1f9
[ 113.285384] [<c103ed4f>] ? __do_softirq+0xc9/0x18d
[ 113.285384] [<c103ec86>] ? __do_softirq+0x0/0x18d
[ 113.285384] <IRQ>
[ 113.285384] [<c103e9ab>] ? irq_exit+0x3a/0x6d
[ 113.285384] [<c1016895>] ? smp_apic_timer_interrupt+0x6c/0x7a
[ 113.285384] [<c15664d6>] ? apic_timer_interrupt+0x36/0x40
[ 113.285384] [<c1009740>] ? poll_idle+0x31/0x5d
[ 113.285384] [<c100255e>] ? cpu_idle+0xab/0xc5
[ 113.285384] [<c1529925>] ? rest_init+0xa1/0xa6
[ 113.285384] [<c18e6a2d>] ? start_kernel+0x3ed/0x3f2
[ 113.285384] [<c18e60ca>] ? i386_start_kernel+0xca/0xd1
[ 113.285384] Code: 0c 89 43 70 31 c0 8d 65 f4 5b 5e 5f 5d c3 55 89 e5 53 0f 1f 44 00 00 89 d3 89 d0 e8 fa fb ff ff 85 c0 74 12 8b 40 40 85 c0 74 0b <8b> 50 1c 85 d2 74 04 89 d8 ff d2 89 d8 e8 9b 50 fa ff 5b 5d c3
[ 113.285384] EIP: [<c14c8237>] arp_error_report+0x1d/0x32 SS:ESP 0068:c2401f30
[ 113.285384] CR2: 000000006b6b6b87
[ 113.493016] ---[ end trace e94352dcdbc4a293 ]---

Config and full crashlog attached.

Thanks,

Ingo
From: Linus Torvalds on 27 May 2010 15:40

On Thu, 27 May 2010, Ingo Molnar wrote:
>
> FYI, this boot crash in arp_error_report() started triggering in -tip testing:
>
> [ 113.285384] BUG: unable to handle kernel paging request at 6b6b6b87

That's the POISON_FREE signature, with an offset of 28 (0x1c).

And it looks like the whole function got captured in the Code: sequence.
It looks like this:

   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   0f 1f 44 00 00          nopl   0x0(%eax,%eax,1)
   9:   89 d3                   mov    %edx,%ebx
   b:   89 d0                   mov    %edx,%eax
   d:   e8 fa fb ff ff          call   0xfffffc0c       # skb_dst()
  12:   85 c0                   test   %eax,%eax        # dst
  14:   74 12                   je     0x28
  16:   8b 40 40                mov    0x40(%eax),%eax  # dst->ops
  19:   85 c0                   test   %eax,%eax
  1b:   74 0b                   je     0x28
  1d:*  8b 50 1c                mov    0x1c(%eax),%edx  <-- trapping instruction
  20:   85 d2                   test   %edx,%edx
  22:   74 04                   je     0x28
  24:   89 d8                   mov    %ebx,%eax
  26:   ff d2                   call   *%edx            # dst->ops->link_failure()
  28:   89 d8                   mov    %ebx,%eax
  2a:   e8 9b 50 fa ff          call   0xfffa50ca       # kfree_skb()
  2f:   5b                      pop    %ebx
  30:   5d                      pop    %ebp
  31:   c3                      ret

Where most of it is "dst_link_failure()" being inlined (that last "call"
is the call to kfree_skb()).

Looks like 'dst' points to free'd memory, so when we load a pointer from
it (the dst->ops field), we get 0x6b6b6b6b, and then when we try to load
dst->ops->link_failure it oopses.

tl;dr: that

	struct dst_entry *dst = skb_dst(skb);

in dst_link_failure seems to result in a stale dst.

		Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Dumazet on 27 May 2010 15:50

On Thursday, 27 May 2010 at 12:27 -0700, Linus Torvalds wrote:
>
> On Thu, 27 May 2010, Ingo Molnar wrote:
> >
> > FYI, this boot crash in arp_error_report() started triggering in -tip testing:
> >
> > [ 113.285384] BUG: unable to handle kernel paging request at 6b6b6b87
>
> That's the POISON_FREE signature, with an offset of 28 (0x1c).
>
> And it looks like the whole function got captured in the Code: sequence.
> It looks like this:
>
>    0:   55                      push   %ebp
>    1:   89 e5                   mov    %esp,%ebp
>    3:   53                      push   %ebx
>    4:   0f 1f 44 00 00          nopl   0x0(%eax,%eax,1)
>    9:   89 d3                   mov    %edx,%ebx
>    b:   89 d0                   mov    %edx,%eax
>    d:   e8 fa fb ff ff          call   0xfffffc0c       # skb_dst()
>   12:   85 c0                   test   %eax,%eax        # dst
>   14:   74 12                   je     0x28
>   16:   8b 40 40                mov    0x40(%eax),%eax  # dst->ops
>   19:   85 c0                   test   %eax,%eax
>   1b:   74 0b                   je     0x28
>   1d:*  8b 50 1c                mov    0x1c(%eax),%edx  <-- trapping instruction
>   20:   85 d2                   test   %edx,%edx
>   22:   74 04                   je     0x28
>   24:   89 d8                   mov    %ebx,%eax
>   26:   ff d2                   call   *%edx            # dst->ops->link_failure()
>   28:   89 d8                   mov    %ebx,%eax
>   2a:   e8 9b 50 fa ff          call   0xfffa50ca       # kfree_skb()
>   2f:   5b                      pop    %ebx
>   30:   5d                      pop    %ebp
>   31:   c3                      ret
>
> Where most of it is "dst_link_failure()" being inlined (that last "call"
> is the call to kfree_skb()).
>
> Looks like 'dst' points to free'd memory, so when we load a pointer from
> it (the dst->ops field), we get 0x6b6b6b6b, and then when we try to load
> dst->ops->link_failure it oopses.
>
> tl;dr: that
>
> 	struct dst_entry *dst = skb_dst(skb);
>
> in dst_link_failure seems to result in a stale dst.
>
> 		Linus
> --

I am looking at this bug report, as I am probably at fault. Please give
me one or two hours ;)

Thanks
From: Eric Dumazet on 27 May 2010 16:20

On Thursday, 27 May 2010 at 21:47 +0200, Eric Dumazet wrote:

> I am looking at this bug report, as I am probably at fault. Please give
> me one or two hours ;)

I believe the problem comes from commit 7fee226ad2 (net: add a noref bit
on skb dst).

We probably should add a WARN in __skb_queue_tail() and similar enqueue
functions to catch other problems. I'll post a followup.

Thanks !

[PATCH] net: fix __neigh_event_send()

commit 7fee226ad23 (net: add a noref bit on skb dst) missed one spot
where an skb is enqueued with a dst entry that may not be refcounted.

__neigh_event_send() inserts the skb into arp_queue, so we must make sure
the dst entry is refcounted; otherwise the dst entry can be freed by the
garbage collector after the caller exits from the RCU-protected section.

Reported-by: Ingo Molnar <mingo(a)elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet(a)gmail.com>
---
 net/core/neighbour.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index bff3790..6ba1c0e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -934,6 +934,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 				kfree_skb(buff);
 				NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards);
 			}
+			skb_dst_force(skb);
 			__skb_queue_tail(&neigh->arp_queue, skb);
 		}
 		rc = 1;