From: Borislav Petkov on 2 Apr 2010 14:10

Hi,

I've got the following oopsie two times now when hibernating - this means, I don't get it every time I hibernate but only sometimes, say once in a blue moon. And yeah, I couldn't catch it over serial console so I had to make ugly pictures. By the way, the numbers in the filenames increment as I scroll down the whole oops (yep, it hadn't completely frozen and I still could do Shift->PgUp or Shift->PgDn on the console):

http://www.kernel.org/pub/linux/kernel/people/bp/

So, here's what I could decipher from the oopsie; someone else who's more knowledgeable in mm, rmap and anon_vma's list traversal should be able to tell what goes wrong there.

EIP is at page_referenced+0xee which is

<disasm>
    10c4:  41 01 c4       add    %eax,%r12d
    10c7:  83 7d cc 00    cmpl   $0x0,-0x34(%rbp)
    10cb:  74 19          je     10e6 <page_referenced+0xff>
    10cd:  4d 8b 6d 20    mov    0x20(%r13),%r13
    10d1:  49 83 ed 20    sub    $0x20,%r13
    10d5:  49 8b 45 20    mov    0x20(%r13),%rax   <--------------
    10d9:  0f 18 08       prefetcht0 (%rax)
    10dc:  49 8d 45 20    lea    0x20(%r13),%rax
    10e0:  48 39 45 80    cmp    %rax,-0x80(%rbp)
</disasm>

Corresponding asm:

<asm>
    .loc 1 496 0
    movq 32(%r13), %r13 # <variable>.same_anon_vma.next, __mptr.451
.LVL295:
    subq $32, %r13 #, avc
.LVL296:
.L184:
.LBE1278:
    movq 32(%r13), %rax # <variable>.same_anon_vma.next, <variable>.same_anon_vma.next   <----------------
    prefetcht0 (%rax) # <variable>.same_anon_vma.next
    leaq 32(%r13), %rax #, tmp97
    cmpq %rax, -128(%rbp) # tmp97, %sfp
    jne .L187 #,
.L186:
    .loc 1 514 0
    movq %r14, %rdi # anon_vma,
    call page_unlock_anon_vma #
</asm>

and the NULL pointer in question is being written into %r13 and then 32 is subtracted from it (I'm guessing container_of()). This is consistent with the register snapshot - %r13 contains 0xffffffffffffffe0, which is -32 - and with the code dump in the oops: in CIMG1640.JPG, code points to opcode 49 8b 45 20.

Which is the following piece of code in <mm/rmap.c:page_referenced_anon()>.
<source>
	mapcount = page_mapcount(page);
	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
		struct vm_area_struct *vma = avc->vma;
		unsigned long address = vma_address(page, vma);
		if (address == -EFAULT)
			continue;
</source>

which tells us that same_anon_vma.next is NULL. Hmm...

--
Regards/Gruss,
Boris.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
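The pointer arithmetic described in the report above can be reproduced in user space. This is only a minimal sketch, not kernel code: `struct chain_entry`, `entry_from_next()` and `next_field_address()` are hypothetical stand-ins, and the field order is an assumption chosen so that the list member sits at offset 32 on an LP64 target, as the `subq $32, %r13` suggests. It computes what a container_of()-style subtraction yields when the `next` pointer it starts from is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal doubly linked list node, shaped like the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical stand-in for the chain entry. The field order is an
 * assumption picked so that same_anon_vma lands at offset 32 on LP64.
 */
struct chain_entry {
	void *vma;
	void *anon_vma;
	struct list_head same_vma;
	struct list_head same_anon_vma;
};

/*
 * Address of the entry that embeds a given same_anon_vma node.
 * The arithmetic is done in uintptr_t so that a NULL input simply
 * wraps around instead of performing undefined pointer arithmetic.
 */
static uintptr_t entry_from_next(uintptr_t next)
{
	return next - offsetof(struct chain_entry, same_anon_vma);
}

/* Address the loop reads next: &entry->same_anon_vma.next. */
static uintptr_t next_field_address(uintptr_t entry)
{
	return entry + offsetof(struct chain_entry, same_anon_vma)
		     + offsetof(struct list_head, next);
}
```

With 8-byte pointers, `entry_from_next(0)` wraps to 0xffffffffffffffe0, the value seen in %r13, and `next_field_address()` of that value is address 0, which matches a NULL dereference at the marked `mov 0x20(%r13),%rax`.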
From: Rik van Riel on 2 Apr 2010 18:10

On 04/02/2010 02:37 PM, Linus Torvalds wrote:
> On Fri, 2 Apr 2010, Andrew Morton wrote:
>> On Fri, 2 Apr 2010 11:09:14 -0700 (PDT) Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>>
>>>
>>> I think this is likely due to the new scalable anon_vma linking by Rik.
>>
>> Similar to https://bugzilla.kernel.org/show_bug.cgi?id=15680
>
> Yup, looks like the same thing, except that bugzilla entry was due to
> swapping rather than hibernation and memory shrinking. But same end
> result, just different reasons for why we were trying to shrink the page
> lists.

Interesting that it is a null pointer dereference, given that we do not zero out the anon_vma_chain structs before freeing them.

page_referenced_anon() takes the anon_vma->lock before walking the list. In the three places where we modify the anon_vma_chain->same_anon_vma list, we also hold the lock.

No doubt something in mm/ is doing something silly, but I have not found anything yet :(

If I had to guess, I'd say maybe we got one of the mprotect & vma_adjust cases wrong. Maybe a page stayed around in the LRU (and in a process?) after its anon_vma already got freed?

There has to be a reason why a very heavy AIM7 workload and some other stress tests did not trigger it, but a few people are able to trigger it on their systems...
From: Rik van Riel on 4 Apr 2010 13:30

On 04/04/2010 12:12 PM, Minchan Kim wrote:
> While I review the code again due to this BUG, I found some strange
> thing.
>
> In anon_vma_fork, if anon_vma_clone is successful but anon_vma_alloc is
> failed, what happens? Parent VMA's anon_vmas have anon_vma_chain which
> has vma which is destroyed.
> I couldn't find any clean routine to remove this garbage.
> I am missing something?

Good catch. The parent VMA's anon_vmas will get delinked eventually, but we need to get rid of the newly allocated child anon_vmas.

You found a hopefully rare memory leak...

We need a call to unlink_anon_vmas(vma) at the error label to do that.

> But I think it isn't related to this bug because oops point is not
> vma_address but anon_vma_chain.next.

Agreed, it's probably not it.
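The leak discussed in the message above follows a common pattern: a two-step setup whose second step fails has to undo the first. What follows is a minimal user-space sketch that assumes nothing about the real mm/rmap.c code. `mock_vma`, `mock_clone()`, `mock_alloc()`, `mock_unlink()` and `mock_fork()` are hypothetical stand-ins for the VMA, anon_vma_clone(), anon_vma_alloc(), unlink_anon_vmas() and anon_vma_fork().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct mock_link {
	struct mock_link *next;
};

struct mock_vma {
	struct mock_link *chain;	/* links created by mock_clone() */
	void *own;			/* object created by mock_alloc() */
};

/* Undo step 1: free every link on the child's chain. */
static void mock_unlink(struct mock_vma *child)
{
	while (child->chain) {
		struct mock_link *l = child->chain;

		child->chain = l->next;
		free(l);
	}
}

/* Step 1: attach n links to the child; cleans up after itself on failure. */
static int mock_clone(struct mock_vma *child, int n)
{
	for (int i = 0; i < n; i++) {
		struct mock_link *l = malloc(sizeof(*l));

		if (!l) {
			mock_unlink(child);
			return -ENOMEM;
		}
		l->next = child->chain;
		child->chain = l;
	}
	return 0;
}

/* Step 2: allocate the child's own object; `fail` simulates an allocation failure. */
static void *mock_alloc(bool fail)
{
	return fail ? NULL : malloc(1);
}

static int mock_fork(struct mock_vma *child, int nlinks, bool fail_alloc)
{
	int err;

	assert(child != NULL);
	err = mock_clone(child, nlinks);
	if (err)
		return err;

	child->own = mock_alloc(fail_alloc);
	if (!child->own)
		goto out_error;
	return 0;

out_error:
	/* Without this call, the links created in step 1 would be leaked. */
	mock_unlink(child);
	return -ENOMEM;
}

/* Number of links currently on the child's chain. */
static int chain_length(const struct mock_vma *v)
{
	int n = 0;

	for (const struct mock_link *l = v->chain; l; l = l->next)
		n++;
	return n;
}
```

Calling `mock_fork(&v, 3, true)` on a zeroed `struct mock_vma` returns -ENOMEM and leaves `v.chain` empty. Removing the `mock_unlink()` call at `out_error` would leave three orphaned links, which is the shape of the leak described above.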
From: KOSAKI Motohiro on 6 Apr 2010 05:00

>
> I think this is likely due to the new scalable anon_vma linking by Rik.
> Nothing else I can imagine should have introduced anything like it.
>
> Rik: the pictures have the information, but you need to look at several to
> see both the oops and the backtrace. Here's a condensed version:
>
> shrink_all_memory ->
> do_try_to_free_pages ->
> shrink_zone ->
> shrink_inactive_list ->
> shrink_page_list ->
> page_referenced
>
> where page_referenced() oopses due to page_referenced_anon() as per
> Borislav's description below.
>
> Added all the usual suspects to the Cc list. Left the full report appended
> so that the new people don't have to search for it on lkml.

Today I reviewed this patch carefully, but I haven't found any bug.

1) anon_vma->list is always protected by anon_vma->lock.
2) Even if someone forgets to take the lock, list_add() and/or list_del() never assign NULL.

So a NULL pointer means one of three possibilities:

a) we see uninitialized data
b) we see data after it was freed
c) we see memory corruption caused by another bug

But (a) can't happen, because

static inline void __list_add(struct list_head *new,
			      struct list_head *prev,
			      struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;	/* (*) */
}

Even if an uninitialized entry is linked into the avc list, new->next has already been set to a non-NULL value by the time (*) makes the entry visible.

(b) is also impossible. SLAB_DESTROY_BY_RCU delays freeing of the page that backs an anon_vma until the next RCU grace period. This means rcu_read_lock()+page_mapped() can see a kfree()ed page, but that is safe: no one corrupts it.

So now I suspect (c) ;-)

Also, I ran a stress workload with shrink_all_memory() today, but I couldn't reproduce the issue. Hmm... (Perhaps I'm just not a lucky guy; I frequently fail to reproduce things.)

I'll continue to work on it.
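To make the argument in point 2) concrete, here is a minimal user-space sketch of the list primitives. It is a simplified re-creation, not the kernel header. `list_add_between()` mirrors the `__list_add()` quoted above under a name that is legal in user space, the `POISON1`/`POISON2` values are assumptions modelled on the kernel's list poison idea without any architecture-specific offset, and `has_null_link()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Assumed poison values: non-NULL markers written into unlinked entries. */
#define POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define POISON2 ((struct list_head *)(uintptr_t)0x00200200)

/* An empty list is a head that points at itself. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry between two known neighbours; every store is a valid address. */
static void list_add_between(struct list_head *entry,
			     struct list_head *prev,
			     struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;	/* entry becomes reachable only here */
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	list_add_between(entry, head, head->next);
}

/* Unlink entry and poison its pointers; NULL is never written. */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = POISON1;
	entry->prev = POISON2;
}

/* Walk the ring once and report whether any link is NULL. */
static bool has_null_link(const struct list_head *head)
{
	const struct list_head *pos = head;

	do {
		if (!pos->next || !pos->prev)
			return true;
		pos = pos->next;
	} while (pos != head);
	return false;
}
```

In this sketch, an entry whose pointers held NULL before `list_add()` has both of them overwritten, and `list_del()` leaves poison behind instead of NULL. A NULL `next` on a live list therefore has to come from somewhere other than these primitives, which is the reasoning behind the three possibilities listed above.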
From: KOSAKI Motohiro on 6 Apr 2010 06:10
> (b) is also impossible. SLAB_DESTROY_BY_RCU delay the page for anon_vma
> freeing until next rcu period. It mean rcu_read_lock()+page_mapped()
> can see kfree()ed page. but it is safe. noone corrupt it.

By the way: I haven't understood why Rik's per-process anon_vma concept works correctly with KSM. KSM increments anon_vma->ksm_refcount, but it does not seem to be guaranteed that vma->anon_vma and page->anon_vma are the same.

But I guess the bug reporter doesn't use KSM; it's a minor feature.