From: Linus Torvalds on 9 Apr 2010 20:40

On Sat, 10 Apr 2010, Johannes Weiner wrote:
>
> That leaves the chance that my code was correct and we leave a conceptual
> error around somewhere that can materialize again.

Absolutely. I really don't know whether your merge routine works or not. I'd just rather not have to even _try_ to understand it.

I have a fairly simple rule for most of the code I see: if I have a hard time understanding why it should work, I don't really want to rely on it.

> But I am at a point where simplification never sounded more blissful, so
> yeah, I like it :)

Exactly. This is the "let's limit things a bit to keep them much simpler" approach.

> Let's hope it fixes Boris's issue.

I'm going to just guess that it won't, and that Boris' issue was actually due to something else entirely, and we've all been staring at totally the wrong code.

But we can hope.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel on 10 Apr 2010 10:50

On 04/10/2010 07:26 AM, Borislav Petkov wrote:

> This time we got stuck on the anon_vma->lock (yep, we've seen that
> oopsie before). So, it might be that we _really_ are staring at the
> wrong code... Back to square one.

This is a different bug, though.

If the null pointer dereference is gone, Linus's patch fixed that bug and we can move forward to fixing the anon_vma->lock bug.

I'll start auditing the code to see if we forget to unlock the anon_vma in some unlikely error path...
From: Linus Torvalds on 10 Apr 2010 11:30

On Sat, 10 Apr 2010, Borislav Petkov wrote:
> >
> > I will keep running that kernel in the next couple of days and keep you
> > informed in case this is the fix we're gonna use.
>
> Yep, you jinxed it :)
>
> This time we got stuck on the anon_vma->lock (yep, we've seen that
> oopsie before). So, it might be that we _really_ are staring at the
> wrong code... Back to square one.

No, I think we're good. I suspect this is a different issue.

Do you have lockdep enabled, along with mutex and spinlock debugging etc? That might help pinpoint what triggers this.

But I think the fact that you are apparently not able to get the list corruption is a good sign. Of course, it might just be harder to trigger, and these things could all be a sign of a different bug, but my gut feel is that we did fix something, and you are just damn good at stressing the new code.

Kudos.

Linus
From: Linus Torvalds on 10 Apr 2010 13:20

On Sat, 10 Apr 2010, Borislav Petkov wrote:
>
> And I got an oops again, this time the #GP from couple of days ago.

Oh damn. So the list corruption really does happen still.

And the pattern is similar, but not the same: now it's 0032323200323232, rather than 002e2e2e002e2e2e.

Very intriguing. 0x32 instead of 0x2e, but the same pattern of duplicated bytes. And not very helpful in that it still doesn't actually make any sense.

> <thinking out loud>
>
> I'm starting to think that maybe there could be something wrong with the
> machine I'm running it on. Especially since there are only two people
> who reported this issue, Steinar and me, so how probable is it that
> maybe those two machines have failing RAM module somewhere? Or some
> other data corrupting thing? Although I should be getting mchecks...
> Hmm...

No. Just the fact that there are two people who reported the same thing is already a pretty strong sign that it's real. Also, hardware problems don't tend to be as consistent in the details as yours have been.

And in fact I have seen it personally (but couldn't reproduce it) on the kids' Mac mini after you reported it. So I'm convinced the problem is real, and just not so easily triggered, and you're being a great tester.

Linus
--
Here's the one I've seen, in case you care. I haven't posted it, because it doesn't really add anything new.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c02850cf>] page_referenced+0xd6/0x199
*pde = 21d73067 *pte = 00000000
Oops: 0000 [#2] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/block/sda/uevent
Modules linked in: [last unloaded: scsi_wait_scan]

Pid: 14440, comm: firefox Tainted: G D 2.6.34-rc2-00391-gfc1203c #3 Mac-F4208EC8/Macmini1,1
EIP: 0060:[<c02850cf>] EFLAGS: 00210287 CPU: 1
EIP is at page_referenced+0xd6/0x199
EAX: f59e65d4 EBX: c10b5480 ECX: 00000000 EDX: fffffff0
ESI: f59e65d0 EDI: 00000000 EBP: d8f77cd8 ESP: d8f77ca0
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process firefox (pid: 14440, ti=d8f76000 task=cb795440 task.ti=d8f76000)
Stack:
 f59e65d4 00000000 fffffff0 c15ba000 d8f77cbc c02885b8 c07972c4 d8f77cdc
 c0276712 00000000 00000001 c10b5498 c10b5480 d8f77e94 d8f77d58 c0276b53
 d8f77d48 00000000 00000000 00000000 0000001d d8f77de8 00000001 c07972c4
Call Trace:
 [<c02885b8>] ? swapcache_free+0x1b/0x24
 [<c0276712>] ? __remove_mapping+0x90/0xb2
 [<c0276b53>] ? shrink_page_list+0x109/0x3ba
 [<c0277099>] ? shrink_inactive_list+0x295/0x48e
 [<c0273d68>] ? determine_dirtyable_memory+0x34/0x4b
 [<c0273dd0>] ? get_dirty_limits+0x16/0x26d
 [<c027750c>] ? shrink_zone+0x27a/0x327
 [<c03c55a5>] ? i915_gem_shrink+0x67/0x22c
 [<c0277e6d>] ? do_try_to_free_pages+0x17d/0x292
 [<c0278078>] ? try_to_free_pages+0x6a/0x72
 [<c0275cd7>] ? isolate_pages_global+0x0/0x1bd
 [<c0273210>] ? __alloc_pages_nodemask+0x2c2/0x447
 [<c027f1c1>] ? handle_mm_fault+0x188/0x605
 [<c02192c3>] ? do_page_fault+0x253/0x269
 [<c0219070>] ? do_page_fault+0x0/0x269
 [<c05b9e82>] ? error_code+0x66/0x6c
 [<c05b0000>] ? azx_probe+0x5e8/0x8ae
 [<c0219070>] ? do_page_fault+0x0/0x269
Code: f9 f2 74 18 ff 75 08 8d 45 f0 50 89 d8 e8 62 f6 ff ff 01 c7 59 83 7d f0 00 58 74 20 8b 55 d0 8b 42 10 83 e8 10 89 45 d0 8b 55 d0 <8b> 42 10 0f 18 00 90 89 d0 83 c0 10 39 45 c8 75 ab fe 06 e9 90
EIP: [<c02850cf>] page_referenced+0xd6/0x199 SS:ESP 0068:d8f77ca0
CR2: 0000000000000000
---[ end trace 890710798f4c0070 ]---
From: Linus Torvalds on 10 Apr 2010 14:30
On Sat, 10 Apr 2010, Linus Torvalds wrote:
> On Sat, 10 Apr 2010, Borislav Petkov wrote:
> >
> > And I got an oops again, this time the #GP from couple of days ago.
>
> Oh damn. So the list corruption really does happen still.

Ho humm. Maybe I'm crazy, but something started bothering me.

And I started wondering: when is the 'page->mapping' of an anonymous page actually cleared?

The thing is, the mapping of an anonymous page is actually cleared only when the page is _freed_, in "free_hot_cold_page()".

Now, let's think about that. And in particular, let's think about how that relates to the freeing of the 'anon_vma' that the page->mapping points to.

The way the anon_vma is freed is when the mapping is torn down, and we do roughly:

	tlb = tlb_gather_mmu(mm,..)
	..
	unmap_vmas(&tlb, vma ..
	..
	free_pgtables()
	..
	tlb_finish_mmu(tlb, start, end);

and we actually unmap all the pages in "unmap_vmas()", and then _after_ unmapping all the pages we do the "unlink_anon_vmas(vma);" in "free_pgtables()". Fine so far - the anon_vmas stay around until after the page has been happily unmapped.

But "unmapped all the pages" is _not_ actually the same as "free'd all the pages". The actual _freeing_ of the page happens generally in tlb_finish_mmu(), because we can free the page only after we've flushed any TLB entries.

So what we have in that tlb_gather structure is a list of _pending_ pages to be freed, while we already actually free'd the anon_vmas earlier!

Now, the thing is, tlb_gather_mmu() begins a preempt-safe region (because we use a per-cpu variable), but as far as I can tell it is _not_ an RCU-safe region.

So I think we might actually get a real RCU freeing event while this all happens. So now the 'anon_vma' that 'page->mapping' points to has not just been released back to the SLUB caches, the page itself might have been released too.

I dunno. Does the above sound at all sane? Or am I just raving?

Something hacky like the above might fix it if I'm not just raving.
I really might be missing something here.

Linus

---
 include/asm-generic/tlb.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index e43f976..2678118 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -14,6 +14,7 @@
 #define _ASM_GENERIC__TLB_H
 
 #include <linux/swap.h>
+#include <linux/rcupdate.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
@@ -62,6 +63,7 @@ tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
 
 	tlb->fullmm = full_mm_flush;
 
+	rcu_read_lock();
 	return tlb;
 }
 
@@ -90,6 +92,7 @@ tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 	/* keep the page table cache within bounds */
 	check_pgt_cache();
 
+	rcu_read_unlock();
 	put_cpu_var(mmu_gathers);
 }
 