Prev: BUG? boot failed with "crashkernel=256M@32M", but "crashkernel=256M@64M" can work
Next: USB: testusb: imported David Brownell's USB testing application
From: Peter Zijlstra on 12 Apr 2010 16:00 On Mon, 2010-04-12 at 21:30 +0200, Peter Zijlstra wrote: > > We could here return a non-valid and already freed anon_vma. > OK, so non of the users of page_lock_anon_vma() with exception of the memory-failure.c one could really care. And all of them seem to be safe enough wrt dealing with a dead one. So unless people care, I'm going to not spend more time on trying to make page_lock_anon_vma() behave. Instead I'll try and see wth it is that migrate.c and rmap_walk_anon are doing. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Borislav Petkov on 12 Apr 2010 18:40 From: Linus Torvalds <torvalds(a)linux-foundation.org> Date: Mon, Apr 12, 2010 at 03:18:20PM -0700 > Oh, btw, I like your email gateway. Only noticed now: > > mail.skyhub.de (SuperMail on ZX Spectrum 128k) > > that's a tough little machine. Yeah, and it can handle all that mail traffic just fine :) -- Regards/Gruss, Boris. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Borislav Petkov on 13 Apr 2010 05:40 From: Linus Torvalds <torvalds(a)linux-foundation.org> Date: Mon, Apr 12, 2010 at 03:11:53PM -0700 > Ok. That does sound very positive. Of course, last time you sounded > positive, I had an email from you half an hour later that said "oh no, it > oopsed again". So I'll take it with a bit of salt, but on the whole I'll > be optimistic about it. Ok, just finished testing -rc4 - no problems so far. Let's just go out on a limb here and say with a greater certainty that this really got fixed but be smart about it and keep an eye open if it happens again - you never know. Where is the champagne? -- Regards/Gruss, Boris. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 13 Apr 2010 07:40 On Tue, 2010-04-13 at 19:53 +0900, KOSAKI Motohiro wrote: > > struct anon_vma *page_lock_anon_vma(struct page *page) > > { > > @@ -294,14 +309,24 @@ struct anon_vma *page_lock_anon_vma(struct page *page) > > unsigned long anon_mapping; > > > > rcu_read_lock(); > > - anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping); > > + anon_mapping = (unsigned long)rcu_dereference(page->mapping); > > if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON) > > goto out; > > - if (!page_mapped(page)) > > - goto out; > > > > anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); > > spin_lock(&anon_vma->lock); > > Does anon->lock dereference is guranteed if page->_mapcount==-1? > It can be freed miliseconds ago, rcu_read_lock() doesn't provide such > gurantee. > > perhaps, I'm missing your point. No you're right, I got my head hopelessly twisted up trying to make page_lock_anon_vma() do something reliable, but there really isn't much that can be done. Luckily most users (with exception of the memory-failure.c one) don't really care and all take steps to verify the page is indeed in any of the vmas it might find. So I've given up on this and will only submit a patch like the below, which hopefully does still make sense... I do think there's a missing barrier in there as well, but I've made enough of a fool of myself. [ with the preemptible mmu_gather patches I introduce a refcount to the anon_vma, and then with atomic_inc_not_zero() we can add a guarantee that the returned anon_vma is alive ] --- mm/rmap.c | 18 ++++++++++++++++-- 1 files changed, 16 insertions(+), 2 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index eaa7a09..49a2533 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -285,8 +285,22 @@ void __init anon_vma_init(void) } /* - * Getting a lock on a stable anon_vma from a page off the LRU is - * tricky: page_lock_anon_vma rely on RCU to guard against the races. + * Getting a lock on a stable anon_vma from a page off the LRU is tricky! + * + * Since there is no serialization what so ever against page_remove_rmap() + * the best this function can do is return a locked anon_vma that might + * have been relevant to this page. + * + * The page might have been remapped to a different anon_vma or the anon_vma + * returned may already be freed (and even reused). + * + * All users of this function must be very careful when walking the anon_vma + * chain and verify that the page in question is indeed mapped in it + * [ something equivalent to page_mapped_in_vma() ]. + * + * Since anon_vma's slab is DESTROY_BY_RCU and we know from page_remove_rmap() + * that the anon_vma pointer from page->mapping is valid if there is a + * mapcount, we can dereference the anon_vma after observing those. */ struct anon_vma *page_lock_anon_vma(struct page *page) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 15 Apr 2010 03:40
On Tue, 2010-04-13 at 21:00 +0900, KOSAKI Motohiro wrote: > > [ with the preemptible mmu_gather patches I introduce a refcount to > > the anon_vma, and then with atomic_inc_not_zero() we can add a > > guarantee that the returned anon_vma is alive ] > > Indeed. refcount is best way. anon_vma DESTROY_BY_RCU stuff seems > overengineering, I think. this is fastest, but anon_vma allocation is not > (and was not) fork/exit bottleneck point. So, I guess most simply way is > best. Well, that refcount stuff still relies on DESTROY_BY_RCU :-) Anyway, it also looks like a lot of races are avoided by ordering the rmap_add/remove calls wrt to adding/removing the page to/from the LRU. Rmap calls come from LRU pages, and it looks like rmap state is only changed for pages that are not on the LRU. I still have to go through all that code again to make sure, but I couldn't find a race between page_add_anon_rmap() and page_lock_anon_vma() due to that. If there is, we need to look at page_mapped() before page->mapping because page_add_anon_rmap() first increments the mapcount and only then adjusts the mapping, so the existing order in page_anon_lock_vma() can end up dereferencing a long dead anon_vma. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |