Prev: memcontrol: fix potential null deref
Next: [BUGFIX][PATCH] memcg: avoid use cmpxchg in swap cgroup maintainance (Was Re: 34-rc1-git3 build failure with CGROUP_MEM_RES_CTLR_SWAP=y
From: Minchan Kim on 14 Mar 2010 20:30 Hi, Mel. On Sat, Mar 13, 2010 at 1:41 AM, Mel Gorman <mel(a)csn.ul.ie> wrote: > rmap_walk_anon() was triggering errors in memory compaction that looks like > use-after-free errors in anon_vma. The problem appears to be that between > the page being isolated from the LRU and rcu_read_lock() being taken, the > mapcount of the page dropped to 0 and the anon_vma was freed. This patch > skips the migration of anon pages that are not mapped by anyone. > > Signed-off-by: Mel Gorman <mel(a)csn.ul.ie> > Acked-by: Rik van Riel <riel(a)redhat.com> > --- > mm/migrate.c | 10 ++++++++++ > 1 files changed, 10 insertions(+), 0 deletions(-) > > diff --git a/mm/migrate.c b/mm/migrate.c > index 98eaaf2..3c491e3 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -602,6 +602,16 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private, > * just care Anon page here. > */ > if (PageAnon(page)) { > + /* > + * If the page has no mappings any more, just bail. An > + * unmapped anon page is likely to be freed soon but worse, > + * it's possible its anon_vma disappeared between when > + * the page was isolated and when we reached here while > + * the RCU lock was not held > + */ > + if (!page_mapcount(page)) As looking code about mapcount of page, I got confused. I think mapcount of page is protected by pte lock. But I can't find pte lock in unmap_and_move. If I am right, what protects race between this condition check and rcu_read_lock? This patch makes race window very small but It can't remove race totally. I think I am missing something. Pz, point me out. :) > + goto uncharge; > + > rcu_read_lock(); > rcu_locked = 1; > anon_vma = page_anon_vma(page); > -- > 1.6.5 > -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on 15 Mar 2010 02:30 On Mon, Mar 15, 2010 at 2:34 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote: > On Mon, 15 Mar 2010 09:28:08 +0900 > Minchan Kim <minchan.kim(a)gmail.com> wrote: > >> Hi, Mel. >> On Sat, Mar 13, 2010 at 1:41 AM, Mel Gorman <mel(a)csn.ul.ie> wrote: >> > rmap_walk_anon() was triggering errors in memory compaction that looks like >> > use-after-free errors in anon_vma. The problem appears to be that between >> > the page being isolated from the LRU and rcu_read_lock() being taken, the >> > mapcount of the page dropped to 0 and the anon_vma was freed. This patch >> > skips the migration of anon pages that are not mapped by anyone. >> > >> > Signed-off-by: Mel Gorman <mel(a)csn.ul.ie> >> > Acked-by: Rik van Riel <riel(a)redhat.com> >> > --- >> > mm/migrate.c | 10 ++++++++++ >> > 1 files changed, 10 insertions(+), 0 deletions(-) >> > >> > diff --git a/mm/migrate.c b/mm/migrate.c >> > index 98eaaf2..3c491e3 100644 >> > --- a/mm/migrate.c >> > +++ b/mm/migrate.c >> > @@ -602,6 +602,16 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private, >> > * just care Anon page here. >> > */ >> > if (PageAnon(page)) { >> > + /* >> > + * If the page has no mappings any more, just bail. An >> > + * unmapped anon page is likely to be freed soon but worse, >> > + * it's possible its anon_vma disappeared between when >> > + * the page was isolated and when we reached here while >> > + * the RCU lock was not held >> > + */ >> > + if (!page_mapcount(page)) >> >> As looking code about mapcount of page, I got confused. >> I think mapcount of page is protected by pte lock. >> But I can't find pte lock in unmap_and_move. > There is no pte_lock. > >> If I am right, what protects race between this condition check and >> rcu_read_lock? >> This patch makes race window very small but It can't remove race totally. >> >> I think I am missing something. >> Pz, point me out. :) >> > > Hmm. This is my understanding of old story. > > At migration. > 1. we increase page_count(). > 2. isolate it from LRU. > 3. call try_to_unmap() under rcu_read_lock(). Then, > 4. replace pte with swp_entry_t made by PFN. under pte_lock. > 5. do migarate > 6. remap new pages. under pte_lock()> > 7. release rcu_read_lock(). > > Here, we don't care whether page->mapping holds valid anon_vma or not. > > Assume a racy threads which calls zap_pte_range() (or some other) > > a) When the thread finds valid pte under pte_lock and successfully call > page_remove_rmap(). > In this case, migration thread finds try_to_unmap doesn't unmap any pte. > Then, at 6, remap pte will not work. > b) When the thread finds migrateion PTE(as swap entry) in zap_page_range(). > In this case, migration doesn't find migrateion PTE and remap fails. > > Why rcu_read_lock() is necessary.. > - When page_mapcount() goes to 0, we shouldn't trust page->mapping is valid. > - Possible cases are > i) anon_vma (= page->mapping) is freed and used for other object. > ii) anon_vma (= page->mapping) is freed > iii) anon_vma (= page->mapping) is freed and used as anon_vma again. > > Here, anon_vma_cachep is created by SLAB_DESTROY_BY_RCU. Then, possible cases > are only ii) and iii). While anon_vma is anon_vma, try_to_unmap and remap_page > can work well because of the list of vmas and address check. IOW, remap routine > just do nothing if anon_vma is freed. > > I'm not sure by what logic "use-after-free anon_vma" is caught. But yes, > there will be case, "anon_vma is touched after freed.", I think. > > Thanks, > -Kame > Thanks for detail explanation, Kame. But it can't understand me enough, Sorry. Mel said he met "use-after-free errors in anon_vma". So added the check in unmap_and_move. if (PageAnon(page)) { .... if (!page_mapcount(page)) goto uncharge; rcu_read_lock(); My concern what protects racy mapcount of the page? For example, CPU A CPU B unmap_and_move page_mapcount check pass zap_pte_range <-- some stall --> pte_lock <-- some stall --> page_remove_rmap(map_count is zero!) <-- some stall --> pte_unlock <-- some stall --> anon_vma_unlink <-- some stall --> anon_vma free !!!! rcu_read_lock anon_vma has gone!! I think above scenario make error "use-after-free", again. What prevent above scenario? -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on 15 Mar 2010 03:20 On Mon, Mar 15, 2010 at 3:44 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> >> Thanks for detail explanation, Kame. >> But it can't understand me enough, Sorry. >> >> Mel said he met "use-after-free errors in anon_vma". >> So added the check in unmap_and_move. >> >> if (PageAnon(page)) { >> .... >> if (!page_mapcount(page)) >> goto uncharge; >> rcu_read_lock(); >> >> My concern what protects racy mapcount of the page? >> For example, >> >> CPU A CPU B >> unmap_and_move >> page_mapcount check pass zap_pte_range >> <-- some stall --> pte_lock >> <-- some stall --> page_remove_rmap(map_count is zero!) >> <-- some stall --> pte_unlock >> <-- some stall --> anon_vma_unlink >> <-- some stall --> anon_vma free !!!! >> rcu_read_lock >> anon_vma has gone!! >> >> I think above scenario make error "use-after-free", again. >> What prevent above scenario? >> > I think this patch is not complete. > I guess this patch in [1/11] is trigger for the race. > == > + > + /* Drop an anon_vma reference if we took one */ > + if (anon_vma && atomic_dec_and_lock(&anon_vma->migrate_refcount, &anon_vma->lock)) { > + int empty = list_empty(&anon_vma->head); > + spin_unlock(&anon_vma->lock); > + if (empty) > + anon_vma_free(anon_vma); > + } > == > If my understainding in above is correct, this "modify" freed anon_vma. > Then, use-after-free happens. (In old implementation, there are no refcnt, > so, there is no use-after-free ops.) > I agree. Let's wait Mel's response. > > So, what I can think of now is a patch like following is necessary. > > == > static inline struct anon_vma *anon_vma_alloc(void) > { > struct anon_vma *anon_vma; > anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL); > atomic_set(&anon_vma->refcnt, 1); > } > > void anon_vma_free(struct anon_vma *anon_vma) > { > /* > * This called when anon_vma is.. > * - anon_vma->vma_list becomes empty. > * - incremetned refcnt while migration, ksm etc.. is dropped. > * - allocated but unused. > */ > if (atomic_dec_and_test(&anon_vma->refcnt)) > kmem_cache_free(anon_vma_cachep, anon_vma); > } > == > Then all things will go simple. > Overhead is concern but list_empty() helps us much. When they made things complicated without atomic_op, there was reasonable reason, I think. :) My opinion depends on you and server guys(Hugh, Rik, Andrea Arcangeli and so on) > > Thanks, > -Kame > > > > > -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on 15 Mar 2010 09:50 On Mon, Mar 15, 2010 at 4:09 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote: > On Mon, 15 Mar 2010 15:44:59 +0900 > KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote: > >> On Mon, 15 Mar 2010 15:28:15 +0900 >> Minchan Kim <minchan.kim(a)gmail.com> wrote: >> >> > On Mon, Mar 15, 2010 at 2:34 PM, KAMEZAWA Hiroyuki >> > <kamezawa.hiroyu(a)jp.fujitsu.com> wrote: >> > > On Mon, 15 Mar 2010 09:28:08 +0900 >> > > Minchan Kim <minchan.kim(a)gmail.com> wrote: > >> > I think above scenario make error "use-after-free", again. >> > What prevent above scenario? >> > >> I think this patch is not complete. >> I guess this patch in [1/11] is trigger for the race. >> == >> + >> + /* Drop an anon_vma reference if we took one */ >> + if (anon_vma && atomic_dec_and_lock(&anon_vma->migrate_refcount, &anon_vma->lock)) { >> + int empty = list_empty(&anon_vma->head); >> + spin_unlock(&anon_vma->lock); >> + if (empty) >> + anon_vma_free(anon_vma); >> + } >> == >> If my understainding in above is correct, this "modify" freed anon_vma. >> Then, use-after-free happens. (In old implementation, there are no refcnt, >> so, there is no use-after-free ops.) >> > Sorry, about above, my understanding was wrong. anon_vma->lock is modifed even > in old code. Sorry for noise. Nope. Such your kindness always helps and cheer up others people. In addition, give others good time to consider seriously something. Thanks, Kame. -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on 15 Mar 2010 10:40
On Mon, Mar 15, 2010 at 11:21 PM, Mel Gorman <mel(a)csn.ul.ie> wrote: > On Mon, Mar 15, 2010 at 09:48:49PM +0900, Minchan Kim wrote: >> On Mon, 2010-03-15 at 11:28 +0000, Mel Gorman wrote: >> > The use after free looks like >> > >> > 1. page_mapcount(page) was zero so anon_vma was no longer reliable >> > 2. rcu lock taken but the anon_vma at this point can already be garbage because the >> > process exited >> > 3. call try_to_unmap, looks up tha anon_vma and locks it. This causes problems >> > >> > I thought the race would be closed but there is still a very tiny window there all >> > right. The following alternative should close it. What do you think? >> > >> > if (PageAnon(page)) { >> > rcu_read_lock(); >> > >> > /* >> > * If the page has no mappings any more, just bail. An >> > * unmapped anon page is likely to be freed soon but worse, >> > * it's possible its anon_vma disappeared between when >> > * the page was isolated and when we reached here while >> > * the RCU lock was not held >> > */ >> > if (!page_mapcount(page)) { >> > rcu_read_unlock(); >> > goto uncharge; >> > } >> > >> > rcu_locked = 1; >> > anon_vma = page_anon_vma(page); >> > atomic_inc(&anon_vma->external_refcount); >> > } >> > >> > The rcu_unlock label is not used here because the reference counts were not taken in >> > the case where page_mapcount == 0. >> > >> >> Please, repost above code with your use-after-free scenario comment. >> > > This will be the replacement patch so. > > ==== CUT HERE ==== > mm,migration: Do not try to migrate unmapped anonymous pages > > rmap_walk_anon() was triggering errors in memory compaction that look like > use-after-free errors. The problem is that between the page being isolated > from the LRU and rcu_read_lock() being taken, the mapcount of the page > dropped to 0 and the anon_vma gets freed. This can happen during memory > compaction if pages being migrated belong to a process that exits before > migration completes. Hence, the use-after-free race looks like > > 1. Page isolated for migration > 2. Process exits > 3. page_mapcount(page) drops to zero so anon_vma was no longer reliable > 4. unmap_and_move() takes the rcu_lock but the anon_vma is already garbage > 4. call try_to_unmap, looks up tha anon_vma and "locks" it but the lock > is garbage. > > This patch checks the mapcount after the rcu lock is taken. If the > mapcount is zero, the anon_vma is assumed to be freed and no further > action is taken. > > Signed-off-by: Mel Gorman <mel(a)csn.ul.ie> > Acked-by: Rik van Riel <riel(a)redhat.com> Reviewed-by: Minchan Kim <minchan.kim(a)gmail.com> -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |