From: Peter Zijlstra on 16 Apr 2010 16:30 On Fri, 2010-04-16 at 23:45 +0800, r6144 wrote: > Hello all, > > I'm having an annoying kernel bug regarding huge pages in Fedora 12: > > https://bugzilla.redhat.com/show_bug.cgi?id=552257 > > Basically I want to use huge pages in a multithreaded number crunching > program, which happens to use process-shared semaphores (because fftw > does it). The futex for the semaphore ends up lying on a huge page, and > I then get an endless loop in get_futex_key(), apparently because the > anonymous huge page containing the futex does not have a page->mapping. > A test case is provided in the above link. > > I reported the bug to Fedora bugzilla months ago, but haven't received > any feedback yet. No, it works much better if you simply mail LKML and CC people who work on the code in question ;-) > The Fedora kernel is based on 2.6.32.11, and a > cursory glance at the 2.6.34-rc3 source does not yield any relevant > change. > > So, could anyone tell me if the current mainline kernel might act better > in this respect, before I get around to compiling it? Right, so I had a quick chat with Mel, and it appears MAP_PRIVATE hugetlb pages don't have their page->mapping set. I guess something like the below might work, but I'd really rather not add hugetlb knowledge to futex.c. Does anybody else have a better idea? Maybe create something similar to an anon_vma for hugetlb pages? --- kernel/futex.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index e7a35f1..b0f1b2d 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -252,7 +252,7 @@ again: page = compound_head(page); lock_page(page); - if (!page->mapping) { + if (!page->mapping && !PageHuge(page)) { unlock_page(page); put_page(page); goto again; @@ -265,7 +265,7 @@ again: * it's a read-only handle, it's expected that futexes attach to * the object not the particular process. */ - if (PageAnon(page)) { + if (PageAnon(page) || (PageHuge(page) && !page->mapping)) { key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */ key->private.mm = mm; key->private.address = address; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 19 Apr 2010 08:00 On Mon, 2010-04-19 at 12:43 +0100, Mel Gorman wrote: > > Right, so I had a quick chat with Mel, and it appears MAP_PRIVATE > > hugetlb pages don't have their page->mapping set. > > > > I guess something like the below might work, but I'd really rather not > > add hugetlb knowledge to futex.c. Does anybody else have a better idea? > > Maybe create something similar to an anon_vma for hugetlb pages? > > > > anon_vma for hugetlb pages sounds overkill, what would it gain? In this > context, futex only appears to distinguish between whether the > references are private or shared. > > Looking at the hugetlbfs code, I can't see a place where it actually cares > about the mapping as such. It's used to find shared pages in the page cache > (but not in the LRU) that are backed by the hugetlbfs file. For hugetlbfs > though, the mapping is mostly kept in page->private for reservation accounting > purposes. > > I can't think of other parts of the VM that touch the mapping if the > page is managed by hugetlbfs so the following patch should also work but > without futex having hugetlbfs-awareness. What do you think? Maybe for > safety, it would be better to make the mapping some obvious poison bytes > or'd with PAGE_MAPPING_ANON so an oops will be more obvious? Yes, this seems perfectly adequate to me, that poison idea might be worthwhile too :-) Acked-by: Peter Zijlstra <a.p.zijlstra(a)chello.nl> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 6034dc9..57a5faa 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -546,6 +546,7 @@ static void free_huge_page(struct page *page) > > mapping = (struct address_space *) page_private(page); > set_page_private(page, 0); > + page->mapping = NULL; > BUG_ON(page_count(page)); > INIT_LIST_HEAD(&page->lru); > > @@ -2447,8 +2448,10 @@ retry: > spin_lock(&inode->i_lock); > inode->i_blocks += blocks_per_huge_page(h); > spin_unlock(&inode->i_lock); > - } else > + } else { > lock_page(page); > + page->mapping = (struct address_space *)PAGE_MAPPING_ANON; > + } > } > > /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on 19 Apr 2010 12:00 On Mon, Apr 19, 2010 at 04:32:45PM +0100, Mel Gorman wrote: > +#define HUGETLB_PRIVATE_MAPPING (0x2e4 | PAGE_MAPPING_ANON) Cute indeed. BTW, just in case I tested this on transparent hugepage and it works fine (it uses no cpu and can be killed with C^c). I had to hack it like below to allocate the semaphore on hugepages without khugepaged. I verified 1 hugepage is allocated (thanks to memory compaction there's an huge excess of hugepages compared to what my regular apps can eat ;). Furthermore I found gcc bypasses malloc so a small patch to gcc should move it all into hugepages. Then maybe we can build the kernel faster and definitely translate.o will build 8% faster with the default khugepaged scan_sleep_millisecs settings (waiting to be confirmed but I exclude bad surprises, whatever runs fast with khugepaged has will run even faster without it if something, or equal in the worst case). Thanks, Andrea --- process-shared-sem-hugepage.c.orig 2010-04-19 17:43:47.278964888 +0200 +++ process-shared-sem-hugepage.c 2010-04-19 17:44:01.100032774 +0200 @@ -30,6 +30,7 @@ int main(void) g_thread_init(NULL); workers = g_new(GThread *, NWORKER); work_sem = g_new(sem_t, 1); + posix_memalign(&work_sem, 2*1024*1024, 2*1024*1024); result = sem_init(work_sem, TRUE, 0); g_assert(result == 0); for (i = 0; i < NWORKER; i++) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 19 Apr 2010 12:10 On Mon, 2010-04-19 at 16:32 +0100, Mel Gorman wrote: > Fix infinite loop in get_futex_key when backed by huge pages > > If a futex key happens to be located within a huge page mapped MAP_PRIVATE, > get_futex_key() can go into an infinite loop waiting for a page->mapping > that will never exist. This was reported and documented in an external > bugzilla at > > https://bugzilla.redhat.com/show_bug.cgi?id=552257 > > This patch makes page->mapping a poisoned value that includes PAGE_MAPPING_ANON > mapped MAP_PRIVATE. This is enough for futex to continue but because > of PAGE_MAPPING_ANON, the poisoned value is not dereferenced or used by > futex. No other part of the VM should be dereferencing the page->mapping of > a hugetlbfs page as its page cache is not on the LRU. > > This patch fixes the problem with the test case described in the bugzilla. > > Signed-off-by: Mel Gorman <mel(a)csn.ul.ie> > --- > include/linux/poison.h | 10 ++++++++++ > mm/hugetlb.c | 6 +++++- > 2 files changed, 15 insertions(+), 1 deletions(-) > > diff --git a/include/linux/poison.h b/include/linux/poison.h > index 2110a81..0f7b5ac 100644 > --- a/include/linux/poison.h > +++ b/include/linux/poison.h > @@ -48,6 +48,16 @@ > #define POISON_FREE 0x6b /* for use-after-free poisoning */ > #define POISON_END 0xa5 /* end-byte of poisoning */ > > +/********** mm/hugetlb.c **********/ > +/* > + * Private mappings of hugetlb pages use this poisoned value for > + * page->mapping. The core VM should not be doing anything with this mapping > + * but futex requires the existance of some page->mapping value even if it > + * is unused. If the core VM does deference the mapping, it'll look like a > + * suspiciously high null-pointer offset starting from 0x2e5 > + */ > +#define HUGETLB_PRIVATE_MAPPING (0x2e4 | PAGE_MAPPING_ANON) Wouldn't a longer poison be more recognisable? Also, shouldn't this use POISON_POINTER_DELTA? Something like: #define HUGETBL_POISON ((void *) 0x00300300 + POISON_POINTER_DELTA) 0x2e5 isn't that high, I've had actual derefs in that range. > + > /********** arch/$ARCH/mm/init.c **********/ > #define POISON_FREE_INITMEM 0xcc > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 6034dc9..487e3c2 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -546,6 +546,7 @@ static void free_huge_page(struct page *page) > > mapping = (struct address_space *) page_private(page); > set_page_private(page, 0); > + page->mapping = NULL; > BUG_ON(page_count(page)); > INIT_LIST_HEAD(&page->lru); > > @@ -2447,8 +2448,11 @@ retry: > spin_lock(&inode->i_lock); > inode->i_blocks += blocks_per_huge_page(h); > spin_unlock(&inode->i_lock); > - } else > + } else { > lock_page(page); > + page->mapping = (struct address_space *) > + HUGETLB_PRIVATE_MAPPING; > + } > } > > /* > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 19 Apr 2010 12:20 On Mon, 2010-04-19 at 18:11 +0200, Andrea Arcangeli wrote: > On Mon, Apr 19, 2010 at 05:45:05PM +0200, Peter Zijlstra wrote: > > Wouldn't a longer poison be more recognisable? Also, shouldn't this use > > POISON_POINTER_DELTA? > > > > Something like: > > > > #define HUGETBL_POISON ((void *) 0x00300300 + POISON_POINTER_DELTA) > > > > 0x2e5 isn't that high, I've had actual derefs in that range. > > The default at kernel config time sets only 4k as unmapped (I think > it's a very bad default for 64bit archs), so above 4k userland can map > it and you can have actual derefs with 0x00300300 but not with Mel's > preferred <0x1000 address. So the address must be <0x1000. Well, most poison values have that problem and still we have them. Also on 64bit machines you can use POISON_POINTER_DELTA to map it outside the virtual address range. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
|
Next
|
Last
Pages: 1 2 Prev: Add a global synchronization point for pvclock Next: Athlon L110 CPU and powernow-k8 |