From: Mark Seaborn on 23 Jan 2010 19:20

I was experimenting with futexes and was a little surprised to
discover that futex() works on read-only pages. This creates quite a
high bandwidth side channel that allows two processes to communicate
if, for example, they share a library. (Mind you, this is not much
different from file locks, which also work on read-only file
descriptors.)

I also found a couple of differences between 2.6.24 (from Ubuntu
hardy) and 2.6.31 (from Ubuntu karmic). The first is a definite bug
in 2.6.31:


1) On 2.6.31 i686, using futex() on the vdso causes the process to get
stuck, consuming CPU in an unkillable state. Both FUTEX_WAIT and
FUTEX_WAKE cause the problem. The problem doesn't occur on 2.6.24.
(BTW, I was testing to see whether futex() on the vdso allows any two
processes to communicate. This appears not to be the case on 2.6.24.)

A test program is below.


2) Suppose a file is mapped into two processes with MAP_PRIVATE. Can
the resulting mappings be used to communicate via futex()? i.e. Does
futex() consider the mappings to be the same?

On 2.6.24, the futex wakeup is not transferred; pages must be mapped
with MAP_SHARED for futex to work. On 2.6.31, the futex wakeup *is*
transferred; futex works with either MAP_SHARED or MAP_PRIVATE.

2.6.24's behaviour seems more correct, because the mappings are
logically different, even if the underlying memory pages are the same
before copy-on-write is triggered. Is 2.6.31's behaviour a
regression, or is the kernel's behaviour here supposed to be
undefined?

Cheers,
Mark


/* Test futex() on the vdso, which the kernel maps on process startup. */

#include <stdio.h>
#include <stdlib.h>

#include <elf.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>


#if __WORDSIZE == 32
# define Elf(name) Elf32_##name
#elif __WORDSIZE == 64
# define Elf(name) Elf64_##name
#endif

void *find_vdso(char **argv)
{
  /* Find auxv. */
  char **p = argv;
  /* Skip past argv. */
  while(*p)
    p++;
  p++;
  /* Skip past env. */
  while(*p)
    p++;
  p++;
  Elf(auxv_t) *auxv = (void *) p;
  for(; auxv->a_type; auxv++)
    if(auxv->a_type == AT_SYSINFO_EHDR)
      return (void *) auxv->a_un.a_val;
  fprintf(stderr, "vdso not found\n");
  exit(1);
}

int main(int argc, char **argv)
{
  int *vdso = find_vdso(argv);
  fprintf(stderr, "vdso found at %p\n", vdso);
  if(syscall(__NR_futex, vdso, FUTEX_WAKE, 1) < 0)
    perror("futex/WAKE");
  if(syscall(__NR_futex, vdso, FUTEX_WAIT, *vdso, NULL) < 0)
    perror("futex/WAIT");
  return 0;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

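The test program above reaches the auxiliary vector by stepping past argv and the environment by hand. As a minimal sketch, assuming a glibc recent enough to provide getauxval() in <sys/auxv.h>, the same AT_SYSINFO_EHDR lookup can be written more directly. The helper names vdso_base() and looks_like_elf() are made up for illustration and are not part of the original program.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: return the vdso base address, or NULL if the
 * auxiliary vector has no AT_SYSINFO_EHDR entry (getauxval() returns 0
 * for a missing entry). */
const void *vdso_base(void)
{
    return (const void *) getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: check that the mapping starts with the ELF magic
 * bytes, which is what one would expect of a DSO image. */
int looks_like_elf(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

A caller would pass the result of vdso_base() to futex() exactly as main() does with find_vdso(argv); the only change is how the address is obtained.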
From: KOSAKI Motohiro on 24 Jan 2010 22:40

CC to futex folks.

<snip>
> 1) On 2.6.31 i686, using futex() on the vdso causes the process to get
> stuck, consuming CPU in an unkillable state. Both FUTEX_WAIT and
> FUTEX_WAKE cause the problem. The problem doesn't occur on 2.6.24.
> (BTW, I was testing to see whether futex() on the vdso allows any two
> processes to communicate. This appears not to be the case on 2.6.24.)
>
> A test program is below.
<snip>

Running this test with the function tracer gives the following output:

  a.out-11459 [000] 242281.165505: get_user_pages_fast <-get_futex_key
  a.out-11459 [000] 242281.165505: gup_pud_range <-get_user_pages_fast
  a.out-11459 [000] 242281.165506: gup_pte_range <-gup_pud_range
  a.out-11459 [000] 242281.165506: __might_sleep <-get_futex_key
  a.out-11459 [000] 242281.165507: unlock_page <-get_futex_key
  a.out-11459 [000] 242281.165507: page_waitqueue <-unlock_page
  a.out-11459 [000] 242281.165508: __wake_up_bit <-unlock_page
  a.out-11459 [000] 242281.165508: put_page <-get_futex_key
  a.out-11459 [000] 242281.165508: get_user_pages_fast <-get_futex_key
  a.out-11459 [000] 242281.165509: gup_pud_range <-get_user_pages_fast
  a.out-11459 [000] 242281.165509: gup_pte_range <-gup_pud_range
  a.out-11459 [000] 242281.165510: __might_sleep <-get_futex_key
  a.out-11459 [000] 242281.165511: unlock_page <-get_futex_key
  a.out-11459 [000] 242281.165511: page_waitqueue <-unlock_page
  a.out-11459 [000] 242281.165512: __wake_up_bit <-unlock_page
  a.out-11459 [000] 242281.165512: put_page <-get_futex_key
  a.out-11459 [000] 242281.165513: get_user_pages_fast <-get_futex_key
  a.out-11459 [000] 242281.165513: gup_pud_range <-get_user_pages_fast
  a.out-11459 [000] 242281.165514: gup_pte_range <-gup_pud_range
  a.out-11459 [000] 242281.165515: __might_sleep <-get_futex_key
  a.out-11459 [000] 242281.165515: unlock_page <-get_futex_key
  a.out-11459 [000] 242281.165516: page_waitqueue <-unlock_page
  a.out-11459 [000] 242281.165516: __wake_up_bit <-unlock_page
  a.out-11459 [000] 242281.165517: put_page <-get_futex_key

This means that the following code in get_futex_key() loops forever:

again:
	err = get_user_pages_fast(address, 1, 1, &page);
	if (err < 0)
		return err;

	page = compound_head(page);
	lock_page(page);
	if (!page->mapping) {
		unlock_page(page);
		put_page(page);
		goto again;
	}

From: KOSAKI Motohiro on 25 Jan 2010 02:30

Hi

<snip>
> > 2) Suppose a file is mapped into two processes with MAP_PRIVATE. Can
> > the resulting mappings be used to communicate via futex()? i.e. Does
> > futex() consider the mappings to be the same?
> >
> > On 2.6.24, the futex wakeup is not transferred; pages must be mapped
> > with MAP_SHARED for futex to work. On 2.6.31, the futex wakeup *is*
> > transferred; futex works with either MAP_SHARED or MAP_PRIVATE.
> >
> > 2.6.24's behaviour seems more correct, because the mappings are
> > logically different, even if the underlying memory pages are the same
> > before copy-on-write is triggered. Is 2.6.31's behaviour a
> > regression, or is the kernel's behaviour here supposed to be
> > undefined?

Futex should work on both file and anon mappings. However, I personally
think the vdso is neither file nor anon; it is a special mapping. Nobody
has defined the futex spec for special mappings (so yes, undefined).

Personally, I think EINVAL or EFAULT is the best result for futexing on
the vdso, the same as futexing against a kernel address. But I guess
other people may think differently.

I'd like to hear the futex folks' opinion.

Thanks.

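The suggestion above is that futex() on the vdso should fail the way it already does for other unusable addresses. As a rough user-space sketch of those existing failures: futex(2) documents EINVAL for a uaddr that is not four-byte aligned and EFAULT for a pointer that is not a valid user-space address. The helper names here are hypothetical, and using the top page of the address space as a stand-in for "a kernel address" is an assumption of this sketch, not something stated in the thread.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue FUTEX_WAKE on uaddr and return 0 on success
 * or the errno value on failure. */
int futex_wake_errno(const void *uaddr)
{
    if (syscall(SYS_futex, uaddr, FUTEX_WAKE, 1, NULL, NULL, 0) < 0)
        return errno;
    return 0;
}

/* Hypothetical helper: a four-byte-aligned address in the last page of the
 * address space, assumed here to be unusable from user space. */
const void *unmappable_address(void)
{
    return (const void *) (UINTPTR_MAX & ~(uintptr_t) 0xfff);
}
```

Calling futex_wake_errno() on an ordinary aligned word with no waiters should report success, while a misaligned pointer or unmappable_address() should come back with the documented error codes. That is the behaviour the vdso case would be brought in line with.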
From: Peter Zijlstra on 25 Jan 2010 04:30

On Mon, 2010-01-25 at 16:27 +0900, KOSAKI Motohiro wrote:
<snip>
> Futex should work on both file and anon mappings. However, I personally
> think the vdso is neither file nor anon; it is a special mapping. Nobody
> has defined the futex spec for special mappings (so yes, undefined).
>
> Personally, I think EINVAL or EFAULT is the best result for futexing on
> the vdso, the same as futexing against a kernel address. But I guess
> other people may think differently.
>
> I'd like to hear the futex folks' opinion.

Well, my opinion is we should remove the vdso, it's ugly as hell :-)

But I think it would make most sense to extend its definition in the
direction of it being a file (for all intents and purposes it's a special
DSO -- which unfortunately isn't present in any filesystem).

[ For all intents and purposes processes can already communicate through
futexes on the libc space, so being able to do so through the vdso
really doesn't add anything ]

So the problem is that the VDSO pages do not have a page->mapping
because they lack the actual filesystem part of files, so even if (with
the recent zero-page patch from Kosaki-san) you make private COWs of the
VDSO, you'll get stuck in that loop.

So the prettiest solution is to simply place the vdso in an actual
filesystem and slowly migrate towards letting userspace map it as a
regular DSO -- /sys/lib{32,64}/libkernel.so like.

[ that has the bonus of getting rid of install_special_mapping() ]

The ugly solution is special casing the vdso in get_futex_key().

From: Darren Hart on 25 Jan 2010 12:40

Peter Zijlstra wrote:
> On Mon, 2010-01-25 at 16:27 +0900, KOSAKI Motohiro wrote:

<snip>

>> Futex should work on both file and anon mappings. However, I personally
>> think the vdso is neither file nor anon; it is a special mapping. Nobody
>> has defined the futex spec for special mappings (so yes, undefined).
>>
>> Personally, I think EINVAL or EFAULT is the best result for futexing on
>> the vdso, the same as futexing against a kernel address. But I guess
>> other people may think differently.
>>
>> I'd like to hear the futex folks' opinion.
>
> Well, my opinion is we should remove the vdso, it's ugly as hell :-)
>
> But I think it would make most sense to extend its definition in the
> direction of it being a file (for all intents and purposes it's a special
> DSO -- which unfortunately isn't present in any filesystem).
>
> [ For all intents and purposes processes can already communicate through
> futexes on the libc space, so being able to do so through the vdso
> really doesn't add anything ]
>
> So the problem is that the VDSO pages do not have a page->mapping
> because they lack the actual filesystem part of files, so even if (with
> the recent zero-page patch from Kosaki-san) you make private COWs of the
> VDSO, you'll get stuck in that loop.
>
> So the prettiest solution is to simply place the vdso in an actual
> filesystem and slowly migrate towards letting userspace map it as a
> regular DSO -- /sys/lib{32,64}/libkernel.so like.
>
> [ that has the bonus of getting rid of install_special_mapping() ]
>
> The ugly solution is special casing the vdso in get_futex_key().

I like the creating-a-real-file solution. However, for now (and for
stable), I think Kosaki's suggestion of EINVAL or EFAULT is a good
stop-gap. EINVAL might play the best with existing glibc implementations.

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

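To make the difference between the reported behaviour and the proposed stop-gap concrete, here is a toy user-space model of the retry loop quoted earlier in the thread. It is not kernel code: struct toy_page, toy_get_futex_key(), the bail_out flag and the retry cap are all invented for this sketch, and the cap exists only so the model terminates where the real loop would keep spinning.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct page: only the field the loop tests.
 * A NULL mapping models a vdso page, which has no backing file mapping. */
struct toy_page {
    const void *mapping;
};

/*
 * Toy model of the key lookup.
 *   bail_out == 0: retry while mapping is NULL, as the quoted loop does.
 *                  The model gives up with -EAGAIN after max_retries passes.
 *   bail_out != 0: return -EINVAL on the first pass instead of retrying,
 *                  which models the stop-gap discussed above.
 * *retries receives the number of retries taken.
 */
int toy_get_futex_key(const struct toy_page *page, int bail_out,
                      int max_retries, int *retries)
{
    *retries = 0;
    for (;;) {
        if (page->mapping != NULL)
            return 0;               /* normal file or anon page */
        if (bail_out)
            return -EINVAL;         /* stop-gap: report an error */
        if (*retries >= max_retries)
            return -EAGAIN;         /* model only: real loop never exits */
        (*retries)++;               /* nothing changes, so retrying is futile */
    }
}
```

Because nothing inside the loop can give the page a mapping, every retry sees the same state. The bail_out path simply turns that dead end into an error the caller can see.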