From: Oleg Nesterov on 2 Jul 2010 12:20 In no way I can review this patch, but I am curious and have the questions. Also, I think it makes sense to cc the fs/ developers, I've added Al. On 07/02, Jiri Olsa wrote: > > there's a race among calling gettimeofday(2) and a file's time > updates. Following test program expose the race. > > run it in the while loop > while [ 1 ]; do ./test1 || break; done > > --- SNIP --- > #include <stdio.h> > #include <stdlib.h> > #include <fcntl.h> > > int main (void) > { > struct stat st; > struct timeval tv; > > unlink("./file"); > > gettimeofday(&tv, NULL); > > if (-1 == creat("./file", O_RDWR)) { > perror("creat"); > return -1; > } > > if (stat("./file", &st) != 0) { > perror("stat"); > return -1; > } > > printf("USER gtod: %ld\n", (long)tv.tv_sec); > printf("USER file: %ld.%09u\n", > (long) st.st_mtime, > (unsigned) st.st_mtim.tv_nsec); > > return tv.tv_sec <= st.st_mtime ? 0 : -1; > } Interesting. To the point, I actually compiled this code and yes, it triggers the problem on ext3 ;) > The following patch will prevent the race by adding the > CURRENT_TIME_SEC_REAL macro, which will return seconds from > the getnstimeofday call, ensuring it's computed on current tick. > It fixes the 'creat' case for ext4. What about other filesystems? Perhaps it makes sense to change CURRENT_TIME_SEC instead of adding CURRENT_TIME_SEC_REAL? Once again, I am asking. It is not that I suggest to change your patch. But there is something I can't understand. We have #define CURRENT_TIME_SEC ((struct timespec) { get_seconds(), 0 }) and your patch adds #define CURRENT_TIME_SEC_REAL ((struct timespec) { get_seconds_real(), 0 }) which fixes the problem for ext4. But, get_seconds() is also used by sys_time(), and we should have the same problem with another trivial test-case: #include <stdio.h> #include <sys/time.h> #include <time.h> int main(void) { struct timeval tv; int sec; do { gettimeofday(&tv, NULL); sec = time(NULL); } while (tv.tv_sec <= sec); printf("gtod: %ld.%06ld\n", tv.tv_sec, tv.tv_usec); printf("time: %d.000000\n", sec); return 0; } However, this test-case can't trigger the problem. Confused. Oleg. > > I'm not sure how much trouble is having this race unfixed compared > to the performance impact the fix might have. Maybe there're > better ways to fix this. > > > thanks for any ideas, > jirka > > > > Signed-off-by: Jiri Olsa <jolsa(a)redhat.com> > --- > diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c > index 938dbc7..7a0a2fc 100644 > --- a/fs/ext2/ialloc.c > +++ b/fs/ext2/ialloc.c > @@ -558,7 +558,7 @@ got: > > inode->i_ino = ino; > inode->i_blocks = 0; > - inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC; > + inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC_REAL; > memset(ei->i_data, 0, sizeof(ei->i_data)); > ei->i_flags = > ext2_mask_flags(mode, EXT2_I(dir)->i_flags & EXT2_FL_INHERITED); > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 19a4de5..2c2925c 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -1157,7 +1157,7 @@ static inline struct ext4_inode_info *EXT4_I(struct inode *inode) > static inline struct timespec ext4_current_time(struct inode *inode) > { > return (inode->i_sb->s_time_gran < NSEC_PER_SEC) ? > - current_fs_time(inode->i_sb) : CURRENT_TIME_SEC; > + current_fs_time(inode->i_sb) : CURRENT_TIME_SEC_REAL; > } > > static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino) > diff --git a/include/linux/time.h b/include/linux/time.h > index ea3559f..f390687 100644 > --- a/include/linux/time.h > +++ b/include/linux/time.h > @@ -109,12 +109,14 @@ void timekeeping_init(void); > extern int timekeeping_suspended; > > unsigned long get_seconds(void); > +unsigned long get_seconds_real(void); > struct timespec current_kernel_time(void); > struct timespec __current_kernel_time(void); /* does not hold xtime_lock */ > struct timespec get_monotonic_coarse(void); > > #define CURRENT_TIME (current_kernel_time()) > #define CURRENT_TIME_SEC ((struct timespec) { get_seconds(), 0 }) > +#define CURRENT_TIME_SEC_REAL ((struct timespec) { get_seconds_real(), 0 }) > > /* Some architectures do not supply their own clocksource. > * This is mainly the case in architectures that get their > diff --git a/kernel/time.c b/kernel/time.c > index 848b1c2..ce10dae 100644 > --- a/kernel/time.c > +++ b/kernel/time.c > @@ -227,7 +227,8 @@ SYSCALL_DEFINE1(adjtimex, struct timex __user *, txc_p) > */ > struct timespec current_fs_time(struct super_block *sb) > { > - struct timespec now = current_kernel_time(); > + struct timespec now; > + getnstimeofday(&now); > return timespec_trunc(now, sb->s_time_gran); > } > EXPORT_SYMBOL(current_fs_time); > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index caf8d4d..7ebfe23 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -897,6 +897,14 @@ unsigned long get_seconds(void) > } > EXPORT_SYMBOL(get_seconds); > > +unsigned long get_seconds_real(void) > +{ > + struct timespec ts; > + getnstimeofday(&ts); > + return ts.tv_sec; > +} > +EXPORT_SYMBOL(get_seconds_real); > + > struct timespec __current_kernel_time(void) > { > return xtime; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on 2 Jul 2010 12:30 On 07/02, Oleg Nesterov wrote: > > But, get_seconds() is also used by sys_time(), and we should have the > same problem with another trivial test-case: > > #include <stdio.h> > #include <sys/time.h> > #include <time.h> > > int main(void) > { > struct timeval tv; > int sec; > > do { > gettimeofday(&tv, NULL); > sec = time(NULL); > } while (tv.tv_sec <= sec); > > printf("gtod: %ld.%06ld\n", tv.tv_sec, tv.tv_usec); > printf("time: %d.000000\n", sec); > return 0; > } > > However, this test-case can't trigger the problem. Confused. Aha. libc's time() doesn't use sys_time(), it uses __vsyscall(1) vtime()->do_vgettimeofday(). This one quickly triggers the "time goes backward" case. #include <stdio.h> #include <sys/time.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> int main(void) { struct timeval tv; int sec; do { gettimeofday(&tv, NULL); sec = syscall(__NR_time, NULL); } while (tv.tv_sec <= sec); printf("gtod: %ld.%06ld\n", tv.tv_sec, tv.tv_usec); printf("time: %d.000000\n", sec); return 0; } Not that I think this "problem" should be fixed, just curious. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Jiri Olsa on 2 Jul 2010 20:00 On Fri, Jul 02, 2010 at 06:14:22PM +0200, Oleg Nesterov wrote: > In no way I can review this patch, but I am curious and have the questions. > Also, I think it makes sense to cc the fs/ developers, I've added Al. thanks > > On 07/02, Jiri Olsa wrote: > > > > there's a race among calling gettimeofday(2) and a file's time > > updates. Following test program expose the race. > > > > run it in the while loop > > while [ 1 ]; do ./test1 || break; done > > > > --- SNIP --- > > #include <stdio.h> > > #include <stdlib.h> > > #include <fcntl.h> > > > > int main (void) > > { > > struct stat st; > > struct timeval tv; > > > > unlink("./file"); > > > > gettimeofday(&tv, NULL); > > > > if (-1 == creat("./file", O_RDWR)) { > > perror("creat"); > > return -1; > > } > > > > if (stat("./file", &st) != 0) { > > perror("stat"); > > return -1; > > } > > > > printf("USER gtod: %ld\n", (long)tv.tv_sec); > > printf("USER file: %ld.%09u\n", > > (long) st.st_mtime, > > (unsigned) st.st_mtim.tv_nsec); > > > > return tv.tv_sec <= st.st_mtime ? 0 : -1; > > } > > Interesting. To the point, I actually compiled this code and yes, > it triggers the problem on ext3 ;) > > > The following patch will prevent the race by adding the > > CURRENT_TIME_SEC_REAL macro, which will return seconds from > > the getnstimeofday call, ensuring it's computed on current tick. > > It fixes the 'creat' case for ext4. > > What about other filesystems? Perhaps it makes sense to change > CURRENT_TIME_SEC instead of adding CURRENT_TIME_SEC_REAL? > > Once again, I am asking. It is not that I suggest to change your patch. well, the patch is more or less to prove the problem exists and it could be fixed :) also I'm not sure that all the places using CURRENT_TIME_SEC suffer the same issue.. I'm currenty looking to the code and trying to come up with better solution.. any ideas are welcome :) jirka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: john stultz on 6 Jul 2010 19:10 On Fri, Jul 2, 2010 at 12:41 AM, Jiri Olsa <jolsa(a)redhat.com> wrote: > hi, > > there's a race among calling gettimeofday(2) and a file's time > updates. �Following test program expose the race. > > run it in the while loop > � � � �while [ 1 ]; do ./test1 || break; done > > --- SNIP --- > #include <stdio.h> > #include <stdlib.h> > #include <fcntl.h> > > int main (void) > { > � � � �struct stat st; > � � � �struct timeval tv; > > � � � �unlink("./file"); > > � � � �gettimeofday(&tv, NULL); > > � � � �if (-1 == creat("./file", O_RDWR)) { > � � � � � � � �perror("creat"); > � � � � � � � �return -1; > � � � �} > > � � � �if (stat("./file", &st) != 0) { > � � � � � � � �perror("stat"); > � � � � � � � �return -1; > � � � �} > > � � � �printf("USER gtod: %ld\n", (long)tv.tv_sec); > � � � �printf("USER file: %ld.%09u\n", > � � � � � � � � � � � �(long) st.st_mtime, > � � � � � � � � � � � �(unsigned) st.st_mtim.tv_nsec); > > � � � �return tv.tv_sec <= st.st_mtime ? 0 : -1; > } > --- SNIP --- > > > The point is that the stat call returns time that > sometime precedes time from gettimeofday. > > The reason follows. > > � � � �- inode's time is initialized by CURRENT_TIME_SEC macro, > � � � � �which returns tv_sec portion of xtime variable > � � � �- the xtime is updated by update_wall_time being called > � � � � �each tick (not that often for NO_HZ) > � � � �- vgettimeofday reads the actuall clocksource tick > � � � � �and computes the time > > Thus while the inode's time is based on the xtime update > by the update_wall_time function, the vgettimeofday computes > the time correctly each time it's called. > > Thus the race is triggered within 2 update_wall_time updates, > when in between the gettimeofday and creat calls happened. > > > > ticks � � � � � � � � � CPU � � � � � � � � � update_wall_time executed > ------------------------------------------------------------------------------- > �t1 > � � � � � � � � � � � � � � � � � � � � � � � � � � � �update 1 > � � � � � � � � � � � � � � � � � � � � � (xtime is computed based on tick t1) > > > �t2 > > � � � | � � � � gettimeofday � � � � � | > � � � | (returns time based on tick 2) | > � � � | � � � � � � � � � � � � � � � �| > � � � | � � � � � �creat � � � � � � � | > � � � | � (set time based on tick 1) � | > > > � � � � � � � � � � � � � � � � � � � � � � � � � � � �update 2 > � � � � � � � � � � � � � � � � � � � � � (xtime is computed based on tick t2) > �t3 > ------------------------------------------------------------------------------- > > > > This issue was described in the BZ 244697 > > � � � �Time goes backward - gettimeofday() vs. rename() > � � � �https://bugzilla.redhat.com/show_bug.cgi?id=244697 > > > It's not just issue of the creat but few others like rename. > > > The following patch will prevent the race by adding the > CURRENT_TIME_SEC_REAL macro, which will return seconds from > the getnstimeofday call, ensuring it's computed on current tick. > It fixes the 'creat' case for ext4. > > > I'm not sure how much trouble is having this race unfixed compared > to the performance impact the fix might have. Maybe there're > better ways to fix this. I do worry that your patch will have too much of a performance hit. I think the right fix would be in vtime(). Test patch to follow shortly. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: john stultz on 6 Jul 2010 19:20 On Tue, 2010-07-06 at 16:03 -0700, john stultz wrote: > On Fri, Jul 2, 2010 at 12:41 AM, Jiri Olsa <jolsa(a)redhat.com> wrote: > > This issue was described in the BZ 244697 > > > > Time goes backward - gettimeofday() vs. rename() > > https://bugzilla.redhat.com/show_bug.cgi?id=244697 > > > > > > It's not just issue of the creat but few others like rename. > > > > > > The following patch will prevent the race by adding the > > CURRENT_TIME_SEC_REAL macro, which will return seconds from > > the getnstimeofday call, ensuring it's computed on current tick. > > It fixes the 'creat' case for ext4. > > > > > > I'm not sure how much trouble is having this race unfixed compared > > to the performance impact the fix might have. Maybe there're > > better ways to fix this. > > I do worry that your patch will have too much of a performance hit. I > think the right fix would be in vtime(). > > Test patch to follow shortly. So the following (untested) patch _should_ resolve this in mainline on x86. Please let me know if it works for you. However, for older kernels, this patch won't be sufficient, as it depends on update_vsyscall getting the time at the last tick in its wall_time, and kernels that don't have the logarithmic accumulation code & remove xtime_cache patches can't be fixed so easily. Once we get this upstream, let me know if you have any questions about how to backport this to older kernels. thanks -john [PATCH] x86: Fix vtime/file timestamp inconsistencies Due to vtime calling vgettimeofday(), its possible that an application could call time();create("stuff",O_RDRW); only to see the file's creation timestamp to be before the value returned by time. The proper fix is to make vtime use the same time value as current_kernel_time() (which is exported via update_vsyscall) instead of vgettime(). Signed-off-by: John Stultz <johnstul(a)us.ibm.com> diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c index 1c0c6ab..ce9a4fa 100644 --- a/arch/x86/kernel/vsyscall_64.c +++ b/arch/x86/kernel/vsyscall_64.c @@ -169,13 +169,18 @@ int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz) * unlikely */ time_t __vsyscall(1) vtime(time_t *t) { - struct timeval tv; + unsigned seq; time_t result; if (unlikely(!__vsyscall_gtod_data.sysctl_enabled)) return time_syscall(t); - vgettimeofday(&tv, NULL); - result = tv.tv_sec; + do { + seq = read_seqbegin(&__vsyscall_gtod_data.lock); + + result = vsyscall_gtod_data.wall_time_sec; + + } while (read_seqretry(&__vsyscall_gtod_data.lock, seq)); + if (t) *t = result; return result; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
|
Next
|
Last
Pages: 1 2 Prev: iwl3945: HARDWARE GONE?? Next: [PATCH UPDATED 4/4] async: use workqueue for worker pool |