From: Dan Magenheimer on 24 May 2010 20:10 > Sorry, this is sort of mixing points > Again, this is mixing the discussion. Oops, sorry, missed that. :-} > Maybe something like a tsc_long_calibration=1 option would allow for > this? Sounds good to me. If it's non-obvious what value to choose for the new calibration, maybe specifying it in MS (per MAX_QUICK_PIT_MS in arch/x86/kernel/tsc.c) would be nice. > However, I really do like the idea of pulling the stamped value from > the MSR and if its close to what we quickly calibrated, use that. On a quick sample of two machines looking at the TSC calibration done by Xen (which exposes the equivalent of tsc_khz), it appears that the stamped value is different from the calibration by about 1000ppm. YMMV. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin on 24 May 2010 20:20 On 05/24/2010 05:01 PM, Dan Magenheimer wrote: > On a quick sample of two machines looking at the TSC calibration > done by Xen (which exposes the equivalent of tsc_khz), it appears > that the stamped value is different from the calibration by about > 1000ppm. YMMV. So pretty seriously different. Less than I would have expected, but not out of the ballpark. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Brian Bloniarz on 24 May 2010 21:40 I still sorta think the clearest thing is to keep the kernel's quick, dynamic calibration at boot, and expose the tsc_khz it decides on. People who don't care about clocksync get their quick boot, NTP knows enough to correct its drift estimate when it starts, and users don't need to learn all these details to get stable clocksync (*). If we're being strict, something like NTP needs to know exactly what's driving gettimeofday(). If the clocksource changes, it needs to know that so it could correct or trash its drift estimate. If there's one-time calibration, it needs to know what the result was. If there's continuous calibration, it either needs to be notified, or have the ability to disable it. Right? So I think exporting tsc_khz in some form is a step in the right direction. So what's wrong with just adding a /sys/devices/system/clocksource/clocksource0/tsc_khz? Maybe Thomas Gleixner's suggestion of a vget_tsc_raw() would also suffice, I'm not sure I understand the details enough. Any of the other fixes people have discussed (tsc_khz= bootopt, tsc_long_calibration=1) would be enough to make me happy though :) (*) Though they still need to learn enough to coax the kernel into giving them a fast gettimeofday(). That's a price you gotta pay if you care enough :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Brian Bloniarz on 25 May 2010 20:20 On 05/24/2010 09:33 PM, Brian Bloniarz wrote: > So what's wrong with just adding a > /sys/devices/system/clocksource/clocksource0/tsc_khz? As an RFC: Add clocksource.sys_register & sys_unregister so the current clocksource can add supplemental information to /sys/devices/system/clocksource/clocksource0/ Export tsc_khz when current_clocksource==tsc so that daemons like NTP can account for the variability of calibration results. diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 9faf91a..9c99965 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -76,6 +76,11 @@ unsigned long long sched_clock(void) __attribute__((alias("native_sched_clock"))); #endif +int sysfs_tsc_register(struct sys_device *clocksource_dev, + struct clocksource *cs); +void sysfs_tsc_unregister(struct sys_device *clocksource_dev, + struct clocksource *cs); + int check_tsc_unstable(void) { return tsc_unstable; @@ -757,6 +762,8 @@ static struct clocksource clocksource_tsc = { #ifdef CONFIG_X86_64 .vread = vread_tsc, #endif + .sys_register = sysfs_tsc_register, + .sys_unregister = sysfs_tsc_unregister, }; void mark_tsc_unstable(char *reason) @@ -967,3 +974,22 @@ void __init tsc_init(void) init_tsc_clocksource(); } +static ssize_t show_tsc_khz( + struct sys_device *dev, struct sysdev_attribute *attr, char *buf) +{ + return sprintf(buf, "%u\n", tsc_khz); +} + +static SYSDEV_ATTR(tsc_khz, 0444, show_tsc_khz, NULL); + +int sysfs_tsc_register(struct sys_device *clocksource_dev, + struct clocksource *cs) +{ + return sysdev_create_file(clocksource_dev, &attr_tsc_khz); +} + +void sysfs_tsc_unregister(struct sys_device *clocksource_dev, + struct clocksource *cs) +{ + sysdev_remove_file(clocksource_dev, &attr_tsc_khz); +} diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 5ea3c60..d9f6f13 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -15,6 +15,7 @@ #include <linux/cache.h> #include <linux/timer.h> #include <linux/init.h> +#include <linux/sysdev.h> #include <asm/div64.h> #include <asm/io.h> @@ -156,6 +157,8 @@ extern u64 timecounter_cyc2time(struct timecounter *tc, * @vread: vsyscall based read * @suspend: suspend function for the clocksource, if necessary * @resume: resume function for the clocksource, if necessary + * @sys_register: optional, register additional sysfs attributes + * @sys_unregister: optional, unregister sysfs attributes */ struct clocksource { /* @@ -194,6 +197,10 @@ struct clocksource { struct list_head wd_list; cycle_t wd_last; #endif + int (*sys_register)(struct sys_device *clocksource_dev, + struct clocksource *cs); + void (*sys_unregister)(struct sys_device *clocksource_dev, + struct clocksource *cs); }; /* diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index f08e99c..d8b69a5 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -41,6 +41,8 @@ void timecounter_init(struct timecounter *tc, } EXPORT_SYMBOL_GPL(timecounter_init); +void sysfs_alter_clocksource(struct clocksource *old, struct clocksource *new); + /** * timecounter_read_delta - get nanoseconds since last call of this function * @tc: Pointer to time counter @@ -572,6 +574,7 @@ static void clocksource_select(void) } if (curr_clocksource != best) { printk(KERN_INFO "Switching to clocksource %s\n", best->name); + sysfs_alter_clocksource(curr_clocksource, best); curr_clocksource = best; timekeeping_notify(curr_clocksource); } @@ -834,6 +837,8 @@ static struct sys_device device_clocksource = { .cls = &clocksource_sysclass, }; +static int sysfs_active = 0; + static int __init init_clocksource_sysfs(void) { int error = sysdev_class_register(&clocksource_sysclass); @@ -848,10 +853,34 @@ static int __init init_clocksource_sysfs(void) error = sysdev_create_file( &device_clocksource, &attr_available_clocksource); + + if (!error) + { + mutex_lock(&clocksource_mutex); + if(curr_clocksource->sys_register) + error = curr_clocksource->sys_register( + &device_clocksource, curr_clocksource); + mutex_unlock(&clocksource_mutex); + } + + if (!error) + sysfs_active = 1; return error; } device_initcall(init_clocksource_sysfs); + +void sysfs_alter_clocksource(struct clocksource *old, + struct clocksource *new) +{ + if(!sysfs_active) + return; + if(old->sys_unregister) + old->sys_unregister(&device_clocksource, old); + if(new->sys_register) + new->sys_register(&device_clocksource, new); +} + #endif /* CONFIG_SYSFS */ /** -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: john stultz on 25 May 2010 20:50
On Tue, 2010-05-25 at 20:16 -0400, Brian Bloniarz wrote: > On 05/24/2010 09:33 PM, Brian Bloniarz wrote: > > So what's wrong with just adding a > > /sys/devices/system/clocksource/clocksource0/tsc_khz? > > As an RFC: > > Add clocksource.sys_register & sys_unregister so the > current clocksource can add supplemental information to > /sys/devices/system/clocksource/clocksource0/ > > Export tsc_khz when current_clocksource==tsc so that > daemons like NTP can account for the variability of > calibration results. I think this is a bad idea, as it creates an ABI that is arch AND machine specific, which will cause portability problems in applications that expect the interface to be there. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |