From: Dan Magenheimer on 18 May 2010 11:20

> From: Peter Zijlstra [mailto:peterz(a)infradead.org]
> Subject: Re: [PATCH] x86: Export tsc related information in sysfs
>
> On Tue, 2010-05-18 at 13:25 +0200, Andi Kleen wrote:
> > On Tue, May 18, 2010 at 11:58:18AM +0200, Peter Zijlstra wrote:
> > > On Sun, 2010-05-16 at 22:06 -0700, Arjan van de Ven wrote:
> > > > look we're not disabling ring 3 tsc. We could, but we don't.
> > >
> > > Maybe we should.
> >
> > That would kill the vsyscall too. Remember it's running in ring 3.
> >
> > That is in theory you could disable it on systems where the vsyscall
> > doesn't use it, but then you would likely break huge amounts of software,
> > unless you emulate it.
>
> Well, software shouldn't use it, so breaking it sounds like a fine
> idea ;-)

Last fall, I discovered that EVERY program on RHEL5 (U2?) uses rdtsc
because RHEL5 ld.so uses rdtsc. The uses are few and harmless but
are there nonetheless. Clearly that can be fixed, but you might
be surprised how big "huge amounts of software" is.

> Also, a slow emulation is an incentive to actually do the right thing.

Emulation is not particularly slow, especially compared to
accessing the HPET. If the kernel deems TSC is unsafe, the
ring 3 vsyscall shouldn't be using rdtsc either so the additional
trap overhead might be in the noise.

As long as there is a sysfs file that can override the setting
and there is a counter (accessible via sysfs) that can count
the number of emulated rdtsc/rdtscp instructions (possibly
optionally by pid so the "offending" userland threads can be
tracked down), setting CR4.TSD whenever the kernel deems TSC
is unsafe and emulating rdtsc might be a reasonable solution.
Infrequent rdtsc users won't know or care, and frequent users
will at least be able to learn the frequency of their "sin".

And to help Thomas/Arjan/Ingo/Andi educate users, every read
or write to any of these sysfs files could also result in a
printk of "Use of rdtsc is deprecated... use vsyscalls instead.
See Documentation/friends_dont_let_friends_use_rdtsc." (Half ;-)

And the sysfs file could have a "strict" setting which kills
any thread that uses rdtsc, so Thomas can tell future problem
reporters: "Set the rdtsc setting to strict and if you still
have problems, call me back." (Other half of ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin on 18 May 2010 12:50

On 05/18/2010 04:58 AM, Peter Zijlstra wrote:
> On Tue, 2010-05-18 at 13:25 +0200, Andi Kleen wrote:
>> On Tue, May 18, 2010 at 11:58:18AM +0200, Peter Zijlstra wrote:
>>> On Sun, 2010-05-16 at 22:06 -0700, Arjan van de Ven wrote:
>>>> look we're not disabling ring 3 tsc. We could, but we don't.
>>>
>>> Maybe we should.
>>
>> That would kill the vsyscall too. Remember it's running in ring 3.
>>
>> That is in theory you could disable it on systems where the vsyscall
>> doesn't use it, but then you would likely break huge amounts of software,
>> unless you emulate it.
>
> Well, software shouldn't use it, so breaking it sounds like a fine
> idea ;-)
>
> Also, a slow emulation is an incentive to actually do the right thing.

The problem is that you throw the baby (vsyscall) out with the bathwater
(user rdtsc).

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
From: Peter Zijlstra on 18 May 2010 13:00

On Tue, 2010-05-18 at 09:40 -0700, H. Peter Anvin wrote:
>
> The problem is that you throw the baby (vsyscall) out with the bathwater
> (user rdtsc).

Well, we could only flip the CR4 bit when we mark the TSC unsuitable for
gtod. That should be plenty good to tag all userspace trying to use it,
since more than half my machines don't use TSC for clocksource.
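Whether a given machine falls into the "don't use TSC for clocksource" group can be checked from userspace by reading the active clocksource name out of sysfs. The sketch below is a minimal illustration, assuming the usual Linux path `/sys/devices/system/clocksource/clocksource0/current_clocksource`. The helper names are invented for this example, and nothing here touches CR4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Usual sysfs location of the active clocksource name on Linux. */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Hypothetical helper: true if a sysfs-style string is exactly "tsc",
 * ignoring a trailing newline. */
bool clocksource_is_tsc(const char *name)
{
    assert(name != NULL);
    size_t len = strcspn(name, "\r\n");
    return len == 3 && strncmp(name, "tsc", 3) == 0;
}

/* Hypothetical helper: 1 if the kernel currently uses the TSC as its
 * clocksource, 0 if it uses something else, -1 if the file is unreadable. */
int system_uses_tsc(void)
{
    char buf[64];
    FILE *f = fopen(CLOCKSOURCE_PATH, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return clocksource_is_tsc(buf) ? 1 : 0;
}
```

On a machine where `system_uses_tsc()` returns 0, the idea above would have CR4.TSD set, so any userspace rdtsc would trap and could be tagged.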
From: H. Peter Anvin on 18 May 2010 13:10

On 05/18/2010 09:52 AM, Peter Zijlstra wrote:
> On Tue, 2010-05-18 at 09:40 -0700, H. Peter Anvin wrote:
>>
>> The problem is that you throw the baby (vsyscall) out with the bathwater
>> (user rdtsc).
>
> Well, we could only flip the CR4 bit when we mark the TSC unsuitable for
> gtod. That should be plenty good to tag all userspace trying to use it,
> since more than half my machines don't use TSC for clocksource.

This might be an option, although it would have to be an *option*.
There are restricted uses of the TSC in userspace which are still useful
(mainly involving performance analysis and/or CPU-locked processes).

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
From: Dan Magenheimer on 18 May 2010 14:00

> From: H. Peter Anvin [mailto:hpa(a)zytor.com]
> Sent: Tuesday, May 18, 2010 11:04 AM
> To: Peter Zijlstra
> Cc: Andi Kleen; Arjan van de Ven; Dan Magenheimer; Thomas Gleixner;
> Venkatesh Pallipadi; Ingo Molnar; Chris Mason;
> linux-kernel(a)vger.kernel.org
> Subject: Re: [PATCH] x86: Export tsc related information in sysfs
>
> On 05/18/2010 09:52 AM, Peter Zijlstra wrote:
> > On Tue, 2010-05-18 at 09:40 -0700, H. Peter Anvin wrote:
> >>
> >> The problem is that you throw the baby (vsyscall) out with the bathwater
> >> (user rdtsc).
> >
> > Well, we could only flip the CR4 bit when we mark the TSC unsuitable for
> > gtod. That should be plenty good to tag all userspace trying to use it,
> > since more than half my machines don't use TSC for clocksource.
>
> This might be an option, although it would have to be an *option*.
> There are restricted uses of the TSC in userspace which are still useful
> (mainly involving performance analysis and/or CPU-locked processes).

(Though I expect tglx/arjan/andi/mingo to disagree with this
proposal for similar reasons as the original one that
started this thread...)

Proposal:

/sys/devices/system/tsc/native (writable by root):
  0 = (default) Kernel dynamically controls TSC emulation.
      When the kernel deems TSC usable as a clocksource,
      rdtsc will be executed directly by the CPU. When
      the kernel deems TSC unsafe to use, rdtsc will be
      trapped and emulated.
  1 = TSC emulation is never enabled. Programs using
      rdtsc directly are subject to the many known and
      sometimes rare and subtle vagaries of TSC.
  2 = TSC emulation is always enabled (for debug only)
  3 = Processes using TSC will be treated as if they
      executed an illegal instruction.
      [? Can the kernel recognize use of rdtsc in a
      vsyscall and emulate so that, even though vsyscall
      is slower, all other rdtsc in userspace are illegal?]
      [? Can/should this be enforced only on non-root
      processes?]
/sys/devices/system/tsc/system_count (writable by root):
  Contains a count of all TSC emulations, system-wide.
  Writable to allow reset to zero.

/sys/devices/system/tsc/pid_counters (writable by root):
  0 = (default) TSC counts are system-wide only
  1 = TSC counted per pid (at performance penalty);
      counters in /proc/PID/tsc_count

/proc/PID/tsc_count (readonly):
  If /sys/devices/system/tsc/pid_counters is 1, contains
  the count of rdtsc instructions emulated for this PID.

(Note: except for the actual instruction emulation, which
will be faithful, rdtscp will be treated and counted
as a rdtsc.)
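The decision table implied by the proposed `native` file, plus the two counters, can be written down as a small model. This is only a userspace sketch of the proposal as worded above, not kernel code. Every identifier in it (`tsc_native_mode`, `tsc_policy`, `tsc_counters`, `tsc_count_emulation`) is invented for illustration, and the fallback for out-of-range values is an assumption the proposal does not address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values of the proposed /sys/devices/system/tsc/native file. */
enum tsc_native_mode {
    TSC_NATIVE_AUTO    = 0, /* kernel decides dynamically (default) */
    TSC_NATIVE_ALWAYS  = 1, /* emulation never enabled */
    TSC_NATIVE_EMULATE = 2, /* emulation always enabled (debug) */
    TSC_NATIVE_ILLEGAL = 3, /* rdtsc treated as an illegal instruction */
};

/* What happens when userspace executes rdtsc or rdtscp. */
enum tsc_action { TSC_RUN_NATIVE, TSC_TRAP_EMULATE, TSC_TRAP_ILLEGAL };

/* Map the sysfs setting and the kernel's view of the TSC to an action. */
enum tsc_action tsc_policy(enum tsc_native_mode mode, bool tsc_usable)
{
    switch (mode) {
    case TSC_NATIVE_AUTO:
        return tsc_usable ? TSC_RUN_NATIVE : TSC_TRAP_EMULATE;
    case TSC_NATIVE_ALWAYS:
        return TSC_RUN_NATIVE;
    case TSC_NATIVE_EMULATE:
        return TSC_TRAP_EMULATE;
    case TSC_NATIVE_ILLEGAL:
        return TSC_TRAP_ILLEGAL;
    }
    /* Assumption: an out-of-range value falls back to emulation. */
    return TSC_TRAP_EMULATE;
}

/* Model of system_count and the pid_counters switch. */
struct tsc_counters {
    uint64_t system_count;  /* all emulations, system-wide */
    bool per_pid_enabled;   /* models pid_counters = 1 */
};

/* Account one emulated rdtsc (rdtscp is counted the same way).
 * pid_count models /proc/PID/tsc_count and may be NULL. */
void tsc_count_emulation(struct tsc_counters *c, uint64_t *pid_count)
{
    assert(c != NULL);
    c->system_count++;
    if (c->per_pid_enabled && pid_count != NULL)
        (*pid_count)++;
}
```

Keeping the policy a pure function of the setting and the kernel's TSC verdict makes the open questions in the proposal easy to see: the vsyscall exception and the non-root-only variant would each need an extra input to `tsc_policy`.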