From: Arjan van de Ven on 15 May 2010 15:20

On Sat, 15 May 2010 06:29:25 -0700 (PDT)
Dan Magenheimer <dan.magenheimer(a)oracle.com> wrote:

> > It would be better to fix them to use the vsyscalls instead.
> > Or if they can't use the vsyscalls for some reason today fix them.
>
> The problem is from an app point-of-view there is no vsyscall.
> There are two syscalls: gettimeofday and clock_gettime. Sometimes,
> if it gets lucky, they turn out to be very fast and sometimes
> it doesn't get lucky and they are VERY slow (resulting in a
> performance hit of 10% or more), depending on a number of factors
> completely out of the control of the app and even undetectable to the
> app.

But the point is.. in the case you get that 10% hit.... that is exactly
the case where tsc would not work either!!!

> If tsc_reliable is 1, the system and the kernel are guaranteeing
> to the app that nothing will change in the TSC. In an Invariant
> TSC system that has passed Ingo's warp test (to eliminate the
> possibility of a fixed interprocessor TSC gap due to a broken BIOS
> in a multi-node NUMA system), if anything changes in the clock

just when we're trying to get rid of this constraint by allowing a per
cpu offset... (this is needed to cope with cpus not powering on at the
exact same time... including hotplug cpu etc etc)

oh and.. what notification mechanism do you have to notify the
application that the tsc now is no longer reliable? Such conditions
can exist... for example due to a CPU being hotplugged, or some SMM
screwing around and the kernel detecting that or .. or ...

really. Use the vsyscall. If the vsyscall does not do exactly what you
want, make a better vsyscall.

But friends don't let friends use rdtsc in application code.

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
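
For readers following along, here is a minimal sketch of what "use the vsyscall" means from the application side, assuming Linux with a POSIX C library. The application only ever calls clock_gettime() (or gettimeofday()); whether that call is serviced in user space or falls back to a real system call is decided by the kernel and the C library, which is exactly the invisibility Dan is complaining about. The helper name now_ns is made up for illustration and is not from the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (name is an assumption): monotonic timestamp in
 * nanoseconds. Returns 0 if clock_gettime() fails. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

CLOCK_MONOTONIC is used here because interval timestamps should not jump when the wall clock is adjusted; an application that needs wall-clock time would pass CLOCK_REALTIME instead.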
From: Dan Magenheimer on 15 May 2010 18:40

> From: Arjan van de Ven [mailto:arjan(a)infradead.org]

(Arjan comments reordered somewhat)

> But friends don't let friends use rdtsc in application code.

Um, I realize that many people have been burned by this
many times over the years so it is a "hot stove". I also
realize that there are many environments where using
rdtsc is risking stepping on landmines. But I (we?) also
know there are many environments now where using rdtsc is
NOT risky at all... and with the vast majority of new
systems soon shipping with Invariant TSC and a single socket
(and even most multiple-socket systems with non-broken
BIOSes passing a warp test), why should past burns outlaw
userland use of a very fast, very useful CPU feature? After
all, CPU designers at both Intel and AMD have spent
a great deal of design effort and transistors to FINALLY
provide an Invariant TSC.

> > The problem is from an app point-of-view there is no vsyscall.
> > There are two syscalls: gettimeofday and clock_gettime. Sometimes,
> > if it gets lucky, they turn out to be very fast and sometimes
> > it doesn't get lucky and they are VERY slow (resulting in a
> > performance hit of 10% or more), depending on a number of factors
> > completely out of the control of the app and even undetectable to the
> > app.
>
> But the point is.. in the case you get that 10% hit.... that is exactly
> the case where tsc would not work either!!!

Yes, understood. But the kernel doesn't expose a "gettimeofday
performance sucks" flag either. If it did (or in the case of
the patch, if tsc_reliable is zero) the application could at least
choose to turn off the 10000-100000 timestamps/second and log
a message saying "you are running on old hardware so you get
fewer features".

> just when we're trying to get rid of this constraint by allowing a per
> cpu offset... (this is needed to cope with cpus not powering on at the
> exact same time... including hotplug cpu etc etc)
>
> oh and.. what notification mechanism do you have to notify the
> application that the tsc now is no longer reliable? Such conditions
> can exist... for example due to a CPU being hotplugged, or some SMM
> screwing around and the kernel detecting that or .. or ...

The proposal doesn't provide a notification mechanism (though I'm
not against it)... if the tsc can EVER become unreliable,
tsc_reliable should be 0.

A CPU-hotpluggable system is a good example of a case where
the kernel should expose that tsc_reliable is 0. (I've heard
anecdotally that CPU hotplug into a QPI or HyperTransport system
will have some other interesting challenges, so may require some
special kernel parameters anyway.) Even if tsc_reliable were
only enabled if a "no-cpu_hotplug" kernel parameter is set,
that is still useful. And with cores-per-socket (and even
nodes-per-socket) going up seemingly every day, multi-socket
systems will likely be an ever smaller percentage of new
systems.

A virtual machine where live migration to another physical
machine may occur is another good example where tsc_reliable
should be 0. Xen now has a VM config feature that says
"migration is disallowed" for this reason; the Invariant TSC
flag is always off for a VM unless this "no_migrate" flag
is set (or rdtsc is emulated).

> really. Use the vsyscall. If the vsyscall does not do exactly what you
> want, make a better vsyscall.

If this discussion results in a better vsyscall and/or a way
for applications to easily determine (and report loudly) that
the system does NOT provide a good way to do a fast timestamp,
that may be sufficient. But please propose how that will be done
as the current software choices are inadequate and the CPU
designers have finally fixed the problem for the vast majority
of systems. I am already aware of some enterprise software
that is doing its best to guess whether TSC is reliable by
looking at CPU families and socket counts, but this is doomed
to failure in userland and is something that the kernel knows
and should now expose.

Thanks,
Dan
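
As an illustration of the kind of userland guessing Dan describes, the sketch below reads the clocksource the kernel is currently using from the existing sysfs file current_clocksource. This is not the tsc_reliable flag proposed in the thread; that flag belongs to the patch under discussion and is not assumed to exist here. The helper names read_clocksource and clocksource_is_tsc are made up for illustration, and the result is only a snapshot: the kernel can switch clocksources later without telling the application, which is the notification problem Arjan raises.

```c
#include <stdio.h>
#include <string.h>

/* Existing sysfs file naming the kernel's active clocksource. */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Hypothetical helper: copy the clocksource name (e.g. "tsc", "hpet")
 * into buf. Returns 0 on success, -1 if the file cannot be read. */
static int read_clocksource(char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen(CLOCKSOURCE_PATH, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}

/* Heuristic only: 1 if the kernel currently uses the TSC, else 0.
 * A result of 1 is a snapshot, not a guarantee about the future. */
static int clocksource_is_tsc(void)
{
    char name[64];

    if (read_clocksource(name, sizeof(name)) != 0)
        return 0;
    return strcmp(name, "tsc") == 0;
}
```

An application could call clocksource_is_tsc() at startup and log a warning when it returns 0, in the spirit of Dan's "you are running on old hardware so you get fewer features" message.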
From: Arjan van de Ven on 16 May 2010 01:50

On Sat, 15 May 2010 15:32:51 -0700 (PDT)
Dan Magenheimer <dan.magenheimer(a)oracle.com> wrote:

> > From: Arjan van de Ven [mailto:arjan(a)infradead.org]
> (Arjan comments reordered somewhat)
>
> > But friends don't let friends use rdtsc in application code.
>
> Um, I realize that many people have been burned by this
> many times over the years so it is a "hot stove". I also
> realize that there are many environments where using
> rdtsc is risking stepping on landmines.
> But I (we?) also
> know there are many environments now where using rdtsc is
> NOT risky at all...

I see a lot of Intel hardware.. (stuff that you likely don't see yet ;-)
and I have not yet seen a system where the kernel would be able to give
the guarantee as you describe it in your email.

If you want a sysfs variable that is always 0... go wild.

> and with the vast majority of new
> systems soon shipping with Invariant TSC and a single socket
> (and even most multiple-socket systems with non-broken
> BIOSes passing a warp test),

(the warp test is going away)
on multisocket that passes a warp test you can still get skew over
time.. due to things like SMM, thermal throttling etc etc.

> why should past burns outlaw
> userland use of a very fast, very useful CPU feature? After
> all, CPU designers at both Intel and AMD have spent
> a great deal of design effort and transistors to FINALLY
> provide an Invariant TSC.

sadly even with all these transistors no system that I know of today
can guarantee the guarantee by the rules you state.

> > oh and.. what notification mechanism do you have to notify the
> > application that the tsc now is no longer reliable? Such conditions
> > can exist... for example due to a CPU being hotplugged, or some SMM
> > screwing around and the kernel detecting that or .. or ...
>
> The proposal doesn't provide a notification mechanism (though I'm
> not against it)... if the tsc can EVER become unreliable,
> tsc_reliable should be 0.

then it should be 0 always on all of today's hardware.
SMM, thermal overload, etc etc ... you name it.
Things the kernel will get notified about...

> A CPU-hotpluggable system is a good example of a case where
> the kernel should expose that tsc_reliable is 0. (I've heard
> anecdotally that CPU hotplug into a QPI or HyperTransport system
> will have some other interesting challenges, so may require some
> special kernel parameters anyway.)

eh no. hot add works just fine.
(hot remove is a very different ballgame)

> > really. Use the vsyscall. If the vsyscall does not do exactly what
> > you want, make a better vsyscall.
>
> If this discussion results in a better vsyscall and/or a way
> for applications to easily determine (and report loudly) that
> the system does NOT provide a good way to do a fast timestamp,
> that may be sufficient. But please propose how that will be done
> as the current software choices are inadequate and the CPU
> designers have finally fixed the problem for the vast majority
> of systems.

*cough*

> I am already aware of some enterprise software
> that is doing its best to guess whether TSC is reliable by
> looking at CPU families and socket counts, but this is doomed
> to failure in userland and is something that the kernel knows
> and should now expose.

can you name said "enterprise" software by name please? We need a huge
advertisement to let people know not to trust their important data to
it..

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
From: Thomas Gleixner on 16 May 2010 05:30

On Sat, 15 May 2010, Arjan van de Ven wrote:

> On Sat, 15 May 2010 15:32:51 -0700 (PDT)
> Dan Magenheimer <dan.magenheimer(a)oracle.com> wrote:
>
> > > From: Arjan van de Ven [mailto:arjan(a)infradead.org]
> > (Arjan comments reordered somewhat)
> >
> > > But friends don't let friends use rdtsc in application code.
> >
> > Um, I realize that many people have been burned by this
> > many times over the years so it is a "hot stove". I also
> > realize that there are many environments where using
> > rdtsc is risking stepping on landmines.
>
> > But I (we?) also
> > know there are many environments now where using rdtsc is
> > NOT risky at all...
>
> I see a lot of Intel hardware.. (stuff that you likely don't see yet ;-)
> and I have not yet seen a system where the kernel would be able to give
> the guarantee as you describe it in your email.
>
> If you want a sysfs variable that is always 0... go wild.

Nah, there are systems which will have it set to 1:

Dig out your good old Pentium-I box and enjoy.

> > > oh and.. what notification mechanism do you have to notify the
> > > application that the tsc now is no longer reliable? Such conditions
> > > can exist... for example due to a CPU being hotplugged, or some SMM
> > > screwing around and the kernel detecting that or .. or ...
> >
> > The proposal doesn't provide a notification mechanism (though I'm
> > not against it)... if the tsc can EVER become unreliable,
> > tsc_reliable should be 0.
>
> then it should be 0 always on all of today's hardware.
> SMM, thermal overload, etc etc ... you name it.
> Things the kernel will get notified about...

What we could expose is an estimate about the performance of
gettimeofday/clock_gettime. The kernel has all the information to do
that, but this still does not solve the notification problem when we
need to switch to a different clock source.

Thanks,

	tglx
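
Until the kernel exposes such an estimate, an application can only measure the cost itself. The following is a rough sketch of that measurement, assuming Linux with a POSIX C library; the helper name clock_gettime_cost_ns is made up for illustration and is not an interface proposed in the thread. It times a run of back-to-back clock_gettime() calls and returns the average per call. Like any userland probe, it says nothing about what happens if the kernel switches clocksource afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper (name is an assumption): average cost of one
 * clock_gettime() call in nanoseconds, measured over `calls`
 * back-to-back calls. Returns -1.0 on error or if calls <= 0. */
static double clock_gettime_cost_ns(long calls)
{
    struct timespec start, end, scratch;
    double elapsed;
    long i;

    if (calls <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (i = 0; i < calls; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch);
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9
            + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / (double)calls;
}
```

As a rough illustration of why the number matters: at the upper end of Dan's 10000-100000 timestamps/second range, a call that costs about 1 microsecond would consume about 10% of one CPU, while a call costing a few tens of nanoseconds would be negligible.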
From: Dan Magenheimer on 16 May 2010 12:50

> From: Thomas Gleixner [mailto:tglx(a)linutronix.de]
> What we can talk about is a vget_tsc_raw() interface along with a
> vconvert_tsc_delta() interface, where vget_tsc_raw() returns you a
> nasty error code for everything which is not usable.

I'm open to something like that provided:

1) It works (whenever possible) without changing privilege levels
   or causing vmexits or other "hidden slowness" problems when used
   both in bare-metal Linux and in a virtual machine.
2) The "transformation" performed by the kernel on the TSC does
   not require some hidden pcpu number that won't work in a
   virtual machine.

If TSC is indeed reliable (see below), it is both faster AND
meets the above constraints.

> > From: Arjan van de Ven [mailto:arjan(a)infradead.org]
> > If you want a sysfs variable that is always 0... go wild.
>
> From: Thomas Gleixner [mailto:tglx(a)linutronix.de]
> Nah, there are systems which will have it set to 1:
> Dig out your good old Pentium-I box and enjoy.

Hot stove syndrome again? Are you truly saying that there
are NO single-socket multi-core systems that don't have stupid
firmware (SMI and/or BIOS)? Or are you saying that significant
TSC clock skew occurs even between the cores on a single-socket
Nehalem system?

If things are this bad, why on earth would the kernel itself
EVER use TSC even as its own internal clocksource? Or even
to provide additional precision to a slow platform timer?

Or are you saying that many systems (and especially large
multi-socket systems) DO exist where the kernel isn't able to
proactively determine that the firmware is broken and/or
significant thermal variation may occur across sockets?
This I believe.

I understand that you both are involved in pushing the limits
of large systems and that time synchronization is a very hard
problem, perhaps effectively unsolvable, in these systems.
But that doesn't mean the vast majority of latest generation
single-socket systems can't set "tsc_reliable" to 1. Or that
the kernel is responsible for detecting and/or correcting
every system with buggy firmware.

Maybe the best way to solve the "buggy firmware problem"
is exactly by encouraging enterprise apps to use TSC
and to expose and *blacklist* systems and/or system vendors
who ship boxes with crappy firmware!

> From: Thomas Gleixner [mailto:tglx(a)linutronix.de]
> What we could expose is an estimate about the performance of
> gettimeofday/clock_gettime. The kernel has all the information to do
> that, but this still does not solve the notification problem when we
> need to switch to a different clock source.

This would at least be a big step in the right direction.
But if we go with a vget_tsc_raw() or direct TSC solution,
you have convinced me of the need for notification. Maybe
this is a perfect use for (at least one bit in) the TSC_AUX
register and the rdtscp instruction?

And I do agree with Venki that some user library (or at least
published sample code) should be made available to demonstrate
proper usage and to dampen out the worst of the "broken user
problem".

> > From: Arjan van de Ven [mailto:arjan(a)infradead.org]
> > can you name said "enterprise" software by name please? We need a huge
> > advertisement to let people know not to trust their important data to
> > it..

For obvious reasons I can't do that, but I can point to
enterprise *operating systems* that have long since solved
this same problem one way or another: Solaris on x86 and
HP-UX (the latter admittedly on ia64). Enterprise app
vendors are quite happy with requiring conformance to a very
completely specified software/hardware/firmware stack before
providing support to an app customer. I'm just trying to
ensure that Linux can be part of that spec.
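
To make the rdtscp/TSC_AUX idea concrete, here is a hedged sketch of how an application could pair each TSC read with the TSC_AUX value returned by the same instruction, and refuse to compute a delta when that value changed between reads. It assumes GCC or Clang on x86, where <x86intrin.h> provides __rdtscp() and <cpuid.h> provides __get_cpuid(). The struct and helper names are made up for illustration. On Linux the kernel typically keeps a CPU identifier in TSC_AUX, so a changed value usually means the thread migrated; the "reliability bit" Dan suggests does not exist and is not modelled here.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>
#endif

/* Hypothetical type: one TSC read plus the TSC_AUX value seen with it. */
struct tsc_sample {
    uint64_t tsc;   /* raw counter value */
    uint32_t aux;   /* TSC_AUX contents at the time of the read */
};

/* Read TSC and TSC_AUX with a single RDTSCP instruction.
 * Returns 0 on success, -1 if RDTSCP is unavailable. */
static int tsc_sample_read(struct tsc_sample *s)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx, aux;

    /* CPUID leaf 0x80000001, EDX bit 27 advertises RDTSCP. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return -1;
    if (!((edx >> 27) & 1u))
        return -1;
    s->tsc = __rdtscp(&aux);
    s->aux = aux;
    return 0;
#else
    (void)s;
    return -1;      /* not an x86 build */
#endif
}

/* Cycle delta between two samples. Returns -1 if TSC_AUX differs
 * (for example after a migration to another CPU) or if the counter
 * appears to have gone backwards; 0 on success. */
static int tsc_sample_delta(const struct tsc_sample *a,
                            const struct tsc_sample *b,
                            uint64_t *delta)
{
    if (a->aux != b->aux)
        return -1;
    if (b->tsc < a->tsc)
        return -1;
    *delta = b->tsc - a->tsc;
    return 0;
}
```

A caller that gets -1 from tsc_sample_delta() would fall back to clock_gettime() for that interval. Converting the cycle delta to time still needs a frequency from somewhere, which is the role Thomas assigns to the proposed vconvert_tsc_delta().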