From: Casper H.S. Dik on 9 Mar 2010 16:03

Chris Friesen <cbf123(a)mail.usask.ca> writes:

>On 03/09/2010 11:58 AM, Scott Lurndal wrote:
>> So put a very high res timer in the northbridge and have it respond
>> to some address above top of high memory.

>You mean like mapping /dev/hpet on modern x86 systems running linux? :)

>I seem to remember an architecture (maybe Sparc?) that distributed a
>fast-but-not-insanely-fast clock pulse to all cpus. Like 1MHz or
>something similar. This was fast enough to be useful but not so fast
>that clock skew becomes significant. This then incremented a counter in
>each cpu which could be read in a single instruction.

10MHz, IIRC; the %stick register. (There's also %tick which counts
on the clock frequency)

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
From: Scott Lurndal on 9 Mar 2010 20:32

Chris Friesen <cbf123(a)mail.usask.ca> writes:
>On 03/09/2010 11:58 AM, Scott Lurndal wrote:
>
>> So put a very high res timer in the northbridge and have it respond
>> to some address above top of high memory.
>
>You mean like mapping /dev/hpet on modern x86 systems running linux? :)

Not really. The HPET still relies on interrupts (high perf _EVENT_ timer).

>
>I seem to remember an architecture (maybe Sparc?) that distributed a
>fast-but-not-insanely-fast clock pulse to all cpus. Like 1MHz or
>something similar. This was fast enough to be useful but not so fast
>that clock skew becomes significant. This then incremented a counter in
>each cpu which could be read in a single instruction.
>

And we're looped back to rdtsc :-)

scott
From: Chris Friesen on 10 Mar 2010 11:08

On 03/09/2010 07:32 PM, Scott Lurndal wrote:
> Chris Friesen <cbf123(a)mail.usask.ca> writes:
>> On 03/09/2010 11:58 AM, Scott Lurndal wrote:
>>
>>> So put a very high res timer in the northbridge and have it respond
>>> to some address above top of high memory.
>>
>> You mean like mapping /dev/hpet on modern x86 systems running linux? :)
>
> Not really. The HPET still relies on interrupts (high perf _EVENT_ timer).

I believe you can read the HPET to get a 64-bit timestamp. It's slower
than rdtsc though.

>> I seem to remember an architecture (maybe Sparc?) that distributed a
>> fast-but-not-insanely-fast clock pulse to all cpus. Like 1MHz or
>> something similar. This was fast enough to be useful but not so fast
>> that clock skew becomes significant. This then incremented a counter in
>> each cpu which could be read in a single instruction.
>>
>
> And we're looped back to rdtsc :-)

Until relatively recently (especially on AMD cpus) rdtsc varied with cpu
frequency and sleep states, and was not necessarily synchronized across
multiple cores.

It's now possible to determine whether rdtsc is reliable...on linux an
easy way is to look at /proc/cpuinfo. Ideally you want to see
"constant_tsc" and "nonstop_tsc".

Chris
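The /proc/cpuinfo check described in this post can also be done from a program. What follows is a minimal sketch and not something posted in the thread: the function names has_flag, tsc_flags_ok and tsc_flags_ok_in_file are made up for illustration, and the code assumes the Linux x86 layout in which a line beginning with "flags" lists space-separated feature names. It only inspects the first such line, i.e. the first CPU.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` occurs in `line` as a whole whitespace-delimited
 * word, 0 otherwise. A plain substring search is not enough, because
 * "tsc" is also a substring of "constant_tsc". */
int has_flag(const char *line, const char *flag)
{
    size_t n;
    const char *p;

    assert(line != NULL && flag != NULL);
    n = strlen(flag);
    if (n == 0)
        return 0;

    p = line;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' ||
                   p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p++;  /* keep searching after this partial match */
    }
    return 0;
}

/* Return 1 if both flags named in the post are present in the line. */
int tsc_flags_ok(const char *flags_line)
{
    return has_flag(flags_line, "constant_tsc") &&
           has_flag(flags_line, "nonstop_tsc");
}

/* Scan a cpuinfo-style file for its first "flags" line.
 * Returns 1 or 0 as tsc_flags_ok() does, or -1 if the file cannot be
 * opened or contains no "flags" line. */
int tsc_flags_ok_in_file(const char *path)
{
    char line[8192];  /* assumed large enough for one flags line */
    int result = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            result = tsc_flags_ok(line);
            break;
        }
    }
    fclose(f);
    return result;
}
```

On a Linux box one would call tsc_flags_ok_in_file("/proc/cpuinfo"). A result of 1 only says the kernel advertises both flags; it is not a measurement of the TSC itself.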
From: William Ahern on 10 Mar 2010 14:10

Chris Friesen <cbf123(a)mail.usask.ca> wrote:
> On 03/09/2010 07:32 PM, Scott Lurndal wrote:
<snip>
> > And we're looped back to rdtsc :-)

> Until relatively recently (especially on AMD cpus) rdtsc varied with cpu
> frequency and sleep states, and was not necessarily synchronized across
> multiple cores.

> It's now possible to determine whether rdtsc is reliable...on linux an
> easy way is to look at /proc/cpuinfo. Ideally you want to see
> "constant_tsc" and "nonstop_tsc".

On Linux/x86_64, at least, the kernel already uses HPET+rdtsc tricks, and it
uses some special hacks for gettimeofday and similar so that a regular
system call isn't necessary. You can tell whether it's enabled by

	cat /proc/sys/kernel/vsyscall64

It should read 1 or 2. If 0 then it's falling back to a regular syscall.

I can do 2^26 calls to gettimeofday in 3.8 seconds

	william(a)proxy0:/tmp$ time ./bench

	real	0m3.800s
	user	0m3.800s
	sys	0m0.000s

If I disable vsyscall64 then it runs only 4x slower, which is a testament to
how fast system calls are in general on Linux/x86.

I think the vsyscall (now called vdso, I think) mechanism is also
implemented on other architectures.
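The source of ./bench was not posted, so the following is only a guess at what such a benchmark might look like. The names bench_gettimeofday and ns_per_call are hypothetical. For scale, 3.8 seconds over 2^26 calls works out to roughly 57 ns per call.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Call gettimeofday() n times and return the elapsed wall-clock time in
 * seconds, or -1.0 if any call fails. The start and end stamps are
 * taken with gettimeofday() as well, so the result is only meaningful
 * when n is large. */
double bench_gettimeofday(unsigned long n)
{
    struct timeval start, end, tv;
    unsigned long i;

    if (gettimeofday(&start, NULL) != 0)
        return -1.0;
    for (i = 0; i < n; i++) {
        if (gettimeofday(&tv, NULL) != 0)
            return -1.0;
    }
    if (gettimeofday(&end, NULL) != 0)
        return -1.0;

    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Convert a total run time into an average cost per call, in
 * nanoseconds. Returns -1.0 when n is zero. */
double ns_per_call(double seconds, unsigned long n)
{
    assert(seconds >= 0.0);
    if (n == 0)
        return -1.0;
    return seconds * 1e9 / (double)n;
}
```

Calling bench_gettimeofday(1UL << 26) would repeat the measurement from the post. The figure obtained depends on the hardware, the kernel, and whether the fast path is enabled, so it should not be expected to match the 3.8 seconds quoted above.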
From: William Ahern on 10 Mar 2010 15:36

Scott Lurndal <scott(a)slp53.sl.home> wrote:
<snip>
> SVR4/Unixware had a reserved read-only page in the application virtual address space
> that could be mapped into the application (silently, the first time
> gettimeofday() was called). This page had the current TOD at a fixed
> location (and was updated out of the kernel timer routines); this turned
> gettimeofday() into a simple memory reference. IIRC they did this to
> improve Oracle performance.

This is pretty much how it works in Linux (x86, ppc, and s390). For example,
see do_realtime() starting at line 46 in

	http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/arch/x86/vdso/vclock_gettime.c
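A reader of such a shared page has to cope with the kernel updating it mid-read. One common way to do that is a sequence counter that is odd while an update is in progress, with the reader retrying until it sees the same even value before and after. The sketch below is a user-space illustration of that pattern only. It is not the kernel's code, the struct time_page layout and all function names are hypothetical, it assumes a single writer, and it uses sequentially consistent C11 atomics for simplicity where a real implementation would use its own barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical layout of the shared time page. */
struct time_page {
    atomic_uint seq;   /* even: stable, odd: update in progress */
    atomic_long sec;   /* seconds since the epoch */
    atomic_long nsec;  /* nanoseconds within the second */
};

void time_page_init(struct time_page *p)
{
    assert(p != NULL);
    atomic_init(&p->seq, 0u);
    atomic_init(&p->sec, 0L);
    atomic_init(&p->nsec, 0L);
}

/* Writer side, standing in for the kernel timer routine.
 * Only one writer may run at a time. */
void publish_time(struct time_page *p, long sec, long nsec)
{
    unsigned s = atomic_load(&p->seq);

    atomic_store(&p->seq, s + 1u);  /* odd: update in progress */
    atomic_store(&p->sec, sec);
    atomic_store(&p->nsec, nsec);
    atomic_store(&p->seq, s + 2u);  /* even: stable again */
}

/* One read attempt. Returns 1 and fills *sec and *nsec if a consistent
 * snapshot was seen, or 0 (outputs untouched) if an update was in
 * progress or happened during the read. */
int try_read_time(struct time_page *p, long *sec, long *nsec)
{
    unsigned before = atomic_load(&p->seq);
    long s = atomic_load(&p->sec);
    long ns = atomic_load(&p->nsec);
    unsigned after = atomic_load(&p->seq);

    if ((before & 1u) != 0u || before != after)
        return 0;
    *sec = s;
    *nsec = ns;
    return 1;
}

/* Reader side: retry until a consistent snapshot is obtained. */
void read_time(struct time_page *p, long *sec, long *nsec)
{
    while (!try_read_time(p, sec, nsec))
        ;  /* writer was active, try again */
}
```

The point of the design is that the reader never takes a lock and never enters the kernel; in the common case it costs a few memory loads, which fits the "simple memory reference" description quoted above.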