'clock microseconds' broken on Windows [TCL]

Prev: Tcl-URL! - weekly Tcl news and links (May 14)
Next: Getting the names of all directories

From: Helmut Giese on 15 May 2010 04:10

Hello out there,
about a week ago there was a thread,
"clock microseconds with resolution in milliseconds",
in which it was observed that 'clock microseconds' only returned
millisecond resolution (under Windows). The consensus at the end was
that Tcl can only get what the underlying OS offers.

To me this was worrying because I had used high-resolution timing
in the past, and who knows? The need may arise again. Only now did
I have time to investigate (nothing beats a holiday with bad weather
for this kind of activity), and it appears to be broken.

Here's the issue:
a) Tcl asks Windows whether a performance counter exists.
b) If it exists but its frequency is > 15 MHz, Tcl performs additional
checks, and if these fail (as they apparently do on newer machines),
Tcl decides not to use the counter.

I filed a bug on SourceForge (bug id 3002022) and invite anybody with
some knowledge of the performance counter to add their comments; it
may help the maintainers to resolve the bug.

For those who need this feature I can offer a workaround. The
following test script will show you
- the performance counter's current value and
- what 'clock microseconds' does: is it stuck at every millisecond or
does it "move"?
---
package require Ffidl

# Define two Ffidl callouts into kernel32.dll. Both Windows functions
# take a pointer to a 64-bit LARGE_INTEGER and return a BOOL.
ffidl::callout queryFrequ {pointer-var} int \
    [ffidl::symbol kernel32.dll QueryPerformanceFrequency]
ffidl::callout queryCount {pointer-var} int \
    [ffidl::symbol kernel32.dll QueryPerformanceCounter]

#
# getPerfCnt  Get the current value of the performance counter
#
proc getPerfCnt {} {
    # 8-byte buffer that receives the LARGE_INTEGER
    set i64 [binary format w 0]
    queryCount i64
    binary scan $i64 w cnt
    return $cnt
}

#
# getTimeVals  Collect N counter values
#
proc getTimeVals {n} {
    set res {}
    for {set i 0} {$i < $n} {incr i} {
        lappend res [getPerfCnt]
    }
    return $res
}

#
# getTimeVals2  Relate the perf counter to Tcl's clock
#
# In a loop collect [getPerfCnt], [clock clicks] and [clock microseconds].
#
# Return a list of N triples.
#
proc getTimeVals2 {n} {
    set res {}
    for {set i 0} {$i < $n} {incr i} {
        lappend res [list [getPerfCnt] [clock clicks] \
                [clock microseconds]]
    }
    return $res
}

# Create a binary string suitable for a 'large integer'
set i64 [binary format w 0]
set res [queryFrequ i64]
binary scan $i64 w f
puts "Frequency: $f"
puts ""

# Call the test procs once to have them byte-compiled
getTimeVals 2
getTimeVals2 2

# Test 1: raw counter values only
set N 5
set cntLst [getTimeVals $N]
puts "raw 64 bit counter"
puts "------------------"
for {set i 0} {$i < $N} {incr i} {
    puts [format %16lu [lindex $cntLst $i]]
}

# Test 2: counter, clock clicks and clock microseconds side by side
set res [getTimeVals2 $N]
puts ""
puts [format "%16s %16s %20s" "raw counter" "clock clicks" "clock microseconds"]
puts [string repeat - 54]
for {set i 0} {$i < $N} {incr i} {
    puts "[format %16lu [lindex $res $i 0]]\
          [format %16lu [lindex $res $i 1]]\
          [format %20lu [lindex $res $i 2]]"
}
---
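As a rough pure-Tcl complement to the script above (no Ffidl needed), one could estimate the effective step of 'clock microseconds' by polling it until the value has changed a few times. This is a minimal sketch, assuming Tcl 8.5 or later; the proc name `estimateClockStep` is hypothetical, and the loop busy-waits, so on a clock with coarse resolution it spins for n of those coarse ticks.

```tcl
# estimateClockStep (hypothetical helper): poll [clock microseconds]
# until its value has changed n times and return the smallest positive
# step seen, in microseconds. A result around 1000 or more suggests
# millisecond-level resolution; small values mean the clock "moves".
proc estimateClockStep {{n 20}} {
    set prev [clock microseconds]
    set minStep ""
    set changes 0
    # Busy-wait: spin until the clock value has changed n times
    while {$changes < $n} {
        set now [clock microseconds]
        if {$now != $prev} {
            set step [expr {$now - $prev}]
            # Ignore backward jumps (e.g. system time adjustments)
            if {$step > 0 && ($minStep eq "" || $step < $minStep)} {
                set minStep $step
            }
            set prev $now
            incr changes
        }
    }
    return $minStep
}

puts "Smallest observed step: [estimateClockStep] microseconds"
```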
'getPerfCnt' returns the counter's raw value. If you need absolute
times or delays, you will need to take the counter's frequency into
account. It's not as nice as the original function (especially
regarding the time it takes just to get the value), but then, it's
just a workaround.
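Taking the frequency into account amounts to dividing a counter difference by the ticks-per-second value that QueryPerformanceFrequency reports. A minimal sketch, assuming Tcl 8.5 or later (whose integers are arbitrary precision, so the multiplication cannot overflow); the proc name `perfDeltaToMicros` is hypothetical:

```tcl
# perfDeltaToMicros (hypothetical helper): convert the difference between
# two raw performance-counter readings into whole microseconds.
#   start, end  raw counter values, e.g. from getPerfCnt
#   freq        counter ticks per second, e.g. from QueryPerformanceFrequency
proc perfDeltaToMicros {start end freq} {
    if {$freq <= 0} {
        error "invalid counter frequency: $freq"
    }
    # Multiply before dividing to keep sub-second precision; Tcl integer
    # division rounds toward negative infinity
    return [expr {($end - $start) * 1000000 / $freq}]
}
```

With the script's `getPerfCnt` and `$f`, two readings `t0` and `t1` taken around some code would give its duration as `[perfDeltaToMicros $t0 $t1 $f]`.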

Best regards and have a nice weekend
Helmut Giese

From: Helmut Giese on 16 May 2010 15:10

I would like to correct the subject: It's not broken, it's by 'design
out of necessity'.
Background (for those interested in technical details): Each core has
a fast-running 'performance counter' (aka 'high resolution timer'),
which underlies every attempt to get sub-millisecond resolution.
The problem is that on multicore machines those timers can (and
apparently will) get out of sync. Since one cannot rely on always
executing on the same core, reading the counter introduces an
element of randomness: a value you get may (appear to) lie in the
past, or in the distant future.
It is for this reason that Tcl deliberately gives up using this
timer if a 'safe environment' cannot be determined. So far nobody
seems to have found a satisfactory solution.
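If counters on different cores drift apart, one visible symptom is a reading that is smaller than the one before it. A minimal sketch of such a check, which could be fed, say, the list returned by `getTimeVals` from the workaround script; the proc name `countBackwardSteps` is hypothetical, and a result of zero only means no backward jump was seen in that sample, not that the counters are in sync.

```tcl
# countBackwardSteps (hypothetical helper): given a list of successive
# counter readings, count how often a reading is smaller than its
# predecessor, i.e. how often time appeared to run backwards.
proc countBackwardSteps {values} {
    set backward 0
    set prev [lindex $values 0]
    foreach v [lrange $values 1 end] {
        if {$v < $prev} {
            incr backward
        }
        set prev $v
    }
    return $backward
}
```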

A word of WARNING to anybody who wants to use the "workaround" I posted:
please be aware that it is subject to the same problem. Successive
calls need not be executed on the same core, hence may report values
from different timers, hence may produce surprising results.
Sigh, sorry for the bad news.
Helmut Giese

From: MartinLemburg on 25 May 2010 03:19

Hi Helmut,

perhaps the solution we introduced in a non-Tcl application for our
timing measurements is acceptable for Tcl applications, too.

When we want our application to do profiling, the application binds
itself to one CPU core via ...

SetProcessAffinityMask(GetCurrentProcess(), 1)

Perhaps the Tcl core could switch to using a single CPU core the
first time "clock microseconds" is used, to ensure the correctness
of the returned times.

Best regards,

Martin


From: Donal K. Fellows on 25 May 2010 08:44

On 25 May, 08:19, "MartinLemburg(a)Siemens-PLM"
<martin.lemburg.siemens-...(a)gmx.net> wrote:
> perhaps the solution we introduced in a non-tcl application for our
> timing measurements is acceptable for tcl applications, too.
>
> In the case, that we want our application to do profiling, the
> application binds itself to one CPU core via ...
>
> SetProcessAffinityMask(GetCurrentProcess(), 1)
>
> Perhaps, the tcl core could be able to switch to one CPU core usage,
> if "clock microseconds" is used the first time, to assure the
> correctness of the returned times.

We're not about to do that; it would utterly ruin performance on
multiprocessor systems. Better to have somewhat less accurate timing.
(You could make your own tclsh that did this though, just setting the
affinity and then calling Tcl_Main...)

Donal.

From: Helmut Giese on 25 May 2010 11:28

Hi Martin,
>perhaps the solution we introduced in a non-tcl application for our
>timing measurements is acceptable for tcl applications, too.
>
>In the case, that we want our application to do profiling, the
>application binds itself to one CPU core via ...
>
> SetProcessAffinityMask(GetCurrentProcess(), 1)
>
>Perhaps, the tcl core could be able to switch to one CPU core usage,
>if "clock microseconds" is used the first time, to assure the
>correctness of the returned times.
See Donal's reply.
This question still occupies me, but since nobody has found a
satisfactory solution so far, it is somewhat unlikely that I will be
able to come up with one.

This is really annoying: we have technological advancements in the form of
- high resolution timers and
- multi-core machines
but cannot use them together.
Best regards
Helmut Giese

