From: Richard Maine on 9 Mar 2010 16:29
JB <foo(a)bar.invalid> wrote:

> Here in Fortran-happy-happy-land, the solution in the vast majority of
> cases is to use the standard timing intrinsics DATE_AND_TIME,
> SYSTEM_CLOCK, and CPU_TIME.

Indeed. I would say that one should have very specific reasons to use anything other than those. I won't deny that those reasons might exist in some cases, but they really do need to be specific. Something like "I read a post on comp.lang.fortran from someone who says that he usually uses method x (be it RDTSC or anything else)" doesn't even come close to being specific enough.

Answering in advance the obvious question about what would be specific enough, I'd say that if one has to ask, then one doesn't have a specific enough reason.

--
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
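A minimal sketch of what the standard-intrinsics route could look like for the original question (CPU time per iteration versus elapsed time on a shared single CPU), assuming a Fortran 2008 compiler for `int64`/`real64` from `iso_fortran_env`. The module `iter_timing`, the routine `time_iteration`, and the dummy workload `do_iteration` are hypothetical names, not taken from any post. CPU_TIME returns a processor-dependent approximation of the processor time used, and SYSTEM_CLOCK returns elapsed clock counts, so `cpu_s / wall_s` roughly estimates the share of the CPU the process received during that iteration.

```fortran
! Sketch: per-iteration CPU time vs. wall-clock time using only the
! standard intrinsics. All names here are hypothetical.
module iter_timing
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: time_iteration
contains

  ! Hypothetical stand-in for one iteration of the real calculation.
  subroutine do_iteration(x)
    real(real64), intent(inout) :: x
    integer :: i
    do i = 1, 2000000
      x = x + sin(real(i, real64)) * 1.0e-6_real64
    end do
  end subroutine do_iteration

  ! Runs one iteration; returns the CPU seconds and wall seconds it took.
  ! A negative value means the corresponding clock is not available.
  subroutine time_iteration(x, cpu_s, wall_s)
    real(real64), intent(inout) :: x
    real(real64), intent(out)   :: cpu_s, wall_s
    real(real64)    :: t0, t1
    integer(int64)  :: c0, c1, rate

    call system_clock(c0, rate)   ! wall-clock count and counts per second
    call cpu_time(t0)             ! processor time so far
    call do_iteration(x)
    call cpu_time(t1)
    call system_clock(c1)

    if (t0 < 0.0_real64) then
      cpu_s = -1.0_real64         ! CPU_TIME unsupported on this processor
    else
      cpu_s = t1 - t0
    end if

    if (rate > 0) then
      wall_s = real(c1 - c0, real64) / real(rate, real64)
    else
      wall_s = -1.0_real64        ! no system clock available
    end if
  end subroutine time_iteration
end module iter_timing
```

In a real program one would call `time_iteration` once per pass of the main loop and print `cpu_s` alongside `cpu_s / wall_s`. The sketch ignores SYSTEM_CLOCK wrap-around at COUNT_MAX, which matters only for very long intervals with a small integer kind.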
From: Arjan on 9 Mar 2010 16:42

> Here in Fortran-happy-happy-land, the solution in the vast majority of
> cases is to use the standard timing intrinsics DATE_AND_TIME,
> SYSTEM_CLOCK, and CPU_TIME.

Sounds like I have an answer! Thanks!

A.
From: glen herrmannsfeldt on 9 Mar 2010 16:46
JB <foo(a)bar.invalid> wrote:
> On 2010-03-09, glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:
(snip, in response to a question about performance timing)

>> For IA32, I usually use a routine that returns the value of
>> the time stamp counter, as given by the RDTSC instruction.

> Here, let me formulate a corollary to Godwin's law: "As an online
> programming discussion about timing grows longer, the probability of
> someone suggesting use of RDTSC approaches 1".

> The wikipedia page contains reasons why it should not be used except
> in very specific circumstances:

> http://en.wikipedia.org/wiki/Rdtsc

I completely agree that one needs to be careful with its use. Even so, I have never had any problems. I have never seen a negative increment, even on multiprocessor systems.

With variable clock rate processors, one has to know what is important. When trying to find computation bottlenecks, I usually consider clock cycles to be the important factor, not elapsed time (especially on a processor whose clock speed may vary).

> Here in Fortran-happy-happy-land, the solution in the vast majority of
> cases is to use the standard timing intrinsics DATE_AND_TIME,
> SYSTEM_CLOCK, and CPU_TIME.

If you read the standard, those routines carry pretty much the same disclaimers that the Wikipedia page gives for RDTSC. Also, they often have low resolution, even as processors get faster. You might find that the CPU_TIME or DATE_AND_TIME values don't update at all through a fairly long computation. If you average over enough calls to a routine, then you can get a reasonable value even from a low-resolution clock, but it isn't easy.

I have even used RDTSC in Java, through JNI calling what looks (to Java) like a C function, with useful results.

-- glen
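The point about coarse clocks can be checked directly on a given system. This sketch, assuming Fortran 2008 and using hypothetical names (`clock_resolution`, `cpu_time_tick`, `system_clock_tick`, `short_work`, `mean_cpu_per_call`), estimates the step size of CPU_TIME by spinning until its value changes, reports the nominal SYSTEM_CLOCK resolution as 1/COUNT_RATE, and averages many calls of a routine too short to time individually.

```fortran
! Sketch: probing timer granularity and averaging over many calls.
! All names here are hypothetical.
module clock_resolution
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: cpu_time_tick, system_clock_tick, mean_cpu_per_call
contains

  ! Spin until CPU_TIME changes; the step is an estimate of its granularity.
  function cpu_time_tick() result(tick)
    real(real64) :: tick, t0, t1
    call cpu_time(t0)
    if (t0 < 0.0_real64) then
      tick = -1.0_real64          ! CPU_TIME not supported here
      return
    end if
    ! Wait for one change so we start on a tick boundary, then time the next.
    t1 = t0
    do while (t1 == t0)
      call cpu_time(t1)
    end do
    t0 = t1
    do while (t1 == t0)
      call cpu_time(t1)
    end do
    tick = t1 - t0
  end function cpu_time_tick

  ! Nominal SYSTEM_CLOCK resolution is 1/COUNT_RATE seconds.
  function system_clock_tick() result(tick)
    real(real64)   :: tick
    integer(int64) :: rate
    call system_clock(count_rate=rate)
    if (rate > 0) then
      tick = 1.0_real64 / real(rate, real64)
    else
      tick = -1.0_real64          ! no clock available
    end if
  end function system_clock_tick

  ! Hypothetical routine that is too fast to time in a single call.
  subroutine short_work(x)
    real(real64), intent(inout) :: x
    x = sqrt(x * x + 1.0_real64)
  end subroutine short_work

  ! Mean CPU seconds per call over n calls; the clock's granularity is
  ! spread over n calls, so the per-call error shrinks roughly as tick/n.
  subroutine mean_cpu_per_call(n, x, per_call)
    integer, intent(in)         :: n
    real(real64), intent(inout) :: x
    real(real64), intent(out)   :: per_call
    real(real64) :: t0, t1
    integer :: i
    call cpu_time(t0)
    do i = 1, n
      call short_work(x)
    end do
    call cpu_time(t1)
    per_call = (t1 - t0) / real(n, real64)
  end subroutine mean_cpu_per_call
end module clock_resolution
```

The real update interval of SYSTEM_CLOCK can be coarser than 1/COUNT_RATE, so the nominal figure is a best case; the spin-loop probe for CPU_TIME gives a more honest number on the machine at hand.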
From: James Van Buskirk on 9 Mar 2010 20:06
"JB" <foo(a)bar.invalid> wrote in message news:slrnhpdean.3eg.foo(a)kosh.hut.fi...

> Here, let me formulate a corollary to Godwin's law: "As an online
> programming discussion about timing grows longer, the probability of
> someone suggesting use of RDTSC approaches 1".

> The wikipedia page contains reasons why it should not be used except
> in very specific circumstances:

Yeah, don't ever use RDTSC, because then you would have a chance to measure performance and possibly enhance it, rather than just blather about performance, which is much more in vogue nowadays.

Fortran just doesn't provide primitives that can split out the time taken by one subroutine in the context of running with everything else in the program fighting it for cache, TLB entries, and BTB entries. It is not at all unusual for lots of pieces of a program to be performing suboptimally, and if you fiddle with one of the pieces, the improvement (or not) can get lost in the noise. You can try to write a benchmark that invokes only the subroutine you are working on, but it's trickier to do this than to filter out the noise inherent in RDTSC. At least, I have seen otherwise respected programmers write total garbage benchmarks that don't measure performance correctly because they use the cache or BTB differently than the subroutine would in practice. RDTSC can "measure" glitches like interrupts and processor switchover, so it's the user's responsibility to detect these events and filter them out in order to see what effect the adjustments have had.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
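One common way to do the kind of filtering described above is to repeat the measurement and keep the minimum, since interrupts, context switches, and processor migration only ever add time. This sketch substitutes the portable SYSTEM_CLOCK for RDTSC, so it counts elapsed clock ticks rather than CPU cycles; the names `min_of_trials`, `kernel`, and `best_wall_seconds` are hypothetical, the kernel is a toy, and Fortran 2008 is assumed.

```fortran
! Sketch: repeat a timing and keep the fastest trial to filter out glitches.
! SYSTEM_CLOCK stands in for RDTSC here. All names are hypothetical.
module min_of_trials
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: best_wall_seconds
contains

  ! Hypothetical kernel under study.
  subroutine kernel(a)
    real(real64), intent(inout) :: a(:)
    a = a * 0.5_real64 + 1.0_real64
  end subroutine kernel

  ! Time `trials` runs of kernel; return the fastest and slowest run.
  subroutine best_wall_seconds(trials, a, best, worst)
    integer, intent(in)         :: trials
    real(real64), intent(inout) :: a(:)
    real(real64), intent(out)   :: best, worst
    integer(int64) :: c0, c1, rate
    real(real64)   :: t
    integer        :: k

    call system_clock(count_rate=rate)
    if (rate <= 0) then
      best = -1.0_real64          ! no clock available
      worst = -1.0_real64
      return
    end if

    best = huge(best)
    worst = 0.0_real64
    do k = 1, trials
      call system_clock(c0)
      call kernel(a)
      call system_clock(c1)
      t = real(c1 - c0, real64) / real(rate, real64)
      best = min(best, t)         ! disturbed trials only make t larger
      worst = max(worst, t)
    end do
  end subroutine best_wall_seconds
end module min_of_trials
```

A large gap between `best` and `worst` suggests some trials were disturbed. Note that the minimum also discards the cold-cache first run, which may or may not match what the subroutine sees inside the full program; that is exactly the benchmark pitfall mentioned in the post.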
From: Phred Phungus on 10 Mar 2010 03:18
JB wrote:
> On 2010-03-09, glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:
>> Arjan <arjan.van.dijk(a)rivm.nl> wrote:
>>
>>> Until now I monitor the performance of my application by measuring the
>>> real time spent by my program and subtract the value from the former
>>> iteration from the latest estimate. This gives me the number of
>>> seconds per iteration of my process. I have only 1 CPU, so the
>>> available time is distributed over all processes. My current
>>> application uses a lot of CPU and produces only a tiny bit of output,
>>> so I/O-time is not restrictive. How can I measure the net cpu-time
>>> spent by my program per iteration of my calculation, i.e. corrected
>>> for the fraction of CPU assigned to the process?
>> For IA32, I usually use a routine that returns the value of
>> the time stamp counter, as given by the RDTSC instruction.
>
> Here, let me formulate a corollary to Godwin's law: "As an online
> programming discussion about timing grows longer, the probability of
> someone suggesting use of RDTSC approaches 1".

I don't want to tarnish your thesis, but did I by any chance mention the Führer? Glen's posts have none of the triteness that your law suggests. So there.
--
fred