From: Victor Javier on 4 May 2010 03:20

Hello,

I am doing some research where I need to collect performance information for SPEC CPU2006 benchmarks on a POWER6 JS22 system. Previously I was using perfmon2, but after the release of "performance counters for linux" (and the 'perf' tool), I decided to try it. One of the reasons was its native support for multiplexing. However, I have been seeing much higher variability with perf than with perfmon2.

As an example, I will provide data for the 'bwaves' benchmark run with the reference input set (it takes around 20 minutes to finish). The kernels I am using are:

* perfmon2: Linux version 2.6.28-pfmon2 (gcc version 4.1.2 20070115 (SUSE Linux)) #6 SMP
* perf: Linux version 2.6.33.3-perf (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP

I am using libpfm version 3.8. (I can provide more information, such as modules, detailed processor information, etc., if necessary.)

The commands I used to collect the counters are:

perfmon2: pfmon -e PM_CYC,PM_INST_CMPL,PM_LD_MISS_L1 ./bwaves_base.Linux64
perf:     perf stat -e r1e:u,r2:u,r80080:u ./bwaves_base.Linux64

I also tried pinning the execution to a given CPU, but the results were the same. I repeated each execution 10 times, so I am also providing the mean and the standard deviation.
============
= perfmon2 =
============

                  cycles    instrs completed   L1 load misses
       4,567,041,667,206   2,772,827,993,242    6,918,871,375
       4,569,071,274,248   2,772,827,992,642    6,931,066,292
       4,568,234,790,260   2,772,827,992,716    6,922,975,235
       4,566,485,780,016   2,772,827,992,065    6,917,600,192
       4,566,437,677,239   2,772,827,992,067    6,915,222,376
       4,566,640,807,800   2,772,827,992,066    6,915,703,838
       4,566,466,402,423   2,772,827,992,062    6,914,107,325
       4,569,322,329,138   2,772,828,006,865    6,933,546,730
       4,567,018,722,323   2,772,827,992,066    6,914,210,622
       4,566,778,622,700   2,772,827,992,066    6,914,251,098

mean   4,567,349,807,335   2,772,827,993,786    6,919,755,508
stdev      1,107,043,810               4,614        7,178,958

========
= perf =
========

                  cycles    instrs completed   L1 load misses
       4,562,017,366,591   2,772,768,370,128    7,134,353,697
       4,541,500,651,248   2,772,868,724,285    6,341,491,710
       4,550,876,532,582   2,772,787,520,375    6,661,719,666
       4,540,558,691,334   2,772,868,724,156    6,266,617,715
       4,573,942,460,136   2,772,861,831,519    7,419,020,488
       4,587,876,861,751   2,772,868,724,189    8,174,507,077
       4,550,771,568,044   2,772,841,147,861    6,547,437,055
       4,600,947,093,875   2,772,787,520,375    9,152,895,835
       4,572,501,705,517   2,772,861,831,526    7,765,464,256
       4,561,690,369,227   2,772,787,520,368    6,902,452,934

mean   4,564,268,330,031   2,772,830,191,478    7,236,596,043
stdev     19,770,352,264          41,980,009      914,965,698

As can be seen, the standard deviation for perf is significantly higher. For instructions completed, perf shows a standard deviation roughly 9,000x higher (41,980,009 vs. 4,614). That variation is still small relative to the absolute number of instructions completed, but it is a real problem for L1 load misses. With perfmon2 I can expect misses to fall in the range [6,905,397,592 .. 6,934,113,424] (mean +/- 2 stdev), which is a tight interval. With perf, the same interval widens to [5,406,664,646 .. 9,066,527,440]. This variation is clearly not acceptable, as I cannot draw any conclusion from those results.
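For anyone who wants to reproduce the summary rows, here is a minimal Python sketch using the instructions-completed columns from the tables above. It assumes the reported stdev is the sample standard deviation (n - 1 denominator) and that the quoted ranges are mean +/- 2 stdev, which appears consistent with the posted figures. The names `PERFMON2_INSTRS`, `PERF_INSTRS` and `summarize` are illustrative only and not part of either tool.

```python
from statistics import mean, stdev

# Instructions-completed counts copied from the two tables above.
PERFMON2_INSTRS = [
    2772827993242, 2772827992642, 2772827992716, 2772827992065,
    2772827992067, 2772827992066, 2772827992062, 2772828006865,
    2772827992066, 2772827992066,
]
PERF_INSTRS = [
    2772768370128, 2772868724285, 2772787520375, 2772868724156,
    2772861831519, 2772868724189, 2772841147861, 2772787520375,
    2772861831526, 2772787520368,
]


def summarize(samples):
    """Return (mean, sample stdev, low, high), with low/high = mean -/+ 2*stdev."""
    m = mean(samples)
    s = stdev(samples)  # sample standard deviation (assumed n - 1 denominator)
    return m, s, m - 2 * s, m + 2 * s


for name, samples in (("perfmon2", PERFMON2_INSTRS), ("perf", PERF_INSTRS)):
    m, s, low, high = summarize(samples)
    print(f"{name:9s} mean={m:,.0f} stdev={s:,.0f} "
          f"range=[{low:,.0f} .. {high:,.0f}]")

# How much noisier perf is than perfmon2 for this counter.
ratio = stdev(PERF_INSTRS) / stdev(PERFMON2_INSTRS)
print(f"stdev ratio perf/perfmon2: {ratio:,.0f}x")
```

The same function can be applied to the cycles and L1 load miss columns to check the other summary rows.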
I would like to know whether you are aware of this issue and what its causes could be. I would also appreciate any help in fixing it.

In case the data is not easy to read, I provide it as a separate PDF file as well. I also attach a couple of graphs showing the variation for instructions and misses.

Thank you for any help on this,
Victor