From: Xiao Guangrong on 11 Mar 2010 04:50

We found that if we enable the lock trace events, the 'sysbench mutex'
benchmark program runs quicker.

The attached simple program can reproduce it; the system info, kernel
config, and the script are also attached.

The test steps are below:

# tar -zxvf test-mutex.tar.bz
# cd test-mutex
# make
# ./tscript.sh >& log
# cat log | grep "real"
real 0m46.765s    < all trace events are disabled >
real 0m47.073s
real 0m47.402s
real 0m46.458s
real 0m47.433s
real 0m47.395s
real 0m47.010s
real 0m47.454s
real 0m47.044s
real 0m47.464s
real 0m39.245s    < lock trace events are enabled >
real 0m40.822s
real 0m40.779s
real 0m40.549s
real 0m40.605s
real 0m40.923s
real 0m40.560s
real 0m41.050s
real 0m40.757s
real 0m40.715s

[ "< ... >" marks my comments ]

From the result, we can see that the program's runtime drops when the
lock trace events are enabled.

The conclusion is weird, but I don't know why.
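The attached mainc.c is not reproduced in the thread. A minimal sketch of
what a mutex stress benchmark of this kind typically looks like follows;
the thread count, iteration count, and all names here are assumptions,
not the original program:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS   4           /* assumed thread count */
#define ITERATIONS 10000000UL  /* assumed per-thread loop count */

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long counter;

static void *worker(void *arg)
{
        unsigned long i;

        /* Empty acquire/release loop: every thread hammers the same
         * mutex, so almost every acquisition is contended. */
        for (i = 0; i < ITERATIONS; i++) {
                pthread_mutex_lock(&mutex);
                counter++;
                pthread_mutex_unlock(&mutex);
        }
        return NULL;
}

int main(void)
{
        pthread_t threads[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++)
                pthread_create(&threads[i], NULL, worker, NULL);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(threads[i], NULL);

        printf("counter:%lu\n", counter);
        return 0;
}

Build with "gcc -O2 -pthread" and time it under both trace-event
settings; a benchmark of this shape spends nearly all of its time in the
contended mutex path, which is what makes the result above so sensitive
to changes in that path.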
From: Hitoshi Mitake on 12 Mar 2010 23:30

On 03/11/10 18:43, Xiao Guangrong wrote:
> We found that if we enable the lock trace events, the 'sysbench mutex'
> benchmark program runs quicker.
> [...]
> From the result, we can see that the program's runtime drops when the
> lock trace events are enabled.
>
> The conclusion is weird, but I don't know why.

Hi Xiao,

It's hard to believe, but...

% sudo ./tscript.sh &> log
% grep real log
real 0m24.132s
real 0m23.535s
real 0m20.064s
real 0m16.636s    <- enabled from here
real 0m16.435s
real 0m17.339s

I could reproduce your surprising result. (I only ran your benchmark
three times per configuration.)

I rewrote your mainc.c and counted the contended acquisitions in each
test like this:

if (pthread_mutex_trylock(&mutex) == EBUSY) {
        pthread_mutex_lock(&mutex);
        atomic_inc(&contended);
}

# I'll attach my new mainc.c

% cat log
Run mutex with trace events disabled...
contended:25191221
real 0m24.132s
user 0m17.149s
sys 1m18.933s
contended:25360563
real 0m23.535s
user 0m17.233s
sys 1m16.213s
contended:23813911
real 0m20.064s
user 0m15.561s
sys 1m4.332s
Run mutex with lockdep events enabled...
contended:11458318
real 0m16.636s
user 0m10.173s
sys 0m55.595s
contended:11881095
real 0m16.435s
user 0m10.273s
sys 0m54.911s
contended:11261650
real 0m17.339s
user 0m10.225s
sys 0m58.556s

It seems that the number of contended acquisitions dropped to about
half. I don't know why this happens or how it affects performance, but
the result is worth considering.

Thanks,
	Hitoshi
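The atomic_inc() above is presumably a userspace helper from the
benchmark sources, which are not shown in the thread. A self-contained
version of the counting path, with a GCC atomic builtin standing in for
that helper, might look like this:

#include <errno.h>
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long contended;

static void lock_counting_contention(void)
{
        /* pthread_mutex_trylock() returns EBUSY only when another
         * thread already holds the mutex, so each EBUSY marks one
         * contended acquisition; we then block in the normal way. */
        if (pthread_mutex_trylock(&mutex) == EBUSY) {
                pthread_mutex_lock(&mutex);
                __sync_fetch_and_add(&contended, 1);
        }
        /* On success the trylock itself acquired the mutex, and the
         * acquisition was uncontended. */
}

Note that the mutex can be released between the failed trylock and the
blocking lock, so a count gathered this way is an estimate of
contention, not an exact figure.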
From: Xiao Guangrong on 8 Apr 2010 05:10
Hi Michel,

Michel Lespinasse wrote:
> Sorry for the late reply...
>
> One thing to consider in locking micro-benchmarks is that often, code
> changes that slow down parts of the contended code path where the lock
> is not held will result in an increase in the reported micro-benchmark
> metric. This effect is particularly marked for micro-benchmarks that
> consist of multiple threads doing empty acquire/release loops.
>
> As a thought experiment, imagine what would happen if you added a
> one-millisecond sleep in the contended code path for mutex
> acquisition. Soon all but one of your benchmark threads would be
> sleeping, and the only non-sleeping thread would be able to spin on
> that lock/unlock loop with no contention, resulting in very nice
> results for the micro-benchmark. Remove the sleep and the lock/unlock
> threads will have to contend, resulting in lower reported performance
> metrics.

Many thanks for your valuable reply; it helps us see the issue much more
clearly.

I ran a test to check your conjecture: adding usleep(1) to the mutex
acquisition path does reduce the measured contention. I also ran a test
that does more work while holding the mutex; there the speedup from
enabling the trace events shrinks.

Thanks,
	Xiao
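The usleep(1) test is essentially Michel's thought experiment applied to
the contended path. A minimal sketch, assuming the delay was simply
inserted between a failed trylock and the blocking lock (the function
name is illustrative):

#include <errno.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

static void lock_with_sleep(void)
{
        if (pthread_mutex_trylock(&mutex) == EBUSY) {
                /* Delay only the contended path: while the waiters sit
                 * in usleep(), the one running thread enjoys an
                 * uncontended lock/unlock loop, which inflates the
                 * benchmark's reported throughput. */
                usleep(1);
                pthread_mutex_lock(&mutex);
        }
}

This is the same mechanism suspected for the trace events: the tracing
overhead slows the contended path just enough to thin out the
contention, so the remaining acquisitions are cheaper and the benchmark
finishes sooner.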