From: Mike Galbraith on 9 Sep 2009 08:50

On Wed, 2009-09-09 at 13:54 +0200, Jens Axboe wrote:

> Things are much better with this patch on the notebook! I cannot compare
> with BFS as that still doesn't run anywhere I want it to run, but it's
> way better than -rc9-git stock. latt numbers on the notebook have 1/3
> the max latency, average is lower, and stddev is much smaller too.

That patch has a bit of bustage in it. We definitely want to turn down
sched_latency though, and LAST_BUDDY also wants some examination it seems.

taskset -c 3 ./xx 1
(100% cpu 1 sec interval perturbation measurement proggy. overhead is
what it is not getting)

xx says
2392.52 MHZ CPU
perturbation threshold 0.057 usecs.
....

'nuther terminal
taskset -c 3 make -j2 vmlinux

xx output

current (fixed breakage) patched tip tree
pert/s: 153 >18842.18us: 11 min: 0.50 max:36010.37 avg:4354.06 sum/s:666171us overhead:66.62%
pert/s: 160 >18767.18us: 12 min: 0.13 max:32011.66 avg:4172.69 sum/s:667631us overhead:66.66%
pert/s: 156 >18499.43us: 9 min: 0.13 max:27883.24 avg:4296.08 sum/s:670189us overhead:66.49%
pert/s: 146 >18480.71us: 10 min: 0.50 max:32009.38 avg:4615.19 sum/s:673818us overhead:67.26%
pert/s: 154 >18433.20us: 17 min: 0.14 max:31537.12 avg:4474.14 sum/s:689018us overhead:67.68%
pert/s: 158 >18520.11us: 9 min: 0.50 max:34328.86 avg:4275.66 sum/s:675554us overhead:66.76%
pert/s: 154 >18683.74us: 12 min: 0.51 max:35949.23 avg:4363.67 sum/s:672005us overhead:67.04%
pert/s: 154 >18745.53us: 8 min: 0.51 max:34203.43 avg:4399.72 sum/s:677556us overhead:67.03%

bfs209
pert/s: 124 >18681.88us: 17 min: 0.15 max:27274.74 avg:4627.36 sum/s:573793us overhead:56.70%
pert/s: 106 >18702.52us: 20 min: 0.55 max:32022.07 avg:5754.48 sum/s:609975us overhead:59.80%
pert/s: 116 >19082.42us: 17 min: 0.15 max:39835.34 avg:5167.69 sum/s:599452us overhead:59.95%
pert/s: 109 >19289.41us: 22 min: 0.14 max:36818.95 avg:5485.79 sum/s:597951us overhead:59.64%
pert/s: 108 >19238.97us: 19 min: 0.14 max:32026.74 avg:5543.17 sum/s:598662us overhead:59.87%
pert/s: 106 >19415.76us: 20 min: 0.54 max:36011.78 avg:6001.89 sum/s:636201us overhead:62.95%
pert/s: 115 >19341.89us: 16 min: 0.08 max:32040.83 avg:5313.45 sum/s:611047us overhead:59.98%
pert/s: 101 >19527.53us: 24 min: 0.14 max:36018.37 avg:6378.06 sum/s:644184us overhead:64.42%

stock tip (ouch ouch ouch)
pert/s: 153 >48453.23us: 5 min: 0.12 max:144009.85 avg:4688.90 sum/s:717401us overhead:70.89%
pert/s: 172 >47209.49us: 3 min: 0.48 max:68009.05 avg:4022.55 sum/s:691879us overhead:67.05%
pert/s: 148 >51139.18us: 5 min: 0.53 max:168094.76 avg:4918.14 sum/s:727885us overhead:71.65%
pert/s: 171 >51350.64us: 6 min: 0.12 max:102202.79 avg:4304.77 sum/s:736115us overhead:69.24%
pert/s: 153 >57686.54us: 5 min: 0.12 max:224019.85 avg:5399.31 sum/s:826094us overhead:74.50%
pert/s: 172 >55886.47us: 2 min: 0.11 max:75378.18 avg:3993.52 sum/s:686885us overhead:67.67%
pert/s: 157 >58819.31us: 3 min: 0.12 max:165976.63 avg:4453.16 sum/s:699146us overhead:69.91%
pert/s: 149 >58410.21us: 5 min: 0.12 max:104663.89 avg:4792.73 sum/s:714116us overhead:71.41%

sched_latency=20ms min_granularity=4ms
pert/s: 162 >30152.07us: 2 min: 0.49 max:60011.85 avg:4272.97 sum/s:692221us overhead:68.13%
pert/s: 147 >29705.33us: 8 min: 0.14 max:46577.27 avg:4792.03 sum/s:704428us overhead:70.44%
pert/s: 162 >29344.16us: 2 min: 0.49 max:48010.50 avg:4176.75 sum/s:676633us overhead:67.40%
pert/s: 155 >29109.69us: 2 min: 0.49 max:49575.08 avg:4423.87 sum/s:685700us overhead:68.30%
pert/s: 153 >30627.66us: 3 min: 0.13 max:84005.71 avg:4573.07 sum/s:699680us overhead:69.42%
pert/s: 142 >30652.47us: 5 min: 0.49 max:56760.06 avg:4991.61 sum/s:708808us overhead:70.88%
pert/s: 152 >30101.12us: 2 min: 0.49 max:45757.88 avg:4519.92 sum/s:687028us overhead:67.89%
pert/s: 161 >29303.50us: 3 min: 0.12 max:40011.73 avg:4238.15 sum/s:682342us overhead:67.43%

NO_LAST_BUDDY
pert/s: 154 >15257.87us: 28 min: 0.13 max:42004.05 avg:4590.99 sum/s:707013us overhead:70.41%
pert/s: 162 >15392.05us: 34 min: 0.12 max:29021.79 avg:4177.47 sum/s:676750us overhead:66.81%
pert/s: 162 >15665.11us: 33 min: 0.13 max:32008.34 avg:4237.10 sum/s:686410us overhead:67.90%
pert/s: 159 >15914.89us: 31 min: 0.56 max:32056.86 avg:4268.87 sum/s:678751us overhead:67.47%
pert/s: 166 >15858.94us: 26 min: 0.13 max:26655.84 avg:4055.02 sum/s:673134us overhead:66.65%
pert/s: 165 >15878.96us: 32 min: 0.13 max:28010.44 avg:4107.86 sum/s:677798us overhead:66.68%
pert/s: 164 >16213.55us: 29 min: 0.14 max:34263.04 avg:4186.64 sum/s:686610us overhead:68.04%
pert/s: 149 >16764.54us: 20 min: 0.13 max:38688.64 avg:4758.26 sum/s:708981us overhead:70.23%
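The "xx" proggy itself is not included in this excerpt. A rough, hypothetical sketch of the idea Mike describes above (pin a 100% CPU spinner to one core, timestamp every loop iteration, and count any gap above a calibrated threshold as time stolen by something else the scheduler ran) could look like the following; the threshold, reporting interval and output format here are invented, not the real tool's.

/*
 * Hypothetical sketch of a perturbation-measurement loop in the spirit of
 * the "xx" proggy described above: spin at 100% CPU, timestamp every
 * iteration, and treat any gap larger than a threshold as a perturbation.
 * Run it pinned to one CPU, e.g. "taskset -c 3 ./pert".
 */
#include <stdio.h>
#include <time.h>

static double now_usecs(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

int main(void)
{
	const double threshold = 0.5;	/* usecs; the real tool calibrates this */
	const double interval = 1e6;	/* report roughly once per second */
	double start = now_usecs(), last = start;
	double sum = 0, max = 0;
	long count = 0;

	for (;;) {
		double t = now_usecs();
		double delta = t - last;

		if (delta > threshold) {	/* somebody ran in between */
			count++;
			sum += delta;
			if (delta > max)
				max = delta;
		}
		last = t;

		if (t - start >= interval) {
			printf("perturbations: %ld  max: %.2f us  stolen: %.0f us  overhead: %.2f%%\n",
			       count, max, sum, 100.0 * sum / (t - start));
			count = 0;
			sum = max = 0;
			start = t;
		}
	}
	return 0;
}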
From: Pavel Machek on 9 Sep 2009 10:10

Hi!

> > So ... to get to the numbers - i've tested both BFS and the tip of
> > the latest upstream scheduler tree on a testbox of mine. I
> > intentionally didnt test BFS on any really large box - because you
> > described its upper limit like this in the announcement:
>
> I ran a simple test as well, since I was curious to see how it performed
> wrt interactiveness. One of my pet peeves with the current scheduler is
> that I have to nice compile jobs, or my X experience is just awful while
> the compile is running.
>
> Now, this test case is something that attempts to see what
> interactiveness would be like. It'll run a given command line while at
> the same time logging delays. The delays are measured as follows:
>
> - The app creates a pipe, and forks a child that blocks on reading from
>   that pipe.
> - The app sleeps for a random period of time, anywhere between 100ms
>   and 2s. When it wakes up, it gets the current time and writes that to
>   the pipe.
> - The child then gets woken, checks the time on its own, and logs the
>   difference between the two.
>
> The idea here being that the delay between writing to the pipe and the
> child reading the data and comparing should (in some way) be indicative
> of how responsive the system would seem to a user.
>
> The test app was quickly hacked up, so don't put too much into it. The
> test run is a simple kernel compile, using -jX where X is the number of
> threads in the system. The files are cache hot, so little IO is done.
> The -x2 run is using the double number of processes as we have threads,
> eg -j128 on a 64 thread box.

Could you post the source? Someone else might get us numbers...
preferably on dualcore box or something...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
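The actual latt.c source isn't reproduced in this excerpt (Ingo attaches a fixed version further down), but a minimal sketch of the measurement Jens describes above, with one client and made-up constants, might look like this:

/*
 * Minimal sketch of the latency measurement described above -- not the
 * actual latt.c. Parent sleeps a random 100ms-2s, writes a timestamp
 * into a pipe; the blocked child wakes up and logs how long the wakeup
 * took. Iteration count and output format are made up here.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

static double now_usecs(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

int main(void)
{
	const int iterations = 20;
	int pfd[2];

	if (pipe(pfd))
		return 1;

	if (fork() == 0) {
		/* child: block on the pipe, log the wakeup delay */
		double sent;

		close(pfd[1]);		/* so read() sees EOF when the parent is done */
		while (read(pfd[0], &sent, sizeof(sent)) == sizeof(sent))
			printf("delay %8.0f usec\n", now_usecs() - sent);
		_exit(0);
	}
	close(pfd[0]);

	srand(getpid());
	for (int i = 0; i < iterations; i++) {
		/* sleep a random 100ms - 2s, then poke the child with a timestamp */
		long ms = 100 + rand() % 1901;
		struct timespec delay = { .tv_sec = ms / 1000,
					  .tv_nsec = (ms % 1000) * 1000000L };
		double sent;

		nanosleep(&delay, NULL);
		sent = now_usecs();
		if (write(pfd[1], &sent, sizeof(sent)) != sizeof(sent))
			break;
	}
	close(pfd[1]);			/* child's read() returns 0 and it exits */
	wait(NULL);
	return 0;
}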
From: Ingo Molnar on 9 Sep 2009 14:10

* Jens Axboe <jens.axboe(a)oracle.com> wrote:

> On Wed, Sep 09 2009, Jens Axboe wrote:
> > On Wed, Sep 09 2009, Jens Axboe wrote:
> > > On Wed, Sep 09 2009, Mike Galbraith wrote:
> > > > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > > > * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> > > > >
> > > > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > > > And here's a newer version.
> > > > > > >
> > > > > > > I tinkered a bit with your proglet and finally found the problem.
> > > > > > >
> > > > > > > You used a single pipe per child, this means the loop in run_child() would consume what it just wrote out until it got force preempted by the parent which would also get woken.
> > > > > > >
> > > > > > > This results in the child spinning a while (its full quota) and only reporting the last timestamp to the parent.
> > > > > >
> > > > > > Oh doh, that's not well thought out. Well it was a quick hack :-)
> > > > > > Thanks for the fixup, now it's at least usable to some degree.
> > > > >
> > > > > What kind of latencies does it report on your box?
> > > > >
> > > > > Our vanilla scheduler default latency targets are:
> > > > >
> > > > >   single-core: 20 msecs
> > > > >   dual-core: 40 msecs
> > > > >   quad-core: 60 msecs
> > > > >   opto-core: 80 msecs
> > > > >
> > > > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via /proc/sys/kernel/sched_latency_ns:
> > > > >
> > > > >   echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > >
> > > > He would also need to lower min_granularity, otherwise, it'd be larger than the whole latency target.
> > > >
> > > > I'm testing right now, and one thing that is definitely a problem is the amount of sleeper fairness we're giving. A full latency is just too much short term fairness in my testing. While sleepers are catching up, hogs languish. That's the biggest issue going on.
> > > >
> > > > I've also been doing some timings of make -j4 (looking at idle time), and find that child_runs_first is mildly detrimental to fork/exec load, as are buddies.
> > > >
> > > > I'm running with the below at the moment. (the kthread/workqueue thing is just because I don't see any reason for it to exist, so consider it to be a waste of perfectly good math;)
> > >
> > > Using latt, it seems better than -rc9. The below are entries logged while running make -j128 on a 64 thread box. I did two runs on each, and latt is using 8 clients.
> > >
> > > -rc9
> > >
> > > Max 23772 usec
> > > Avg 1129 usec
> > > Stdev 4328 usec
> > > Stdev mean 117 usec
> > >
> > > Max 32709 usec
> > > Avg 1467 usec
> > > Stdev 5095 usec
> > > Stdev mean 136 usec
> > >
> > > -rc9 + patch
> > >
> > > Max 11561 usec
> > > Avg 1532 usec
> > > Stdev 1994 usec
> > > Stdev mean 48 usec
> > >
> > > Max 9590 usec
> > > Avg 1550 usec
> > > Stdev 2051 usec
> > > Stdev mean 50 usec
> > >
> > > max latency is way down, and much smaller variation as well.
> >
> > Things are much better with this patch on the notebook! I cannot compare with BFS as that still doesn't run anywhere I want it to run, but it's way better than -rc9-git stock. latt numbers on the notebook have 1/3 the max latency, average is lower, and stddev is much smaller too.
>
> BFS210 runs on the laptop (dual core intel core duo). With make -j4 running, I clock the following latt -c8 'sleep 10' latencies:
>
> -rc9
>
> Max 17895 usec
> Avg 8028 usec
> Stdev 5948 usec
> Stdev mean 405 usec
>
> Max 17896 usec
> Avg 4951 usec
> Stdev 6278 usec
> Stdev mean 427 usec
>
> Max 17885 usec
> Avg 5526 usec
> Stdev 6819 usec
> Stdev mean 464 usec
>
> -rc9 + mike
>
> Max 6061 usec
> Avg 3797 usec
> Stdev 1726 usec
> Stdev mean 117 usec
>
> Max 5122 usec
> Avg 3958 usec
> Stdev 1697 usec
> Stdev mean 115 usec
>
> Max 6691 usec
> Avg 2130 usec
> Stdev 2165 usec
> Stdev mean 147 usec

At least in my tests these latencies were mainly due to a bug in latt.c - i've attached the fixed version.

The other reason was wakeup batching. If you do this:

  echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns

... then you can switch on insta-wakeups on -tip too.

With a dual-core box and a make -j4 background job running, on latest -tip i get the following latencies:

 $ ./latt -c8 sleep 30
 Entries: 656 (clients=8)

 Averages:
 ------------------------------
 Max 158 usec
 Avg 12 usec
 Stdev 10 usec

Thanks,

	Ingo
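Ingo's fixed latt.c isn't reproduced in this excerpt, so as an illustration only, here is the shape of the issue Peter describes earlier in the thread: with a single pipe used in both directions, run_child() can read back the result it just wrote instead of blocking for the parent's next timestamp. Giving the results their own pipe (as sketched below, with hypothetical names, building on the earlier single-pipe sketch) makes the child block until the parent really writes something.

/*
 * Illustration of the single-pipe problem Peter describes, not the actual
 * latt.c fix: wakeup timestamps travel parent -> child on one pipe, and
 * measured delays travel child -> parent on a separate one, so the child
 * never consumes its own writes.
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

static double now_usecs(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

static void run_child(int wakeup_fd, int result_fd)
{
	double sent, delay;

	/* blocks here until the parent writes; never sees its own writes */
	while (read(wakeup_fd, &sent, sizeof(sent)) == sizeof(sent)) {
		delay = now_usecs() - sent;
		if (write(result_fd, &delay, sizeof(delay)) != sizeof(delay))
			break;
	}
	_exit(0);
}

int main(void)
{
	int wakeup[2], result[2];
	double sent, delay;

	if (pipe(wakeup) || pipe(result))
		return 1;

	if (fork() == 0) {
		close(wakeup[1]);	/* so EOF works when the parent closes its end */
		close(result[0]);
		run_child(wakeup[0], result[1]);
	}

	for (int i = 0; i < 10; i++) {
		sleep(1);		/* stand-in for the random 100ms-2s sleep */
		sent = now_usecs();
		if (write(wakeup[1], &sent, sizeof(sent)) != sizeof(sent))
			break;
		if (read(result[0], &delay, sizeof(delay)) == sizeof(delay))
			printf("wakeup delay: %.0f usec\n", delay);
	}
	close(wakeup[1]);		/* child's read() returns 0 and it exits */
	wait(NULL);
	return 0;
}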
From: Nikos Chantziaras on 9 Sep 2009 16:20

On 09/09/2009 09:04 PM, Ingo Molnar wrote:
> [...]
> * Jens Axboe<jens.axboe(a)oracle.com> wrote:
>
>> On Wed, Sep 09 2009, Jens Axboe wrote:
>> [...]
>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>> running, I clock the following latt -c8 'sleep 10' latencies:
>>
>> -rc9
>>
>> Max 17895 usec
>> Avg 8028 usec
>> Stdev 5948 usec
>> Stdev mean 405 usec
>>
>> Max 17896 usec
>> Avg 4951 usec
>> Stdev 6278 usec
>> Stdev mean 427 usec
>>
>> Max 17885 usec
>> Avg 5526 usec
>> Stdev 6819 usec
>> Stdev mean 464 usec
>>
>> -rc9 + mike
>>
>> Max 6061 usec
>> Avg 3797 usec
>> Stdev 1726 usec
>> Stdev mean 117 usec
>>
>> Max 5122 usec
>> Avg 3958 usec
>> Stdev 1697 usec
>> Stdev mean 115 usec
>>
>> Max 6691 usec
>> Avg 2130 usec
>> Stdev 2165 usec
>> Stdev mean 147 usec
>
> At least in my tests these latencies were mainly due to a bug in
> latt.c - i've attached the fixed version.
>
> The other reason was wakeup batching. If you do this:
>
>   echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns
>
> ... then you can switch on insta-wakeups on -tip too.
>
> With a dual-core box and a make -j4 background job running, on
> latest -tip i get the following latencies:
>
> $ ./latt -c8 sleep 30
> Entries: 656 (clients=8)
>
> Averages:
> ------------------------------
> Max 158 usec
> Avg 12 usec
> Stdev 10 usec

With your version of latt.c, I get these results with 2.6-tip vs
2.6.31-rc9-bfs:

(mainline)
Averages:
------------------------------
Max 50 usec
Avg 12 usec
Stdev 3 usec

(BFS)
Averages:
------------------------------
Max 474 usec
Avg 11 usec
Stdev 16 usec

However, the interactivity problems still remain. Does that mean it's
not a latency issue?
From: Jens Axboe on 9 Sep 2009 17:00
On Wed, Sep 09 2009, Nikos Chantziaras wrote:
> On 09/09/2009 09:04 PM, Ingo Molnar wrote:
>> [...]
>> * Jens Axboe<jens.axboe(a)oracle.com> wrote:
>>
>>> On Wed, Sep 09 2009, Jens Axboe wrote:
>>> [...]
>>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>>> running, I clock the following latt -c8 'sleep 10' latencies:
>>>
>>> -rc9
>>>
>>> Max 17895 usec
>>> Avg 8028 usec
>>> Stdev 5948 usec
>>> Stdev mean 405 usec
>>>
>>> Max 17896 usec
>>> Avg 4951 usec
>>> Stdev 6278 usec
>>> Stdev mean 427 usec
>>>
>>> Max 17885 usec
>>> Avg 5526 usec
>>> Stdev 6819 usec
>>> Stdev mean 464 usec
>>>
>>> -rc9 + mike
>>>
>>> Max 6061 usec
>>> Avg 3797 usec
>>> Stdev 1726 usec
>>> Stdev mean 117 usec
>>>
>>> Max 5122 usec
>>> Avg 3958 usec
>>> Stdev 1697 usec
>>> Stdev mean 115 usec
>>>
>>> Max 6691 usec
>>> Avg 2130 usec
>>> Stdev 2165 usec
>>> Stdev mean 147 usec
>>
>> At least in my tests these latencies were mainly due to a bug in
>> latt.c - i've attached the fixed version.
>>
>> The other reason was wakeup batching. If you do this:
>>
>>   echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns
>>
>> ... then you can switch on insta-wakeups on -tip too.
>>
>> With a dual-core box and a make -j4 background job running, on
>> latest -tip i get the following latencies:
>>
>> $ ./latt -c8 sleep 30
>> Entries: 656 (clients=8)
>>
>> Averages:
>> ------------------------------
>> Max 158 usec
>> Avg 12 usec
>> Stdev 10 usec
>
> With your version of latt.c, I get these results with 2.6-tip vs
> 2.6.31-rc9-bfs:
>
> (mainline)
> Averages:
> ------------------------------
> Max 50 usec
> Avg 12 usec
> Stdev 3 usec
>
> (BFS)
> Averages:
> ------------------------------
> Max 474 usec
> Avg 11 usec
> Stdev 16 usec
>
> However, the interactivity problems still remain. Does that mean it's
> not a latency issue?

It probably just means that latt isn't a good measure of the problem.
Which isn't really too much of a surprise.

--
Jens Axboe