From: Benjamin Herrenschmidt on 9 Sep 2009 07:40

On Wed, 2009-09-09 at 20:44 +0930, David Newall wrote:
> Benjamin Herrenschmidt wrote:
> > On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
> >> Arjan van de Ven wrote:
> >>> the latest version of latencytop also has a GUI (thanks to Ben)
> >>
> >> That looks nice, but...
> >>
> >> I kind of miss the split screen feature where latencytop would show both
> >> the overall figures + the ones for the currently most affected task.
> >> Downside of that last was that I never managed to keep the display on a
> >> specific task.
> >
> > Any idea of how to present it ? I'm happy to spend 5mn improving the
> > GUI :-)
>
> Use a second window.

I'm not a big fan of cluttering the screen with windows... I suppose I
could have a separate pane for the "global" view, but I haven't found a
way to lay it out that doesn't suck :-)

I could have added a third column on the right with the overall view, but
it felt like using too much screen real estate.

I'll experiment a bit; maybe two windows is indeed the solution. But then
you get into the problem of what to do if only one of them is closed. Do
I add a menu bar to each of them to re-open the "other" one if closed?
etc...

Don't get me wrong, I have a shitload of experience doing GUIs (back in
the old days when I was hacking on MacOS), though I'm relatively new to
GTK. But GUI design is rather hard in general :-)

Ben.
From: Frans Pop on 9 Sep 2009 08:00

On Wednesday 09 September 2009, Benjamin Herrenschmidt wrote:
> On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
> > Arjan van de Ven wrote:
> > > the latest version of latencytop also has a GUI (thanks to Ben)
> >
> > That looks nice, but...
> >
> > I kind of miss the split screen feature where latencytop would show
> > both the overall figures + the ones for the currently most affected
> > task. Downside of that last was that I never managed to keep the
> > display on a specific task.
>
> Any idea of how to present it ? I'm happy to spend 5mn improving the
> GUI :-)

I'd say add an extra horizontal split in the second column, so you'd get
three areas in the right column:
- top for the global target (permanently)
- middle for current, either:
  - "current most lagging" if "Global" is selected in left column
  - selected process if a specific target is selected in left column
- bottom for backtrace

Maybe with that setup "Global" in the left column should be renamed to
something like "Dynamic".

The backtrace area would show the selection from either the top or middle
area (so selecting a cause in the top or middle area should unselect
causes in the other).

Cheers,
FJP
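Frans's three-area layout maps naturally onto nested paned widgets. A
rough sketch, assuming a GTK2 GUI (Ben mentions working in GTK); every
widget and label below is an illustrative placeholder, not latencytop
code:

/* Hedged sketch of the proposed layout: left column for targets,
 * right column split three ways (global / current / backtrace). */
#include <gtk/gtk.h>

int main(int argc, char **argv)
{
    gtk_init(&argc, &argv);

    GtkWidget *win = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    g_signal_connect(win, "destroy", G_CALLBACK(gtk_main_quit), NULL);

    /* left column: target list; right column: three stacked areas */
    GtkWidget *columns = gtk_hpaned_new();
    GtkWidget *targets = gtk_label_new("target list\n(Dynamic / per-task)");

    GtkWidget *right_top = gtk_vpaned_new();  /* global vs. the rest */
    GtkWidget *right_bot = gtk_vpaned_new();  /* current vs. backtrace */
    gtk_paned_pack1(GTK_PANED(right_top),
                    gtk_label_new("global figures"), TRUE, FALSE);
    gtk_paned_pack2(GTK_PANED(right_top), right_bot, TRUE, FALSE);
    gtk_paned_pack1(GTK_PANED(right_bot),
                    gtk_label_new("current / selected task"), TRUE, FALSE);
    gtk_paned_pack2(GTK_PANED(right_bot),
                    gtk_label_new("backtrace"), TRUE, FALSE);

    gtk_paned_pack1(GTK_PANED(columns), targets, FALSE, FALSE);
    gtk_paned_pack2(GTK_PANED(columns), right_top, TRUE, FALSE);

    gtk_container_add(GTK_CONTAINER(win), columns);
    gtk_widget_show_all(win);
    gtk_main();
    return 0;
}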
From: Jens Axboe on 9 Sep 2009 08:00

On Wed, Sep 09 2009, Jens Axboe wrote:
> On Wed, Sep 09 2009, Mike Galbraith wrote:
> > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> > >
> > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > And here's a newer version.
> > > > >
> > > > > I tinkered a bit with your proglet and finally found the
> > > > > problem.
> > > > >
> > > > > You used a single pipe per child, this means the loop in
> > > > > run_child() would consume what it just wrote out until it got
> > > > > force preempted by the parent which would also get woken.
> > > > >
> > > > > This results in the child spinning a while (its full quota) and
> > > > > only reporting the last timestamp to the parent.
> > > >
> > > > Oh doh, that's not well thought out. Well it was a quick hack :-)
> > > > Thanks for the fixup, now it's at least usable to some degree.
> > >
> > > What kind of latencies does it report on your box?
> > >
> > > Our vanilla scheduler default latency targets are:
> > >
> > >   single-core: 20 msecs
> > >   dual-core:   40 msecs
> > >   quad-core:   60 msecs
> > >   octo-core:   80 msecs
> > >
> > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> > > /proc/sys/kernel/sched_latency_ns:
> > >
> > >   echo 10000000 > /proc/sys/kernel/sched_latency_ns
> >
> > He would also need to lower min_granularity, otherwise it'd be larger
> > than the whole latency target.
> >
> > I'm testing right now, and one thing that is definitely a problem is
> > the amount of sleeper fairness we're giving. A full latency is just
> > too much short term fairness in my testing. While sleepers are
> > catching up, hogs languish. That's the biggest issue going on.
> >
> > I've also been doing some timings of make -j4 (looking at idle time),
> > and find that child_runs_first is mildly detrimental to fork/exec
> > load, as are buddies.
> >
> > I'm running with the below at the moment. (the kthread/workqueue
> > thing is just because I don't see any reason for it to exist, so
> > consider it to be a waste of perfectly good math;)

Using latt, it seems better than -rc9. The below are entries logged
while running make -j128 on a 64 thread box. I did two runs on each, and
latt is using 8 clients.

-rc9
Max        23772 usec
Avg         1129 usec
Stdev       4328 usec
Stdev mean   117 usec

Max        32709 usec
Avg         1467 usec
Stdev       5095 usec
Stdev mean   136 usec

-rc9 + patch
Max        11561 usec
Avg         1532 usec
Stdev       1994 usec
Stdev mean    48 usec

Max         9590 usec
Avg         1550 usec
Stdev       2051 usec
Stdev mean    50 usec

max latency is way down, and much smaller variation as well.

--
Jens Axboe
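Ingo's echo plus Mike's caveat amount to lowering two knobs together:
the latency target, and the minimum granularity kept below it. A small
sketch under those assumptions; the 2 ms value is illustrative, both
files exist only with CONFIG_SCHED_DEBUG=y, and writing them needs root:

/* Hedged sketch: lower the CFS latency target as Ingo suggests, and
 * also lower min_granularity as Mike notes, so the per-task minimum
 * slice stays below the overall latency target. */
#include <stdio.h>

static int write_tunable(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");

    if (!f) {
        perror(path);
        return -1;
    }
    fprintf(f, "%s\n", val);
    return fclose(f);
}

int main(void)
{
    /* 10 ms latency target, 2 ms minimum granularity */
    write_tunable("/proc/sys/kernel/sched_latency_ns", "10000000");
    write_tunable("/proc/sys/kernel/sched_min_granularity_ns", "2000000");
    return 0;
}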
From: Nikos Chantziaras on 9 Sep 2009 08:00

On 09/08/2009 06:23 PM, Peter Zijlstra wrote:
> On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
>> And here's a newer version.
>
> I tinkered a bit with your proglet and finally found the problem.
>
> You used a single pipe per child, this means the loop in run_child()
> would consume what it just wrote out until it got force preempted by
> the parent which would also get woken.
>
> This results in the child spinning a while (its full quota) and only
> reporting the last timestamp to the parent.
>
> Since the consumer (parent) is a single thread, the program basically
> measures the worst delay in a thundering herd wakeup of N children.
>
> The below version yields:
>
> idle
>
> [root(a)opteron sched]# ./latt -c8 sleep 30
> Entries: 664 (clients=8)
>
> Averages:
> ------------------------------
> Max     128 usec
> Avg      26 usec
> Stdev    16 usec
>
> make -j4
>
> [root(a)opteron sched]# ./latt -c8 sleep 30
> Entries: 648 (clients=8)
>
> Averages:
> ------------------------------
> Max   20861 usec
> Avg    3763 usec
> Stdev  4637 usec
>
> Mike's patch, make -j4
>
> [root(a)opteron sched]# ./latt -c8 sleep 30
> Entries: 648 (clients=8)
>
> Averages:
> ------------------------------
> Max   17854 usec
> Avg    6298 usec
> Stdev  4735 usec

I've run two tests with this tool. One with mainline (2.6.31-rc9) and
one patched with 2.6.31-rc9-sched-bfs-210.patch.

Before running this test, I disabled the cron daemon so that nothing
would pop up in the background all of a sudden. The test consisted of
starting a "make -j2" in the kernel tree inside a 3GB tmpfs mountpoint
and then running 'latt "mplayer -vo gl2 -framedrop videofile.mkv"'
(mplayer in this case is a single-threaded application). Caches were
warmed up first; the results below are from the second run of each test.

The kernel .config file used by the running kernels and also for
"make -j2" is:

  http://foss.math.aegean.gr/~realnc/kernel/config-2.6.31-rc9-latt-test

The video file used for mplayer is:

  http://foss.math.aegean.gr/~realnc/vids/3DMark2000.mkv (100MB)

(The reason this was used is that it's a 60FPS video, therefore very
smooth, and makes all skips stand out clearly.)

Results for mainline:

Averages:
------------------------------
Max   29930 usec
Avg   11043 usec
Stdev  5752 usec

Results for BFS:

Averages:
------------------------------
Max   14017 usec
Avg      49 usec
Stdev   697 usec

One thing that's worth noting is that with mainline, mplayer would
occasionally spit this out:

  YOUR SYSTEM IS TOO SLOW TO PLAY THIS

which doesn't happen with BFS.
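For reference, the bug Peter describes is one of pipe direction: when a
single pipe carries both the parent's wakeup token and the child's
reply, the child's read loop can consume the timestamp it just wrote. A
minimal sketch of the corrected shape, with one pipe per direction; all
names here are hypothetical rather than taken from the actual latt
source:

/* Hedged sketch of the failure mode and fix: with ONE shared pipe, a
 * child looping on read() can eat its own reply, so the parent only
 * ever sees the last timestamp. One pipe per direction avoids this. */
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
    int to_child[2], to_parent[2];  /* one pipe per direction */

    if (pipe(to_child) < 0 || pipe(to_parent) < 0) {
        perror("pipe");
        return 1;
    }

    if (fork() == 0) {
        /* child: wait for a wakeup token, answer with a timestamp */
        char token;
        struct timeval tv;

        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &token, 1) == 1) {
            gettimeofday(&tv, NULL);
            /* child only writes here; it cannot consume its own data */
            if (write(to_parent[1], &tv, sizeof(tv)) != sizeof(tv))
                break;
        }
        _exit(0);
    }

    /* parent: wake the child, read back the wakeup timestamp */
    struct timeval tv;

    close(to_child[0]);
    close(to_parent[1]);
    if (write(to_child[1], "x", 1) != 1 ||
        read(to_parent[0], &tv, sizeof(tv)) != sizeof(tv)) {
        perror("wakeup round-trip");
        return 1;
    }
    printf("child woke at %ld.%06ld\n",
           (long)tv.tv_sec, (long)tv.tv_usec);
    return 0;
}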
From: Jens Axboe on 9 Sep 2009 08:30
On Wed, Sep 09 2009, Jens Axboe wrote:
> On Wed, Sep 09 2009, Jens Axboe wrote:
> > On Wed, Sep 09 2009, Mike Galbraith wrote:
> > > [...]
> > > I'm running with the below at the moment. (the kthread/workqueue
> > > thing is just because I don't see any reason for it to exist, so
> > > consider it to be a waste of perfectly good math;)
> >
> > Using latt, it seems better than -rc9. [...]
> > max latency is way down, and much smaller variation as well.
>
> Things are much better with this patch on the notebook! I cannot
> compare with BFS as that still doesn't run anywhere I want it to run,
> but it's way better than -rc9-git stock. latt numbers on the notebook
> have 1/3 the max latency, average is lower, and stddev is much smaller
> too.

BFS210 runs on the laptop (dual core Intel Core Duo).
With make -j4 running, I clock the following latt -c8 'sleep 10'
latencies:

-rc9
Max        17895 usec
Avg         8028 usec
Stdev       5948 usec
Stdev mean   405 usec

Max        17896 usec
Avg         4951 usec
Stdev       6278 usec
Stdev mean   427 usec

Max        17885 usec
Avg         5526 usec
Stdev       6819 usec
Stdev mean   464 usec

-rc9 + mike
Max         6061 usec
Avg         3797 usec
Stdev       1726 usec
Stdev mean   117 usec

Max         5122 usec
Avg         3958 usec
Stdev       1697 usec
Stdev mean   115 usec

Max         6691 usec
Avg         2130 usec
Stdev       2165 usec
Stdev mean   147 usec

-rc9 + bfs210
Max           92 usec
Avg           27 usec
Stdev         19 usec
Stdev mean     1 usec

Max           80 usec
Avg           23 usec
Stdev         15 usec
Stdev mean     1 usec

Max           97 usec
Avg           27 usec
Stdev         21 usec
Stdev mean     1 usec

One thing I also noticed is that when I have logged in, I run xmodmap
manually to load some key mappings (I always tell myself to add this to
the login scripts, but I suspend/resume this laptop for weeks at a time
and forget before the next boot). With the stock kernel, xmodmap will
halt X updates and take forever to run. With BFS, it returned instantly.
As I would expect.

So the BFS design may be lacking on the scalability end (which is
obviously true, if you look at the code), but I can understand the
appeal of the scheduler for "normal" desktop people.

--
Jens Axboe
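A note on the numbers above: latt's "Stdev mean" is presumably the
standard error of the mean, i.e. stdev / sqrt(n), which explains why it
shrinks as more entries are logged even when Stdev itself stays large.
A sketch of the arithmetic under that assumption, not the actual latt
source:

/* Hedged sketch: compute the Max/Avg/Stdev/"Stdev mean" fields that
 * latt prints, assuming "Stdev mean" is the standard error of the
 * mean. Compile with -lm; the sample values are illustrative. */
#include <math.h>
#include <stdio.h>

static void report(const double *samples, int n)
{
    double max = 0.0, sum = 0.0, sqsum = 0.0;
    int i;

    for (i = 0; i < n; i++) {
        if (samples[i] > max)
            max = samples[i];
        sum += samples[i];
    }
    double avg = sum / n;

    for (i = 0; i < n; i++)
        sqsum += (samples[i] - avg) * (samples[i] - avg);
    double stdev = sqrt(sqsum / (n - 1));  /* sample standard deviation */

    printf("Max        %8.0f usec\n", max);
    printf("Avg        %8.0f usec\n", avg);
    printf("Stdev      %8.0f usec\n", stdev);
    printf("Stdev mean %8.0f usec\n", stdev / sqrt((double)n));
}

int main(void)
{
    /* illustrative wakeup latencies, in usec */
    double samples[] = { 92, 27, 19, 45, 33, 80, 23, 15 };

    report(samples, (int)(sizeof(samples) / sizeof(samples[0])));
    return 0;
}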