Prev: Change some config options to be automatically selected
Next: i2c-core: fix some coding style issues in i2c-core.c
From: Mathieu Desnoyers on 18 Apr 2010 02:30 Huge CFS vruntime spread (18 minutes) has been observed with LTTng while simply running Xorg on a uniprocessor machine, 2.6.33.2 kernel. Detailed explanation in my ELC2010 presentation at: http://www.efficios.com/elc2010 (includes slides, ad-hoc CFS instrumentation patches and wakeup latency test program) I've torn the CFS scheduler apart in the past days to figure out what is causing this weird behavior, and the culprit seems to be place_entity(). The problem appears to be the cumulative effect of letting the min_vruntime go backward when putting sleepers back on the runqueue. It lets the vruntime spread grow to "entertaining" values (it is supposed to be in the 5ms range, not 18 minutes!). In the original code, a max between the sched entity vruntime and the calculated vruntime was supposed to "ensure that the thread time never go backward". But I don't see why we even care about that. The key point is that the min_vruntime of the runqueue should not go backward. I propose to fix this by calculating the relative offset from min_vruntime + sysctl_sched_latency rather than directly from min_vruntime. I also ensure that the value never goes below min_vruntime. Under the Xorg workload, moving a few windows around and starting firefox while executing the wakeup-latency.c program (program waking up every 10ms and reporting wakeup latency), this patch brings worse latency from 60ms down to 12ms. Even doing a kernel compilation at the same time, the worse latency stays around 20ms now. I'm submitting this patch ASAP, since it seems to fix CFS issues that many people have been complaining about. I'm sending it as RFC because testing its effect on more workloads would be welcome. I can see that place_entity() has stayed more or less the same since 2.6.24 (and maybe even before, as code has just been reorganised between 2.6.23 and 2.6.24), so we can expect this to be a problem people have been experiencing for a while. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> CC: Ingo Molnar <mingo(a)elte.hu> CC: Peter Zijlstra <a.p.zijlstra(a)chello.nl> CC: Mike Galbraith <efault(a)gmx.de> CC: Andrew Morton <akpm(a)linux-foundation.org> CC: Linus Torvalds <torvalds(a)linux-foundation.org> CC: Greg Kroah-Hartman <greg(a)kroah.com> CC: Steven Rostedt <rostedt(a)goodmis.org> CC: Jarkko Nikula <jhnikula(a)gmail.com> CC: Tony Lindgren <tony(a)atomide.com> --- kernel/sched_fair.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) Index: linux-2.6-lttng.git/kernel/sched_fair.c =================================================================== --- linux-2.6-lttng.git.orig/kernel/sched_fair.c 2010-04-18 01:44:19.000000000 -0400 +++ linux-2.6-lttng.git/kernel/sched_fair.c 2010-04-18 01:47:38.000000000 -0400 @@ -738,6 +738,14 @@ unsigned long thresh = sysctl_sched_latency; /* + * Place the woken up task relative to + * min_vruntime + sysctl_sched_latency. + * We must _never_ decrement min_vruntime, because the effect is + * that spread increases progressively under the Xorg workload. + */ + vruntime += sysctl_sched_latency; + + /* * Convert the sleeper threshold into virtual time. * SCHED_IDLE is a special sub-class. We care about * fairness only relative to other SCHED_IDLE tasks, @@ -755,11 +763,10 @@ thresh >>= 1; vruntime -= thresh; + /* Ensure min_vruntime never go backwards. */ + vruntime = max_t(u64, vruntime, cfs_rq->min_vruntime); } - /* ensure we never gain time by being placed backwards. */ - vruntime = max_vruntime(se->vruntime, vruntime); - se->vruntime = vruntime; } -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |