From: Chase Douglas on 13 Apr 2010 19:20

There's a period of 10 ticks where calc_load_tasks is updated by all the
cpus for the load avg. Usually all the cpus do this during the first
tick. If any cpus go idle, calc_load_tasks is decremented accordingly.
However, if they wake up, calc_load_tasks is not incremented. Thus, if
cpus go idle during the 10 tick period, calc_load_tasks may be
decremented to a non-representative value. This issue can lead to
systems having a load avg of exactly 0, even though the real load avg
could theoretically be up to NR_CPUS.

This change defers calc_load_tasks accounting by each cpu during the
10 tick update window, folding the deferred deltas in after the window
closes. A few points:

* A global atomic deferral counter, not per-cpu variables, is needed
  because a cpu may go NOHZ idle and then be unable to update the
  global calc_load_tasks variable for subsequent load calculations.

* It is not enough to add calls to account for the load when a cpu is
  awakened:

  - Load avg calculation must be independent of cpu load.

  - If a cpu is awakened by one task, but then has more tasks scheduled
    before the end of the update window, only the first task will be
    accounted.
BugLink: http://bugs.launchpad.net/bugs/513848
Signed-off-by: Chase Douglas <chase.douglas(a)canonical.com>
Acked-by: Colin King <colin.king(a)canonical.com>
Acked-by: Andy Whitcroft <apw(a)canonical.com>
---
 kernel/sched.c |   24 ++++++++++++++++++++++--
 1 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index abb36b1..be348cd 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3010,6 +3010,7 @@ unsigned long this_cpu_load(void)

 /* Variables and functions for calc_load */
 static atomic_long_t calc_load_tasks;
+static atomic_long_t calc_load_tasks_deferred;
 static unsigned long calc_load_update;
 unsigned long avenrun[3];
 EXPORT_SYMBOL(avenrun);
@@ -3064,7 +3065,7 @@ void calc_global_load(void)
  */
 static void calc_load_account_active(struct rq *this_rq)
 {
-	long nr_active, delta;
+	long nr_active, delta, deferred;

 	nr_active = this_rq->nr_running;
 	nr_active += (long) this_rq->nr_uninterruptible;
@@ -3072,6 +3073,25 @@ static void calc_load_account_active(struct rq *this_rq)
 	if (nr_active != this_rq->calc_load_active) {
 		delta = nr_active - this_rq->calc_load_active;
 		this_rq->calc_load_active = nr_active;
+
+		/*
+		 * Update calc_load_tasks only once per cpu in 10 tick update
+		 * window.
+		 */
+		if (unlikely(time_before(jiffies, this_rq->calc_load_update) &&
+			     time_after_eq(jiffies, calc_load_update))) {
+			if (delta)
+				atomic_long_add(delta,
+						&calc_load_tasks_deferred);
+			return;
+		}
+
+		if (atomic_long_read(&calc_load_tasks_deferred)) {
+			deferred = atomic_long_xchg(&calc_load_tasks_deferred,
+						    0);
+			delta += deferred;
+		}
+
 		atomic_long_add(delta, &calc_load_tasks);
 	}
 }
@@ -3106,8 +3126,8 @@ static void update_cpu_load(struct rq *this_rq)
 	}

 	if (time_after_eq(jiffies, this_rq->calc_load_update)) {
-		this_rq->calc_load_update += LOAD_FREQ;
 		calc_load_account_active(this_rq);
+		this_rq->calc_load_update += LOAD_FREQ;
 	}
 }
--
1.6.3.3