High load average on idle machine running 2.6.32 [Kernel]

From: James Pearson on 7 Dec 2009 18:40

I've booted a 64-bit 2.6.32 kernel on a dual-processor, quad-core Xeon
E5440 machine. When the machine is idle, the load average varies between
2 and 3.

With a 2.6.31 kernel on the same machine, the idle load average is
nearly 0.

The kernel doesn't use modules; everything that is needed is compiled in.
The machine uses NFS-root.

Strangely, when I run 'iftop' (from http://www.ex-parrot.com/pdw/iftop/)
under the 2.6.32 kernel, the load average drops below 0.5; when I stop
iftop, the load average climbs again.

Any idea what might be causing this?

I've attached the .config I used.

Thanks

James Pearson

From: James Pearson on 10 Dec 2009 11:30

It looks like whatever is causing this was introduced between
2.6.31-git7 and 2.6.31-git8; unfortunately, I don't know how to find out
which change caused it.

Also, if I hot-unplug CPUs 1 to 7, the load average drops to 0; when I
re-enable these CPUs, the load average climbs again.
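For reference, CPU hot-unplug of this kind is normally done through the sysfs file /sys/devices/system/cpu/cpuN/online (writing 0 takes a CPU offline, writing 1 brings it back). Below is a minimal Python sketch, assuming root privileges and that the listed CPUs are hot-pluggable; the helper name set_cpus_online() is hypothetical, and the sysfs root is a parameter only so the sketch can be exercised against a scratch directory.

```python
import os

# Standard sysfs location of the per-CPU hotplug controls.
SYSFS_CPU = "/sys/devices/system/cpu"


def set_cpus_online(cpus, online, sysfs_root=SYSFS_CPU):
    """Write '1' (online) or '0' (offline) to cpuN/online for each CPU.

    Hypothetical helper; needs root on a real system. CPU 0 usually has
    no 'online' file on x86 and cannot be unplugged, so it is not
    expected in `cpus`. Returns the list of files written.
    """
    value = "1" if online else "0"
    written = []
    for cpu in cpus:
        path = os.path.join(sysfs_root, f"cpu{cpu}", "online")
        with open(path, "w") as f:
            f.write(value)
        written.append(path)
    return written
```

Unplugging CPUs 1 to 7 would then be set_cpus_online(range(1, 8), False), and set_cpus_online(range(1, 8), True) restores them.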

I guess this is a problem with my particular config, or maybe it is
because I'm using NFS-root (the root file system is read-only) or a
non-modular kernel?

Thanks

James Pearson
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Peter Zijlstra on 18 Dec 2009 09:20

On Fri, 2009-12-18 at 14:43 +0100, Andrea Suisani wrote:

> >>> Strangely, when I run 'iftop' (from
> >>> http://www.ex-parrot.com/pdw/iftop/) using the 2.6.32 kernel, the
> >>> load average drops to below 0.5 - stop running iftop, and the load
> >>> average climbs again ...

This is the thing that puzzles me most.

> >> Also, if I 'hot-unplug' CPUs 1 to 7, the load average drops to 0 -
> >> when I re-enable these CPUs, the load average climbs.

Very curious too.

> >> I guess this is a problem with my particular config - or maybe because
> >> I'm using NFS-root (the root file system is readonly), or using a
> >> non-module kernel?

Russell, you grumbled about something like this on IRC; are you using
NFS-root too?

> > I gave 'git bisect' a go - which appears to suggest that my problem
> > started at:
> >
> > % git bisect bad
> > d7c33c4930f569caf6b2ece597432853c4151a45 is first bad commit
> > commit d7c33c4930f569caf6b2ece597432853c4151a45
> > Author: Peter Zijlstra <a.p.zijlstra(a)chello.nl>
> > Date: Fri Sep 11 12:45:38 2009 +0200
> >
> > sched: Fix task affinity for select_task_rq_fair
> >
> > While merging select_task_rq_fair() and sched_balance_self() I made
> > a mistake that leads to testing the wrong task affinty.
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra(a)chello.nl>
> > LKML-Reference: <new-submission>
> > Signed-off-by: Ingo Molnar <mingo(a)elte.hu>
> >
> > :040000 040000 3d7aa3e193c7faf9c7ebbb1443c6f63269d86d04
> > 9cfb647eb5d80f156fd8a495da68f765c3fdd772 M kernel

> > So I guess, it is not just one patch that has caused the issue I'm
> > seeing, which I guess is to be expected as the above patch was part of
> > the 'scheduler updates for v2.6.32' patch set

Right, so the thing that seems most likely to cause such funnies is the
introduction of the TASK_WAKING state in .32. During development we had a
brief period where we saw what you described, but I haven't seen it
since:

commit eb24073bc1fe3e569a855cf38d529fb650c35524
Author: Ingo Molnar <mingo(a)elte.hu>
Date: Wed Sep 16 21:09:13 2009 +0200

sched: Fix TASK_WAKING & loadaverage breakage

> > I guess as no one else has reported this issue - it must be something to
> > do with my set up - could using NFS-root affect how the load average is
> > calculated?

So what contributes to load is TASK_UNINTERRUPTIBLE sleeps (when the
task is !PF_FREEZING), as tested by task_contributes_to_load().
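The rule above can be modelled in a few lines. This is a minimal Python sketch, not kernel code: the state and flag values are illustrative placeholders rather than values to rely on, and Task and active_count() are hypothetical names. It also counts runnable tasks, since the load average folds in both runnable tasks and tasks in uninterruptible sleep.

```python
from dataclasses import dataclass

# Illustrative values only; do not treat these as the kernel's definitions.
TASK_RUNNING = 0
TASK_INTERRUPTIBLE = 1
TASK_UNINTERRUPTIBLE = 2
PF_FREEZING = 1 << 0  # placeholder bit


@dataclass
class Task:
    name: str
    state: int
    flags: int = 0


def task_contributes_to_load(task: Task) -> bool:
    """A sleeper counts toward load only if it is in uninterruptible
    sleep ('D' state) and is not being frozen."""
    return ((task.state & TASK_UNINTERRUPTIBLE) != 0
            and (task.flags & PF_FREEZING) == 0)


def active_count(tasks) -> int:
    """Hypothetical helper: runnable tasks plus uninterruptible sleepers,
    i.e. the number that would be fed into the load average."""
    return sum(1 for t in tasks
               if t.state == TASK_RUNNING or task_contributes_to_load(t))
```

An interruptible sleeper (for example a daemon blocked in select()) adds nothing in this model, while a single task stuck in 'D' adds 1 for as long as it stays there.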

Are you seeing a matching number of tasks being stuck in 'D' state when
the load is high? If so, how are these tasks affected by iftop/hotplug?
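One way to answer that question from user space is to scan /proc/<pid>/stat, whose third field is the one-letter process state. The following is a minimal Python sketch; the helper names are hypothetical, it only looks at processes (per-thread state lives under /proc/<pid>/task/), and a single scan is only a snapshot, so it would need repeating to compare against the load average over time.

```python
from __future__ import annotations

import os


def parse_stat_state(stat_line: str) -> tuple[str, str]:
    """Return (comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the state is taken from the first field after the LAST ')'.
    """
    close_paren = stat_line.rindex(")")
    comm = stat_line[stat_line.index("(") + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return comm, state


def d_state_tasks(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process currently in 'D' state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "D":
            found.append((int(entry), comm))
    return found
```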

From: James Pearson on 18 Dec 2009 10:40

Peter Zijlstra wrote:
>
>>>So I guess, it is not just one patch that has caused the issue I'm
>>>seeing, which I guess is to be expected as the above patch was part of
>>>the 'scheduler updates for v2.6.32' patch set
>
>
> Right, so the thing that seems most likely to cause such funnies is the
> introduction of TASK_WAKING state in .32, during development we had a
> brief period where we saw what you described, but I haven't seen it
> after:
>
> commit eb24073bc1fe3e569a855cf38d529fb650c35524
> Author: Ingo Molnar <mingo(a)elte.hu>
> Date: Wed Sep 16 21:09:13 2009 +0200
>
> sched: Fix TASK_WAKING & loadaverage breakage

Yes, I did hit that while bisecting and got load averages in the tens
of thousands. That, of course, masked the load averages I was seeing, so
I cheated and applied that patch at each bisect step to proceed; I guess
I should have mentioned that earlier. In other words, I'm not seeing
ridiculously large load averages, just idle load averages of about 2 or 3.
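To see how a small, constant number of tasks counted toward load turns into a steady idle figure like this, it helps to recall how the load average is computed: roughly every 5 seconds, each of the three averages is updated as an exponentially decaying average of the number of contributing tasks, in 11-bit fixed point. The Python sketch below is modelled on the constants kernels of this era define in include/linux/sched.h (FSHIFT, FIXED_1, EXP_1, EXP_5, EXP_15); simulate() and as_float() are hypothetical helpers, and the sketch assumes the contributing count stays exactly constant.

```python
# Fixed-point load-average model (illustrative sketch, not kernel code).
FSHIFT = 11                 # bits of fractional precision
FIXED_1 = 1 << FSHIFT       # 1.0 in fixed point (2048)
EXP_1 = 1884                # 1/exp(5s/1min) in fixed point
EXP_5 = 2014                # 1/exp(5s/5min)
EXP_15 = 2037               # 1/exp(5s/15min)


def calc_load(load, exp, active):
    """One update step: decay the old average and blend in the new count.

    `active` is the number of contributing tasks scaled by FIXED_1.
    The final shift truncates, as integer arithmetic does.
    """
    load *= exp
    load += active * (FIXED_1 - exp)
    return load >> FSHIFT


def simulate(nr_active, samples, start=(0, 0, 0)):
    """Hypothetical helper: apply `samples` updates (one per ~5 s) with a
    constant number of contributing tasks; returns fixed-point averages."""
    avenrun = list(start)
    active = nr_active * FIXED_1
    for _ in range(samples):
        avenrun = [calc_load(avg, exp, active)
                   for avg, exp in zip(avenrun, (EXP_1, EXP_5, EXP_15))]
    return tuple(avenrun)


def as_float(fixed):
    """Convert a fixed-point average to the familiar decimal form."""
    return fixed / FIXED_1
```

Under those assumptions, two continuously counted tasks leave the 1-minute average at about 1.99 after 36 minutes (432 samples) in this sketch, with the truncating shift keeping it marginally below 2.0; setting the count back to 0 lets all three averages decay toward zero.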

>>>I guess as no one else has reported this issue - it must be something to
>>>do with my set up - could using NFS-root affect how the load average is
>>>calculated?
>
>
> So the thing that contributes to load is TASK_UNINTERRUPTIBLE sleeps
> (and !PF_FREEZING) as tested by task_contributes_to_load().
>
> Are you seeing a matching number of tasks being stuck in 'D' state when
> the load is high? If so, how are these tasks affected by iftop/hotplug?

No, but when running 'echo w > /proc/sysrq-trigger', I occasionally see
'portmap' in 'D' state.

e.g.

SysRq : Show Blocked State
task PC stack pid father
portmap D ffffffff8102e05e 0 3660 1 0x00000000
ffff88043e5d4440 0000000000000082 0000000000000000 0000000000000000
0000000000000000 ffff88043f84db00 0000000000000000 0000000100009921
ffff88043e5d46b0 0000000081353f24 0000000000000000 000000003ea193b8

But I also see these with a 2.6.31 kernel when the load is 0 (or
thereabouts).

If I stop portmap, the load does drop (e.g. from 3.0 to 1.5), but not to zero.

Another thing I've noticed is that when running 'top' (I'm using CentOS
4.7 as the distro) in 'SMP' mode (so all CPUs are listed), the % idle of
one or more of the CPUs shows 0.0%, while the other CPUs show a % idle of
100.0% or 99.x%. I don't know whether this is top not reporting correctly,
but I don't see it when running a 2.6.31 kernel; in that case, all the
CPUs report 100.0% or 99.x% idle all the time.

E.g. with 2.6.32, I see:

> top - 15:25:27 up 36 min, 3 users, load average: 2.20, 2.21, 2.01
> Tasks: 171 total, 1 running, 170 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu1 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu2 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu3 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu4 : 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu5 : 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu6 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu7 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si

I don't know if this is significant.
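For comparison with the top output, per-CPU percentages like these are derived from the cpuN tick counters in /proc/stat, sampled twice and differenced. The minimal Python sketch below shows that calculation; it is not procps's code, the helper names are hypothetical, and it only uses the first seven counters (user, nice, system, idle, iowait, irq, softirq). One property of the sketch: if a CPU's counters do not advance at all between samples, every field for that CPU comes out as 0.0%, which would produce rows like Cpu4 and Cpu5 above; whether that is what top is showing here is an assumption, not something established in the thread.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_proc_stat(text):
    """Map 'cpu0', 'cpu1', ... to a tuple of their first seven tick counters.

    The aggregate 'cpu' line and non-CPU lines (intr, ctxt, ...) are skipped.
    """
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            cpus[parts[0]] = tuple(int(v) for v in parts[1:8])
    return cpus


def cpu_percentages(before, after):
    """Per-CPU percentages over the interval between two parsed samples."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue  # CPU appeared between samples (e.g. hotplug)
        deltas = [n - o for n, o in zip(new, old)]
        total = sum(deltas)
        if total <= 0:
            # No ticks accounted for this CPU in the interval:
            # report every field as 0.0 instead of dividing by zero.
            result[cpu] = dict.fromkeys(FIELDS, 0.0)
        else:
            result[cpu] = {f: 100.0 * d / total for f, d in zip(FIELDS, deltas)}
    return result
```

On a live system, read /proc/stat, wait a few seconds, read it again, and pass both parsed samples to cpu_percentages().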

Thanks

James Pearson