From: Theodore Tso on 8 Sep 2009 08:10

On Tue, Sep 08, 2009 at 01:13:34PM +0300, Nikos Chantziaras wrote:
>> despite the untranslated content, it is clear that you have scheduler
>> delays (either due to scheduler bugs or cpu contention) of up to 68
>> msecs... Second in line is your binary AMD graphics driver that is
>> chewing up 14% of your total latency...
>
> I've now used a correctly installed and up-to-date version of latencytop
> and repeated the test. Also, I got rid of AMD's binary blob and used
> kernel DRM drivers for my graphics card to throw fglrx out of the
> equation (which btw didn't help; the exact same problems occur).
>
> Here the result:
>
> http://foss.math.aegean.gr/~realnc/pics/latop2.png

This was with an unmodified 2.6.31-rcX kernel? Does latencytop do
anything useful on a BFS-patched kernel?

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Felix Fietkau on 8 Sep 2009 09:50

Benjamin Herrenschmidt wrote:
> On Tue, 2009-09-08 at 09:48 +0200, Ingo Molnar wrote:
>> So either your MIPS system has some unexpected dependency on the
>> scheduler, or there's something weird going on.
>>
>> Mind poking on this one to figure out whether it's all repeatable
>> and why that slowdown happens? Multiple attempts to reproduce it
>> failed here for me.
>
> Could it be the scheduler using constructs that don't do well on MIPS?
>
> I remember at some stage we spotted an expensive multiply in there;
> maybe there's something similar, or some data structure that is
> unaligned or unfriendly to the MIPS cache line size, that sort of
> thing...
>
> Is this a SW-loaded TLB? Does it miss on kernel space? That could
> also be a difference in how many pages are touched by each scheduler,
> causing more TLB pressure. This will be mostly invisible on x86.

The TLB is SW-loaded, yes. However, it should not take any misses on
kernel space, since the whole segment is covered by a wired TLB entry.

- Felix
From: Arjan van de Ven on 8 Sep 2009 10:20

On Tue, 08 Sep 2009 13:13:34 +0300
Nikos Chantziaras <realnc(a)arcor.de> wrote:

> On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
> > On Tue, 08 Sep 2009 10:19:06 +0300
> > Nikos Chantziaras <realnc(a)arcor.de> wrote:
> >
> >> latencytop has this to say:
> >>
> >> http://foss.math.aegean.gr/~realnc/pics/latop1.png
> >>
> >> Though I don't really understand what this tool is trying to tell
> >> me, I hope someone does.
> >
> > despite the untranslated content, it is clear that you have
> > scheduler delays (either due to scheduler bugs or cpu contention)
> > of up to 68 msecs... Second in line is your binary AMD graphics
> > driver that is chewing up 14% of your total latency...
>
> I've now used a correctly installed and up-to-date version of
> latencytop and repeated the test. Also, I got rid of AMD's binary
> blob and used kernel DRM drivers for my graphics card to throw fglrx
> out of the equation (which btw didn't help; the exact same problems
> occur).
>
> Here the result:
>
> http://foss.math.aegean.gr/~realnc/pics/latop2.png
>
> Again: this is on an Intel Core 2 Duo CPU.

So we finally have objective numbers!

Now the interesting part is also WHERE the latency hits. Because
fundamentally, if you oversubscribe the CPU, you WILL get scheduling
latency: you simply have more to run than there is CPU.

The scheduler impacts this latency in two ways:
* Deciding how long apps run before someone else gets to take over
  ("time slicing")
* Deciding who gets to run first/more; e.g. priority between apps

The first more or less controls the maximum latency, while the second
controls which apps get to enjoy this maximum. latencytop shows you
both, but it is interesting to see how much of it lands on the apps
whose latency you actually care about.
--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
From: Michael Buesch on 8 Sep 2009 10:50

On Tuesday 08 September 2009 09:48:25 Ingo Molnar wrote:
> Mind poking on this one to figure out whether it's all repeatable
> and why that slowdown happens?

I repeated the test several times, because I couldn't really believe
that there's such a big difference for me, but the results were the
same. I don't really know what's going on nor how to find out what's
going on.

--
Greetings, Michael.
From: Peter Zijlstra on 8 Sep 2009 11:30
On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> And here's a newer version.

I tinkered a bit with your proglet and finally found the problem.

You used a single pipe per child, which means the loop in run_child()
would consume what it just wrote out until it got force-preempted by
the parent, which would also get woken. This results in the child
spinning for a while (its full quota) and only reporting the last
timestamp to the parent.

Since the consumer (parent) is a single thread, the program basically
measures the worst delay in a thundering-herd wakeup of N children.

The below version yields:

idle
[root(a)opteron sched]# ./latt -c8 sleep 30
Entries: 664 (clients=8)

Averages:
------------------------------
	Max	  128 usec
	Avg	   26 usec
	Stdev	   16 usec

make -j4
[root(a)opteron sched]# ./latt -c8 sleep 30
Entries: 648 (clients=8)

Averages:
------------------------------
	Max	20861 usec
	Avg	 3763 usec
	Stdev	 4637 usec

Mike's patch, make -j4
[root(a)opteron sched]# ./latt -c8 sleep 30
Entries: 648 (clients=8)

Averages:
------------------------------
	Max	17854 usec
	Avg	 6298 usec
	Stdev	 4735 usec