From: Nikos Chantziaras on 8 Sep 2009 03:20

On 09/07/2009 05:40 PM, Arjan van de Ven wrote:
> On Mon, 07 Sep 2009 06:38:36 +0300
> Nikos Chantziaras <realnc(a)arcor.de> wrote:
>
>> On 09/06/2009 11:59 PM, Ingo Molnar wrote:
>>> [...]
>>> Also, i'd like to outline that i agree with the general goals
>>> described by you in the BFS announcement - small desktop systems
>>> matter more than large systems. We find it critically important
>>> that the mainline Linux scheduler performs well on those systems
>>> too - and if you (or anyone else) can reproduce suboptimal behavior
>>> please let the scheduler folks know so that we can fix/improve it.
>>
>> BFS improved behavior of many applications on my Intel Core 2 box in
>> a way that can't be benchmarked. Examples:
>
> Have you tried to see if latencytop catches such latencies ?

I've just tried it. I start latencytop and then mplayer on a video that
doesn't max out the CPU (it needs about 20-30% of a single core, out of
the 2 available). Then, while the video is playing, I press Alt+Tab
repeatedly, which makes the desktop compositor kick in and stay active
(it lays out all windows as a "flip switch", similar to the Microsoft
Vista Aero Alt+Tab effect). Repeatedly pressing Alt+Tab keeps the
compositor (in this case KDE 4.3.1) busy processing. With the mainline
scheduler, mplayer drops frames and skips sound like crazy for the whole
duration of this exercise.

latencytop has this to say:

http://foss.math.aegean.gr/~realnc/pics/latop1.png

Though I don't really understand what this tool is trying to tell me, I
hope someone does.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on 8 Sep 2009 03:50

* Ingo Molnar <mingo(a)elte.hu> wrote:

> That's interesting. I tried to reproduce it on x86, but the
> profile does not show any scheduler overhead at all on the server:

I've now simulated a saturated iperf server by adding an udelay(3000)
to e1000_intr() via the patch below. There's no idle time left that
way:

 Cpu(s):  0.0%us,  2.6%sy,  0.0%ni,  0.0%id,  0.0%wa, 93.2%hi,  4.2%si,  0.0%st
 Mem:   1021044k total,    93400k used,   927644k free,     5068k buffers
 Swap:  8193140k total,        0k used,  8193140k free,    25404k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  1604 mingo     20   0 38300  956  724 S 99.4  0.1   3:15.07 iperf
   727 root      15  -5     0    0    0 S  0.2  0.0   0:00.41 kondemand/0
  1226 root      20   0  6452  336  240 S  0.2  0.0   0:00.06 irqbalance
  1387 mingo     20   0 78872 1988 1300 S  0.2  0.2   0:00.23 sshd
  1657 mingo     20   0 12752 1128  800 R  0.2  0.1   0:01.34 top
     1 root      20   0 10320  684  572 S  0.0  0.1   0:01.79 init
     2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd

And the server is only able to saturate half of the 1 gigabit
bandwidth:

 ------------------------------------------------------------
 Client connecting to t, TCP port 5001
 TCP window size: 16.0 KByte (default)
 ------------------------------------------------------------
 [  3] local 10.0.1.19 port 50836 connected with 10.0.1.14 port 5001
 [ ID] Interval       Transfer     Bandwidth
 [  3]  0.0-10.0 sec    504 MBytes    423 Mbits/sec
 ------------------------------------------------------------
 Client connecting to t, TCP port 5001
 TCP window size: 16.0 KByte (default)
 ------------------------------------------------------------
 [  3] local 10.0.1.19 port 50837 connected with 10.0.1.14 port 5001
 [ ID] Interval       Transfer     Bandwidth
 [  3]  0.0-10.0 sec    502 MBytes    420 Mbits/sec

perf top is showing:

------------------------------------------------------------------------------
   PerfTop:   28517 irqs/sec  kernel:99.4% [100000 cycles],  (all, 1 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

           139553.00 - 93.2% : delay_tsc
             2098.00 -  1.4% : hmac_digest
              561.00 -  0.4% : ip_call_ra_chain
              335.00 -  0.2% : neigh_alloc
              279.00 -  0.2% : __hash_conntrack
              257.00 -  0.2% : dev_activate
              186.00 -  0.1% : proc_tcp_available_congestion_control
              178.00 -  0.1% : e1000_get_regs
              167.00 -  0.1% : tcp_event_data_recv

delay_tsc() dominates, as expected. There is still zero scheduler
overhead and the context-switch rate is well below 1000 per second.

Then i booted v2.6.30 vanilla, added the udelay(3000) and got:

 [  5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47026
 [  5]  0.0-10.0 sec    493 MBytes    412 Mbits/sec
 [  4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47027
 [  4]  0.0-10.0 sec    520 MBytes    436 Mbits/sec
 [  5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47028
 [  5]  0.0-10.0 sec    506 MBytes    424 Mbits/sec
 [  4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47029
 [  4]  0.0-10.0 sec    496 MBytes    415 Mbits/sec

i.e. essentially the same throughput. (This also shows that using .30
versus .31 did not materially impact iperf performance in this test,
under these conditions and with this hardware.)

Then i applied the BFS patch to v2.6.30, used the same udelay(3000)
hack and got:

 [  5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38505
 [  5]  0.0-10.1 sec    481 MBytes    401 Mbits/sec
 [  4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38506
 [  4]  0.0-10.0 sec    505 MBytes    423 Mbits/sec
 [  5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38507
 [  5]  0.0-10.0 sec    508 MBytes    426 Mbits/sec
 [  4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38508
 [  4]  0.0-10.0 sec    486 MBytes    406 Mbits/sec

No measurable change in throughput. Obviously, this test is not
equivalent to your test - but it does show that even a saturated iperf
server is getting scheduled just fine. (Or, rather, does not get
scheduled all that much.)

So either your MIPS system has some unexpected dependency on the
scheduler, or there's something weird going on.
Mind poking on this one to figure out whether it's all repeatable, and
why that slowdown happens? Multiple attempts to reproduce it failed
here for me.

	Ingo
From: Ingo Molnar on 8 Sep 2009 04:10

* Pekka Pietikainen <pp(a)ee.oulu.fi> wrote:

> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
> > > > Could you profile it please? Also, what's the context-switch rate?
> > >
> > > As far as I can tell, the broadcom mips architecture does not have
> > > profiling support. It does only have some proprietary profiling
> > > registers that nobody wrote kernel support for, yet.
> > Well, what does 'vmstat 1' show - how many context switches are
> > there per second on the iperf server? In theory if it's a truly
> > saturated box, there shouldnt be many - just a single iperf task
>
> Yay, finally something that's measurable in this thread \o/

My initial posting in this thread contains 6 separate types of
measurements, rather extensive ones. Out of those, 4 measurements were
latency oriented, two were throughput oriented. Plenty of data, plenty
of results, and very good reproducibility.

> Gigabit Ethernet iperf on an Atom or so might be something that
> shows similar effects yet is debuggable. Anyone feel like taking a
> shot?

I tried iperf on x86 with simulated saturation and no, there's no BFS
versus mainline performance difference that i can measure - simply
because a saturated iperf server does not schedule much: it's busy
handling all that networking workload.

I did notice that iperf is somewhat noisy: it can easily have weird
outliers regardless of which scheduler is used. That could be a
queueing/timing effect: depending on precisely what order packets
arrive and get queued by the networking stack, a cache-effective
pathway for packets may open up - while with slightly different
timings, that pathway closes and we get much worse queueing
performance. I saw noise on the order of 10%, so iperf has to be
measured carefully before drawing conclusions.

> That beast doing iperf probably ends up making it go quite close
> to it's limits (IO, mem bw, cpu). IIRC the routing/bridging
> performance is something like 40Mbps (depends a lot on the model,
> corresponds pretty well with the Mhz of the beast).
>
> Maybe not totally unlike what make -j16 does to a 1-4 core box?

No, a single iperf session is very different from kbuild make -j16.

Firstly, the iperf server is just a single long-lived task - so we
context-switch between that and the idle thread [and perhaps a kernel
thread such as ksoftirqd]. The scheduler essentially has no leeway in
what task to schedule and for how long: if there's work going on, the
iperf server task will run - if there's none, the idle task runs.
[Modulo ksoftirqd - depending on the driver model and on precise
timings.]

kbuild -j16 on the other hand is a complex hierarchy and mixture of
thousands of short-lived and long-lived tasks. The scheduler has a lot
of leeway to decide what to schedule and for how long.

From a scheduler perspective the two workloads could not be more
different. Kbuild does test scheduler decisions in non-trivial ways -
an iperf server does not really.

	Ingo
From: Nikos Chantziaras on 8 Sep 2009 04:20

On 09/08/2009 11:04 AM, Ingo Molnar wrote:
>
> * Pekka Pietikainen <pp(a)ee.oulu.fi> wrote:
>
>> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
>>>>> Could you profile it please? Also, what's the context-switch rate?
>>>>
>>>> As far as I can tell, the broadcom mips architecture does not have
>>>> profiling support. It does only have some proprietary profiling
>>>> registers that nobody wrote kernel support for, yet.
>>> Well, what does 'vmstat 1' show - how many context switches are
>>> there per second on the iperf server? In theory if it's a truly
>>> saturated box, there shouldnt be many - just a single iperf task
>>
>> Yay, finally something that's measurable in this thread \o/
>
> My initial posting in this thread contains 6 separate types of
> measurements, rather extensive ones. Out of those, 4 measurements
> were latency oriented, two were throughput oriented. Plenty of data,
> plenty of results, and very good reproducibility.

None of those involve latency-prone GUI applications running on cheap
commodity hardware, though. I listed examples where mainline seems to
behave sub-optimally, and ways to reproduce them, but this doesn't seem
to be an area of interest.
From: Arjan van de Ven on 8 Sep 2009 04:30
On Tue, 08 Sep 2009 10:19:06 +0300
Nikos Chantziaras <realnc(a)arcor.de> wrote:

> latencytop has this to say:
>
> http://foss.math.aegean.gr/~realnc/pics/latop1.png
>
> Though I don't really understand what this tool is trying to tell me,
> I hope someone does.

Unfortunately this is both an older version of latencytop, and it's
incorrectly installed ;-(

latencytop is supposed to translate those cryptic strings to English,
but due to not being correctly installed, it does not do this ;(

The latest version of latencytop also has a GUI (thanks to Ben).

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org