From: Peter Zijlstra on 4 Aug 2010 06:20

On Tue, 2010-08-03 at 14:28 -0700, Nikhil Rao wrote:
> I see your point here, and yes I agree having 1 nice-0 on one cpu, 512
> SCHED_IDLE tasks on another cpu and all other cpus idle is correct if
> we only considered fairness. However, we would also like to maximize
> machine utilization. The fitness function we would ideally like to
> optimize for is a combination of both fairness and utilization.

Sure, I see (and agree with) the fact that we want to optimize
utilization as well (although I bet the power management people might
feel otherwise :-)

> Thanks for your suggestions; I explored the first one a bit and I
> added a check into find_busiest_queue() (instead of
> find_busiest_group()) to skip a cpu if it has only 1 task on it (patch
> attached below - did you have something else in mind?).

You might also need some changes to find_busiest_group(). Suppose you
have a 4 cpu machine, with 2 groups of 2. Now also assume you have 4
tasks, 2 of nice-0 and 2 idle. If both nice-0 tasks are in the same
group, each on their own cpu, then f_b_g() could select that group as
being the busiest (it's got W=2048, against W=4 of the other group,
after all). Once you have that group, f_b_q() won't be able to do
anything sensible.

> This fixes the
> example I posted in the RFC, but it doesn't work as well when the
> SCHED_NORMAL tasks have a sleep/wakeup pattern. I have some data below
> where the load balancer fails to fully utilize a machine. In these
> examples, I ran with the upstream kernel and with a kernel compiled
> with the check in fbq().

Right, so wakeup/sleep are indeed more interesting.

For wakeup we also have select_task_rq() to consider; it is responsible
for choosing where to run the newly woken task.

For sleeps we have new idle balancing, which is a lot like the regular
load-balancing but differs enough to need looking at.

From the data you provided I cannot tell you which of these two is
responsible for the thing you see (although under-utilization suggests
the new-idle balancer). You can use perf/ftrace to look at what your
tasks are doing and how they could be doing it better (Arjan's
timechart might be a good help).

If they get woken to the wrong CPU, it's select_task_rq(); if they
leave a CPU idle too long, it's new idle balancing -- or possibly it's
something I overlooked altogether :-)
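
The 4 cpu scenario described earlier in this message can be sketched as a
small toy model. This is only an illustration in Python, not kernel code:
the function names mirror find_busiest_group() and find_busiest_queue()
but are hypothetical stand-ins, and the per-task weights (1024 for nice-0,
2 for SCHED_IDLE) are simply the values implied by W=2048 and W=4 above.
Placing both idle tasks on one cpu of the second group is an assumption
made for the example.

```python
# Toy model of the 2-groups-of-2-cpus scenario; not kernel code.
NICE_0_WEIGHT = 1024  # implied by W=2048 for two nice-0 tasks
IDLE_WEIGHT = 2       # implied by W=4 for two SCHED_IDLE tasks

# groups[g][cpu] is the list of task weights queued on that cpu.
groups = [
    [[NICE_0_WEIGHT], [NICE_0_WEIGHT]],  # group 0: one nice-0 task per cpu
    [[IDLE_WEIGHT, IDLE_WEIGHT], []],    # group 1 (assumed layout): one cpu idle
]


def group_load(group):
    """Sum of task weights over all cpus in the group."""
    return sum(sum(rq) for rq in group)


def find_busiest_group(groups):
    """Pick the group index with the largest total weight."""
    return max(range(len(groups)), key=lambda g: group_load(groups[g]))


def find_busiest_queue(group):
    """Pick the heaviest cpu in the group, skipping cpus with only 1 task.

    Returns the cpu index, or None if every cpu was skipped.
    """
    candidates = [(sum(rq), cpu) for cpu, rq in enumerate(group) if len(rq) > 1]
    if not candidates:
        return None
    return max(candidates)[1]


busiest = find_busiest_group(groups)
print("busiest group:", busiest, "load:", group_load(groups[busiest]))
print("queue to pull from:", find_busiest_queue(groups[busiest]))
```

In this model the weight-based group choice lands on the nice-0 group,
where every cpu holds a single task, so the queue-level check has nothing
left to pick even though the other group has a cpu with two tasks and a
cpu with none.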