From: Bjoern Brandenburg on 4 Aug 2010 02:40

On Aug 3, 2010, at 4:16 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:

> On Sun, 2010-07-11 at 08:46 +0200, Bjoern Brandenburg wrote:
>> I'd be hesitant to just assume that it "approximates G-EDF"
>> sufficiently well to apply any of the published G-EDF tests.
>
> OK, suppose that for each cpu we keep the earliest and next-earliest
> deadline in a table. Then on wakeup (job release) we pick the cpu with
> the currently last deadline to preempt (we push the task).
>
> On sleep (job completion) we look for the earliest among all
> next-earliest deadlines to select the next runnable task (we pull the
> task).
>
> If we serialize all this using one big lock around this [ {earliest,
> next-earliest} ] table, we've basically implemented G-EDF, agreed?

Yes, agreed. (Assuming that the next-earliest field is always kept up-to-date by finding the next-earliest when the task is pulled.)

>
> Now replace that global lock with an algorithm that looks at the table,
> finds the last-earliest or earliest-next-earliest in a lock-less
> fashion, then locks the target cpu's rq->lock, verifies the result and
> either continues or tries again.

Can this lead to tasks bouncing back-and-forth? Under a strict interpretation of G-EDF, each job arrival should cause at most one migration. Can you bound the maximum number of times that the retry-loop is taken per scheduling decision? Can you prove that the lock-less traversal of the table yields a consistent snapshot, or is it possible to accidentally miss a priority inversion due to concurrent job arrivals?

In practice, repeated retries are probably not much of a problem, but not having a firm bound would violate strict validation rules (you can't prove it terminates), and would also violate academic real-time rules (again, you ought to be able to prove it correct).
I realize that these rules may not be something that has a high priority for Linux, but on the other hand some properties such as the max number of migrations may be implicitly assumed in schedulability tests.

I'm not saying that the proposed implementation is not compatible with published analysis, but I'd be cautious to simply assume that it is. Some of the questions that were raised in this thread make it sound like the border between global and partitioned isn't clearly drawn in the implementation yet (e.g., handling of proc affinity masks), so my opinion may change when the code stabilizes. (This isn't meant as a criticism of Dario et al.'s good work; this is just something very hard to get right, and especially so on the first attempt.)

> So we replace the global lock with cmpxchg like loops using 2 per-cpu
> locks. Our current SCHED_FIFO balancer does just this and is found to be
> a very good approximation of global-fifo (iirc there's one funny case,
> but I can't remember, maybe Steve or Gregory can remember the details).

Going back to Dario's original comments, when combined with proc. affinities/partitioning you'd either have to move budget allocations from CPU to CPU or track a global utilization sum for admission test purposes.

- Björn

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
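To make the retry question concrete, here is a minimal user-space sketch of the push path only (job release), assuming one table entry per CPU and a spin flag standing in for rq->lock. It is not kernel code and not the proposed implementation: `NR_CPUS`, `MAX_RETRIES`, `DL_IDLE`, `struct cpu_slot`, `find_latest()` and `push_job()` are all hypothetical names, and the explicit retry cap is an assumption added here to show what a firm bound would look like. Requeueing the preempted job and the next-earliest column are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS     4           /* hypothetical machine size */
#define MAX_RETRIES 8           /* hypothetical bound on the retry loop */
#define DL_IDLE     UINT64_MAX  /* marks a CPU with nothing running */

struct cpu_slot {
    atomic_flag lock;           /* user-space stand-in for rq->lock */
    _Atomic uint64_t earliest;  /* deadline of the job running on this CPU */
};

static struct cpu_slot table[NR_CPUS];

static void table_init(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        atomic_flag_clear(&table[cpu].lock);
        atomic_store(&table[cpu].earliest, DL_IDLE);
    }
}

static void slot_lock(struct cpu_slot *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void slot_unlock(struct cpu_slot *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Lock-less scan: return the CPU whose running job has the latest deadline
 * and report the deadline that was observed there. The scan is not a
 * consistent snapshot; entries may change while it runs. */
static int find_latest(uint64_t *seen)
{
    int best = 0;

    *seen = atomic_load(&table[0].earliest);
    for (int cpu = 1; cpu < NR_CPUS; cpu++) {
        uint64_t dl = atomic_load(&table[cpu].earliest);
        if (dl > *seen) {
            *seen = dl;
            best = cpu;
        }
    }
    return best;
}

/* Returns the preempted CPU, -1 if the scan found no CPU running a later
 * deadline, or -2 if the retry bound was exhausted. */
static int push_job(uint64_t deadline)
{
    assert(deadline != DL_IDLE); /* DL_IDLE is reserved for idle CPUs */

    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        uint64_t seen;
        int cpu = find_latest(&seen);

        if (deadline >= seen)
            return -1;          /* decision rests on the unlocked scan */

        slot_lock(&table[cpu]);
        if (atomic_load(&table[cpu].earliest) == seen) {
            /* Target unchanged since the scan: preempt it. */
            atomic_store(&table[cpu].earliest, deadline);
            slot_unlock(&table[cpu]);
            return cpu;
        }
        slot_unlock(&table[cpu]);   /* lost a race, scan again */
    }
    return -2;
}
```

Note that the check under the lock only confirms that the chosen CPU still holds the deadline that was seen; it does not confirm that this CPU is still the globally latest one, and the early return of -1 relies entirely on the unlocked scan. Those are the two places where a concurrent job arrival could be missed.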
From: Peter Zijlstra on 4 Aug 2010 03:20
On Tue, 2010-08-03 at 23:52 -0400, Andrea Bastoni wrote:
> Instead, if you want to use the cpuset + affinity to define possibly _overlapping_ clusters (or
> containers, or servers) to support different budgets on each CPU (something similar to cgroup,
> see [1,3]), forcing only two configuration (single cpu/full cluster) may be restrictive.

cpusets doesn't allow overlapping load-balance domains as it stands today. In that case it would end up being a single large domain.

With cpu affinity we can of course create whatever we want, hence my suggestion to limit allowed affinity masks to 1 cpu or the full load-balance domain.
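The suggested restriction (an affinity mask is accepted only if it names exactly one cpu or the whole load-balance domain) can be sketched as a small predicate. This is an illustration only, assuming a 64-bit bitmask with one bit per cpu; `cpu_mask_t`, `is_single_cpu()` and `affinity_allowed()` are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per cpu; hypothetical stand-in for the kernel's cpumask type. */
typedef uint64_t cpu_mask_t;

/* True if exactly one bit is set. */
static bool is_single_cpu(cpu_mask_t m)
{
    return m != 0 && (m & (m - 1)) == 0;
}

/* Accept a mask only if it equals the full load-balance domain or pins
 * the task to a single cpu inside that domain. */
static bool affinity_allowed(cpu_mask_t mask, cpu_mask_t domain)
{
    assert(domain != 0); /* an empty domain is not meaningful here */

    if (mask == domain)
        return true;
    return is_single_cpu(mask) && (mask & domain) != 0;
}
```

With a four-cpu domain `0x0F`, the masks `0x0F` and `0x04` would be accepted, while `0x06` (two cpus) and `0x10` (a cpu outside the domain) would be rejected.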