Prev: [PATCH 09/11] Removing dead SERIAL_BFIN_{HARD_, }CTSRTS
Next: Badness at fs/sysfs/symlink.c:82 during qeth initalization
From: Peter Zijlstra on 2 Aug 2010 15:40 On Sun, 2010-07-11 at 09:32 +0200, Bjoern Brandenburg wrote: > Trying to infer whether a task is "hard" or "soft" from task > parameters is not a good idea, IMO. It's much better to make this an > explicit part of the task model that is configured via sched_setparam. > By default, tasks should be marked "soft" (this leaves more wiggle > room to the kernel); users who care can change the flag to "hard". I think we're in violent agreement here ;-) and I was convinced that was what we were talking about. The question was only how to represent that in the sched_param_ex structure, the options were: struct sched_param_ex params; params.flags |= SF_SOFT; sched_setscheduler_ex( .policy = SCHED_DEADLINE, .param = ¶ms); vs sched_setscheduler_ex( .policy = SCHED_DEADLINE_{SOFT,HARD}, .param = ¶ms); > Taking a step back, I think the problem here is that we are trying to > shove too many concepts and properties into a single scheduler. Hard > (no tardiness) is different from soft (bounded tardiness) is different > from global is different from partitioned. > > From my point of view, it makes no sense to support hard deadlines > under G-EDF (this is backed up by our schedulability studies [1]). > Hard deadlines are best served by a P-EDF implementation (that only > migrates on task creation/admission). > The problem is more that we need to support things like cpu affinity and cpusets within the context of a 'global' scheduler. Using cpusets we can partition the 'load-balancer' and create clusters (root-domains in linux scheduler speak). Using cpu affinity we can limit tasks to a subset of their cluster's cpus. Esp. the latter is very hard to do, and I think we can get away with only allowing a single cpu or the full cluster (its a new policy, so there is no existing userspace to break). This ends up meaning we need to support both P-EDF and G-EDF for soft, and since we want to re-use pretty much all the code and only have a different admission test for hard (initially), it would end up also being P/G-EDF for hard (even though as you rightly point out, hard G-EDF is pretty pointless -- but since the policy doesn't promise EDF, we could later improve it to be PD^2 or whatever, at which point global hard does start to make sense). (which I guess would suggest we use different policies instead of a flag, since that would make most sense if we end up replacing the hard part with another policy) So what I want to have is a sporadic task scheduler, not an EDF scheduler (hence also the request to s/SCHED_EDF/SCHED_DEADLINE/ -- avoiding the obvious SCHED_SPORADIC in order to avoid confusion with the POSIX thing). EDF is just the easiest of the many different ways to schedule a sporadic task set. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 3 Aug 2010 04:20 On Sun, 2010-07-11 at 08:46 +0200, Bjoern Brandenburg wrote: > I'd be hesitant to just assume that it "approximates G-EDF" > sufficiently well to apply any of the published G-EDF tests. OK, suppose that for each cpu we keep the earliest and next-earliest deadline in a table. Then on wakeup (job release) we pick the cpu with the currently last deadline to preempt (we push the task). On sleep (job completion) we look for the earliest among all next-earliest deadlines to select the next runnable task (we pull the task). If we serialize all this using one big lock around this [ {earliest, next-earliest} ] table, we've basically implemented G-EDF, agreed? Now replace that global lock with an algorithm that looks at the table, finds the last-earliest or earliest-next-earliest in a lock-less fashion, then locks the target cpu's rq->lock, verifies the result and either continues or tries again. So we replace the global lock with cmpxchg like loops using 2 per-cpu locks. Our current SCHED_FIFO balancer does just this and is found to be a very good approximation of global-fifo (iirc there's one funny case, but I can't remember, maybe Steve or Gregory can remember the details). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 3 Aug 2010 05:50 On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote: > If you want to do G-EDF with limited and different budgets on each CPU > (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but > for 400 out of 1000 ms on CPU 1), then you are entering the domain of > restricted-supply scheduling, which is significantly more complicated > to analyze (see [1,2]). Would making the thing homogenious by assuming the worst for all cpus make the analysis easier? That is, in the above example, only allow the G-EDF scheduler to run for 100 out of 1000 ms on both cpus. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Gregory Haskins on 3 Aug 2010 08:10 Hello Peter, Steven, Thomas, et al. ! Its been a while. >>> On 8/3/2010 at 04:16 AM, in message <1280823360.1923.419.camel(a)laptop>, Peter Zijlstra <peterz(a)infradead.org> wrote: [snip] > > So we replace the global lock with cmpxchg like loops using 2 per-cpu > locks. Our current SCHED_FIFO balancer does just this and is found to be > a very good approximation of global-fifo (iirc there's one funny case, > but I can't remember, maybe Steve or Gregory can remember the details). The only thing I remember at the moment is that Steven and I found one last remaining bug in an extreme corner case where priority order may be violated. Sadly, I can no longer recall the exact specifics and would have to dig into the code to remember. The good news is I believe the issue is fixable. Its just a matter of impl+test, which neither of us ever seem to have found the time to properly dedicate. Perhaps this will be the catalyst ;) -Greg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Bastoni on 4 Aug 2010 00:40
On 08/03/2010 05:41 AM, Peter Zijlstra wrote: > On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote: >> >> If you want to do G-EDF with limited and different budgets on each CPU >> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but >> for 400 out of 1000 ms on CPU 1), then you are entering the domain of >> restricted-supply scheduling, which is significantly more complicated >> to analyze (see [1,2]). > > Without having looked at the refs, won't the soft case still have > bounded tardiness? Since the boundedness property mostly depends on > u<=1, that is, as long as we can always run everything within the > available time we won't start drifting. Yes, the soft case will still have bounded tardiness (see [2]), although the reason is more related to the fact that priorities are defined by deadlines, than to U<=1. Anyway, both hard and soft real-time cases become very difficult to analyze if limited/different budgets are allowed on each CPU. >> As far as I know there is no exiting analysis for "almost G-EDF", >> i.e., the case where each task may only migrate among a subset of the >> processors (= affinity masks), except for the special case of >> clustered EDF (C-EDF), wherein the subsets of processors are >> non-overlapping. > > Right, affinity masks are a pain, hence I proposed to limit that to > either 1 cpu (yielding fully paritioned) or the full cluster. I'm not sure I get what you mean by "full cluster". With G-EDF-like scheduling policies it only makes sense to cluster cores around some memory level (cache Lx, NUMA node...), as the idea is to reduce the cost of a task migration among cores. Depending on the workload, a higher (lower) level of clustering may perform better. A "full cluster" therefore should be created around some memory level. But if a socket has, for example, two level of caches (L2 + L3) and a "full cluster" forces to select all cores in the socket (first hierarchy level in cpusets), we miss the possibility to cluster the cores that shares the L2 (and this configuration may lead better performance). In these clusters the processors are _non-overlapping_. Instead, if you want to use the cpuset + affinity to define possibly _overlapping_ clusters (or containers, or servers) to support different budgets on each CPU (something similar to cgroup, see [1,3]), forcing only two configuration (single cpu/full cluster) may be restrictive. Thanks, - Andrea [3] H. Leontyev and J. Anderson, " A Hierarchical Multiprocessor Bandwidth Reservation Scheme with Timing Guarantees", Real-Time Systems, Volume 43, Number 1, pp. 60-92, September 2009. http://www.cs.unc.edu/~anderson/papers/rtj09b.pdf -- Andrea Bastoni Visiting Ph.D. Student Dept. of Computer Science University of North Carolina at Chapel Hill http://www.sprg.uniroma2.it/home/bastoni/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |