From: Luca Abeni on 10 Jul 2010 03:20

Hi all,

first of all, thanks for including me in these emails, and sorry for
the delay...

On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> Hi all,
>
> So, talking to Peter and Thomas at the OSPERT workshop in Brussels [1],
> the so called "sporadic task model" came out many many times!
I assume here you are simply talking about tasks with relative deadline
different from period, right? (The term "sporadic" is more often
associated with non-periodic activation patterns.)

[...]
> - do you think it could be useful to have a different syscall to deal
>   with the period parameter (if it's different from deadline), e.g.,
>   something useful to make the task periodic as you have (if I remember
>   well) in Xenomai or RTAI?
Maybe I am confused because I missed the initial part of the discussion,
but here I think there is the risk of mixing two different concepts: the
"reservation period" (that is, the period P used to postpone the
scheduling deadline when the budget arrives at 0), and the "task period"
(which has to do with the periodicity of task activations).

For implementing a periodic behaviour in the task (this is, AFAIK, what
RTAI and similar APIs provide), no new syscall is needed:
clock_nanosleep() is enough. See
http://www.disi.unitn.it/~abeni/RTOS/rtapi.pdf for a (stupid) example.

The reservation period, on the other hand, is a scheduling parameter,
and I think that setting it with extended versions of sched_setparam(),
sched_setscheduler() and similar is ok.

> If you think it's worth doing that, do you think the
> task_wait_interval() syscall that we already added could/should do
> the job?
I do not remember what task_wait_interval() does :)
Is it the syscall you added to indicate the end of a job?

> Basically, from the scheduling point of view, what could happen is
> that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
> D_i and period P_i to use more bandwidth than Q_i/P_i. I'm still using
> D for scheduling, but passing the simple in-kernel admission test
> Sum_i(Q_i/P_i) < 1 won't guarantee that the task will always finish
> its jobs before D.
I think if you want different P_i and D_i you can use D_i for generating
new scheduling deadlines on task arrivals as "d = t + D_i", and P_i to
postpone the scheduling deadlines as "d = d + P_i" when the budget is 0.
Depending on the replenishment amount you use, you might need to modify
the admission test as "Sum_i(Q_i/min{P_i,D_i}) < 1" or not (if you
always replenish to Q_i, then you need the modified admission test;
otherwise, you can compute the replenishment amount so that the
admission test is unaffected).


Thanks,
	Luca
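A minimal user-space sketch of the clock_nanosleep() pattern Luca refers
to (an illustration only, not the example from the linked slides;
do_job() and periodic_task() are placeholder names):

    #include <time.h>

    static void do_job(void)
    {
        /* one job's worth of application work goes here */
    }

    void periodic_task(long period_ns)
    {
        struct timespec next;

        /* First activation: start from "now". */
        clock_gettime(CLOCK_MONOTONIC, &next);

        for (;;) {
            do_job();

            /* Compute the next activation time... */
            next.tv_nsec += period_ns;
            while (next.tv_nsec >= 1000000000L) {
                next.tv_nsec -= 1000000000L;
                next.tv_sec++;
            }

            /*
             * ...and sleep until that absolute instant, so the
             * activation times do not drift.
             */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
    }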
From: Luca Abeni on 10 Jul 2010 03:30

On Fri, 2010-07-09 at 16:24 +0200, Peter Zijlstra wrote:
> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> > Basically, from the scheduling point of view, what could happen is
> > that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
> > D_i and period P_i to use more bandwidth than Q_i/P_i. I'm still using
> > D for scheduling, but passing the simple in-kernel admission test
> > Sum_i(Q_i/P_i) < 1 won't guarantee that the task will always finish
> > its jobs before D.
>
> But the tardiness would still be bounded, right? So it's a valid Soft-RT
> model?
I think that if Sum_i(Q_i/P_i) < 1 but Sum_i(Q_i/min{P_i,D_i}) >= 1 then
you can have sporadic deadline misses, but it should still be possible
to compute an upper bound for the tardiness.
But this is just a feeling, I have no proof... :)


			Luca
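A small numeric example (not from the thread) of how the two conditions
can come apart, for two identical tasks with Q_i = 3, D_i = 4, P_i = 10:

    \[
    \sum_i \frac{Q_i}{P_i} = \frac{3}{10} + \frac{3}{10} = 0.6 < 1,
    \qquad
    \sum_i \frac{Q_i}{\min\{P_i, D_i\}} = \frac{3}{4} + \frac{3}{4} = 1.5 \ge 1 .
    \]

If both tasks are released at the same instant, the demand in [0, 4] is
6 time units against 4 available, so one of them misses its deadline
even though the long-run bandwidth test Sum_i(Q_i/P_i) < 1 passes.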
From: Raistlin on 10 Jul 2010 04:00

On Fri, 2010-07-09 at 16:32 +0200, Peter Zijlstra wrote:
> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> > - do you think it could be useful to have a different syscall to deal
> >   with the period parameter (if it's different from deadline), e.g.,
> >   something useful to make the task periodic as you have (if I remember
> >   well) in Xenomai or RTAI?
> > If you think it's worth doing that, do you think the
> > task_wait_interval() syscall that we already added could/should do
> > the job?
>
> Again, I'm afraid I need some extra education here in order to make a
> sensible comment.
>
Hey, fine, where's the problem? :-P

> What are the exact semantics of this extra proposed syscall?
>
Right now, it is:

  task_wait_interval(t) --> "wake me up at the first instant after t
                             when you can give me my full runtime"

> What exactly are the benefits over not having it, and simply relying on
> the task to not wake up more often, but if it does, have it run into the
> lack of budget and sort it that way?
>
What you're saying will obviously always work, and it is actually a
quite common usage pattern (we use it like that a lot! :-)).

The new syscall might help when it is important for a task to
synchronize with the budget provisioning mechanism. It might be
uncommon, but there could be situations --more in hard than in soft
scenarios-- where you want to be sure that your next job (and all the
subsequent ones, if you behave well) will get its full runtime, even if
this means waiting a little bit.

What I was wondering was whether this semantic should be modified by the
introduction of the "period", but I also agree with Luca that we must do
our best to avoid confusion!

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
http://blog.linux.it/raistlin / raistlin(a)ekiga.net /
dario.faggioli(a)jabber.org
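A sketch of the usage pattern Dario describes. The prototype of
task_wait_interval() is not given in the thread, so the signature below
(an absolute timespec argument) is an assumption, and the syscall exists
only in the -deadline patchset, not in mainline:

    #include <time.h>

    /*
     * Proposed syscall from the patchset (assumed prototype): "wake me
     * up at the first instant after t when you can give me my full
     * runtime".
     */
    extern int task_wait_interval(const struct timespec *t);

    /* A job loop that synchronizes with budget replenishment. */
    static void job_loop(long min_interarrival_ns)
    {
        struct timespec t;

        clock_gettime(CLOCK_MONOTONIC, &t);
        for (;;) {
            /* ... do one job ... */

            /* Earliest instant we are willing to run again. */
            t.tv_nsec += min_interarrival_ns;
            while (t.tv_nsec >= 1000000000L) {
                t.tv_nsec -= 1000000000L;
                t.tv_sec++;
            }

            /*
             * Block at least until t, and possibly longer: the scheduler
             * may delay the wakeup until a full runtime can be granted.
             */
            task_wait_interval(&t);
        }
    }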
From: Raistlin on 10 Jul 2010 05:10

On Fri, 2010-07-09 at 18:35 +0200, Peter Zijlstra wrote:
> I think the easiest path for now would indeed be to split between hard
> and soft rt tasks, and limit hard to d==p, and later worry about
> supporting d<p for hard.
>
Mmm... I see... Are you thinking of another scheduling class? Or maybe
just another queue with "higher priority" inside the same scheduling
class (sched_dl.c)?

I mean, I can do both, so I prefer to do what you like most in the first
place, instead of having to do it twice!! :-P
Maybe having two policies inside the same class (maybe handled in
separate queues/rb-trees) might save a lot of code duplication.
If we want to go like this, suggestions on the name(s) of the new (or of
both) policy(-ies) are more than welcome. :-D

> It will very much depend on how we're going to go about doing things
> with that 'load-balancer' thingy anyway.
>
Agree. The "load-balancer" right now pushes/pulls tasks to/from the
various runqueues --just as happens in sched-rt-- to, say, approximate
G-EDF. The code is in the git tree... I just need some time to clean it
up a little bit more and post the patches, but it's already working, at
least... :-)

> The idea is that we approximate G-EDF by moving tasks around, but Dario
> told me the admission tests are still assuming P-EDF.
>
Yep, as said above, that's what we've done up to now.

Regarding "partitioned admission", let me try to explain this.

You asked me to use sched_dl_runtime_us/sched_dl_period_us to let people
decide how much bandwidth should be devoted to EDF tasks. This obviously
yields _only_one_ bandwidth value, which is then used as the utilization
cap on *each* CPU, mainly for consistency with
sched_rt_{runtime,period}_us. At the time I was using such a value as
the "overall EDF bandwidth", but I changed to your suggested semantic.

Obviously this works perfectly as long as tasks stay on the CPU where
they were created; if they're manually migrated (by explicitly changing
their affinity) I can easily check whether there's enough bandwidth on
the target CPU, and if so, move the task and its bandwidth there.
That's how things were before the 'load-balancer' (and still are, if you
set task affinities so as to have a fully partitioned setup).

With global scheduling in place, we have this new situation. A task is
forked on a CPU (say 0), and I allow that if there's enough bandwidth
for it on that processor (and obviously, if yes, I also consume that
amount of bw). When the task is dynamically migrated to CPU 1 I have two
choices:

 (a) I move the bandwidth it occupies from 0 to 1, or
 (b) I leave it (the bw, not the task) where it is, on 0.

If I go for (b) and the scheduler wants to move a 0.2 task from CPU 0
(loaded up to 1.0) to CPU 1 (loaded up to 0.9), I'm having a
"transitory" situation with load 0.7 on CPU 0 and load 1.1 on CPU 1
--which I really don't like--, but at least I'm still enforcing
Sum_i(EDFtask_i) < 1.9. Moreover, if a new 0.1 task is being forked by
someone on CPU 1 (independently of whether it finds 1.1 or 1.0 load
there), it will fail, even if there is room for it in the system (on
CPU 1) --which I really don't like!!
This is what, as I said to Peter in Brussels, I mean by "still
partitioned" admission test.

If I go for (a), and again the scheduler tries to move a 0.2 task from
CPU 0 (loaded up to 1.0) to CPU 1 (loaded up to 0.9), I again have two
choices: failing or permitting this.
Failing would mean another limitation to global scheduling --which I
really don't like-- but allowing it would mean that another task of 0.2
can be created on CPU 0, so that I end up with
bw(CPU0)+bw(CPU1) = 1 + 1.1 = 2.1 --which I really don't like!! :-O :-O

More-moreover, if my bw limit is 0.7 on each of the 2 CPUs I have,
keeping the bandwidth separated forbids me to create a 0.2 task if both
CPUs are loaded up to 0.6, while it probably could be scheduled, since
we have global EDF! :-O

If you look at the code, you'll find (b) implemented right now, but, as
you might have understood, it's something I really don't like! :-(
If we want something better I cannot think of anything that doesn't
include having a global (per-domain should be fine as well) mechanism
for bandwidth accounting...

> Add to that the interesting problems of task affinity and we might soon
> all have a head-ache ;-)
>
We right now support affinity, i.e., tasks will be pushed/pulled to/by
CPUs where they can run. I'm not aware of any academic work that
analyzes such a situation, but this doesn't mean we can't figure
something out... Just to give people an example of "why real-time
scheduling theory still matters"!! ;-P ;-P

> One thing we can do is limit the task affinity to either 1 cpu or all
> cpus in the load-balance domain. Since there don't yet exist any
> applications we can disallow things to make life easier.
>
> If we only allow pinned tasks and free tasks, splitting the admission
> test in two would suffice I think, if we keep one per-cpu utilization
> measure and use the maximum of these over all cpus to start the global
> utilization measure, things ought to work out.
>
Ok, that seems possible to me, but since I have to write the code you
must tell me what you want the semantics of (syswide and per-group)
sched_dl_{runtime,period} to become and how I should treat them! :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
http://blog.linux.it/raistlin / raistlin(a)ekiga.net /
dario.faggioli(a)jabber.org
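Purely as an illustration of option (a) above (the bandwidth follows the
task on migration), a sketch of per-CPU accounting follows. Every name
in it is invented for the example, locking is omitted, and it is not the
code in the sched_dl tree:

    #include <linux/types.h>
    #include <linux/threads.h>
    #include <linux/errno.h>

    struct dl_cpu_bw {
        u64 total_bw;   /* sum of Q_i/P_i of tasks accounted here */
        u64 max_bw;     /* per-CPU cap, e.g. dl_runtime/dl_period */
    };

    static struct dl_cpu_bw dl_bw[NR_CPUS];

    /* Admission at fork/sched_setscheduler() time, on the task's CPU. */
    static int dl_admit(int cpu, u64 task_bw)
    {
        if (dl_bw[cpu].total_bw + task_bw > dl_bw[cpu].max_bw)
            return -EBUSY;
        dl_bw[cpu].total_bw += task_bw;
        return 0;
    }

    /*
     * Push/pull migration: move the bandwidth first; if the target CPU
     * cannot take it, the balancer must either give up on the migration
     * or accept a transient overload -- exactly the trade-off discussed
     * above.
     */
    static int dl_migrate(int src, int dst, u64 task_bw)
    {
        int ret = dl_admit(dst, task_bw);

        if (!ret)
            dl_bw[src].total_bw -= task_bw;
        return ret;
    }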
From: Raistlin on 10 Jul 2010 05:20
On Fri, 2010-07-09 at 16:30 +0200, Peter Zijlstra wrote:
> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> >
> > Therefore, here is my proposal:
> > - if the programmer specifies both period and deadline, I act as above,
> >   running the _only_necessary_ test in the kernel, assuming that
> >   userspace performed some other kind of (correct!) admission test by
> >   its own, or that it is cool with missing some deadlines... What do
> >   you think?
>
> It would be good to provide room in the interface to improve upon this
> situation.
>
Well, room is already there; the problem is deciding how to use it.

> One thing we could do, although this would make the proposed scheduler a
> wee bit more complex, is split between hard and soft realtime. Only
> accept P==rel_D for hard, and schedule the soft tasks in the hard slack
> or something like that.
>
As said in the other mail, this could be done, just let me know as much
as you can how you think it should happen (scheduling class, scheduling
policy, two (or more) separate queues, etc.).

Btw, Dhaval introduced me to the obscure world of IRC... I am (if
everything worked! :-P) dariof on #sched and/or #sched-rt, so feel free
to contact me there too; if I'm at the PC I'll be glad to discuss there
as well! :-D

Thanks and regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
http://blog.linux.it/raistlin / raistlin(a)ekiga.net /
dario.faggioli(a)jabber.org
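For context on what "room in the interface" can mean, here is a sketch
of an extended parameter block in the spirit of the patchset's
sched_param_ex; the field names and the "hard" flag are assumptions for
illustration, not the actual ABI:

    #include <time.h>

    struct sched_param_dl {
        struct timespec sched_runtime;   /* Q_i: budget per period  */
        struct timespec sched_deadline;  /* D_i: relative deadline  */
        struct timespec sched_period;    /* P_i: reservation period */
        unsigned int    sched_flags;     /* e.g. a "hard" bit forcing
                                            sched_deadline == sched_period */
    };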