CFS scheduler, -v8 [Kernel]

From: Vegard Nossum on 2 May 2007 09:10

On Tue, May 1, 2007 11:22 pm, Ingo Molnar wrote:
> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,

Hi,

The sys_sched_yield_to() is not callable from userspace on i386 because it
is not part of the syscall table (arch/i386/kernel/syscall_table.S). This
causes sysenter_entry (arch/i386/kernel/entry.S) to use the wrong count
for nr_syscalls (320 instead of 321) and return with -ENOSYS.

Vegard

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
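
To illustrate the failure mode reported above, here is a minimal user-space sketch in C. It assumes a dispatcher that derives its entry count from the size of the table and rejects any number at or beyond that count. All names (demo_table, demo_dispatch, the DEMO_NR_* numbers) are hypothetical and not kernel code; the point is only that a syscall number whose handler was never appended to the table is rejected with -ENOSYS.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(void);

/* Hypothetical stand-ins for real handlers. */
static long demo_getcpu(void)      { return 0; }
static long demo_epoll_pwait(void) { return 0; }

/* Numbers as a header might define them (assumed values, for illustration). */
enum {
    DEMO_NR_GETCPU      = 0,
    DEMO_NR_EPOLL_PWAIT = 1,
    DEMO_NR_YIELD_TO    = 2   /* defined here, but missing from the table below */
};

/* The table is one entry short: nothing was added for DEMO_NR_YIELD_TO. */
static const syscall_fn demo_table[] = {
    demo_getcpu,        /* 0 */
    demo_epoll_pwait,   /* 1 */
};

/* Entry count derived from the table itself (assumption of this sketch). */
static size_t demo_nr_syscalls(void)
{
    return sizeof(demo_table) / sizeof(demo_table[0]);
}

/* Bounds-check the number, then call through the table. */
static long demo_dispatch(unsigned long nr)
{
    if (nr >= demo_nr_syscalls())
        return -ENOSYS;
    return demo_table[nr]();
}
```

With this table, demo_dispatch(DEMO_NR_YIELD_TO) is rejected by the bounds check, while the two listed numbers are dispatched normally.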

From: Ingo Molnar on 2 May 2007 12:50

* Vegard Nossum <vegard(a)peltkore.net> wrote:

> The sys_sched_yield_to() is not callable from userspace on i386
> because it is not part of the syscall table
> (arch/i386/kernel/syscall_table.S). This causes sysenter_entry
> (arch/i386/kernel/entry.S) to use the wrong count for nr_syscalls (320
> instead of 321) and return with -ENOSYS.

oops, indeed - the patch below should fix this. (x86 should really adopt
the nice x86_64 technique of building the syscall table out of the
unistd.h enumeration definitions.)

Ingo

Index: linux/arch/i386/kernel/syscall_table.S
===================================================================
--- linux.orig/arch/i386/kernel/syscall_table.S
+++ linux/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,4 @@ ENTRY(sys_call_table)
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_sched_yield_to	/* 320 */
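
The remark about building the table out of the enumeration definitions can be sketched in plain C. This is not the x86_64 code itself; it is a small X-macro illustration, assuming a single list (GEN_SYSCALLS, a hypothetical name) from which both the numbers and the table entries are generated, so the two cannot drift apart. The handlers and their return values are made up for the example.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(void);

/* Hypothetical handlers; the return values only identify which one ran. */
static long gen_move_pages(void)     { return 10; }
static long gen_getcpu(void)         { return 11; }
static long gen_epoll_pwait(void)    { return 12; }
static long gen_sched_yield_to(void) { return 13; }

/* Single source of truth: number and handler name side by side. */
#define GEN_SYSCALLS(X)      \
    X(0, move_pages)         \
    X(1, getcpu)             \
    X(2, epoll_pwait)        \
    X(3, sched_yield_to)

/* Expansion 1: the syscall numbers. */
#define GEN_AS_ENUM(nr, name)  GEN_NR_##name = nr,
enum { GEN_SYSCALLS(GEN_AS_ENUM) };

/* Expansion 2: the table, indexed with C99 designated initializers. */
#define GEN_AS_ENTRY(nr, name) [nr] = gen_##name,
static const syscall_fn gen_table[] = { GEN_SYSCALLS(GEN_AS_ENTRY) };

static size_t gen_nr_syscalls(void)
{
    return sizeof(gen_table) / sizeof(gen_table[0]);
}

static long gen_dispatch(unsigned long nr)
{
    /* Out-of-range numbers and unfilled slots are both rejected. */
    if (nr >= gen_nr_syscalls() || gen_table[nr] == NULL)
        return -ENOSYS;
    return gen_table[nr]();
}
```

Adding a line to GEN_SYSCALLS extends the enum and the table together, which is the property the hand-maintained table lacks.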

From: Srivatsa Vaddagiri on 2 May 2007 13:30

On Tue, May 01, 2007 at 10:57:14PM -0400, Ting Yang wrote:
> "A Proportional Share REsource Allocation Algorithm for Real-Time,
> Time-Shared Systems", by Ion Stoica. You can find the paper here:
> http://citeseer.ist.psu.edu/37752.html

Good paper, thanks for the pointer.

I briefly went through the paper, and my impression is that it expects
each task to specify the length of each new request it initiates. Is
that correct?

If we have to apply EEVDF to SCHED_NORMAL task scheduling under CFS, how
would we calculate that "length of each new request" (which is required
before we can calculate its virtual deadline)?

> EXAMPLE: assume the system runs at 1000 tick/second, i.e. 1ms a tick,
> and the granularity of preemption for CFS is 5 virtual ticks (the
> current setting). If, at time t=0, we start 2 tasks: p1 and p2, both
> have nice value 0 (weight 1024), and rq->fair_clock is initialized to 0.
> Now we have:
> p1->fair_key = p2->fair_key = rq->fair_clock = 0.
> CFS breaks the tie arbitrarily, say it executes p1. After 1 system tick
> (1ms later) t=1, we have:
> rq->fair_clock = 1/2, p1->fair_key = 1, p2->fair_key = 0.
> Suppose, a new task p3 starts with nice value -10 at this moment, that
> is p3->fair_key=1/2. In this case, CFS will not schedule p3 for
> execution until the fair_keys of p1 and p2 go beyond 5+1/2 (which
> translates to about 10ms later in this setting), _regardless_ of the
> priority (weight) of p3.

There is also p->wait_runtime, which is taken into account when
calculating p->fair_key. So if p3 had been waiting on the runqueue for
a long time before, it can get to run sooner than 10ms later.

--
Regards,
vatsa
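
The arithmetic behind the "about 10ms" figure in the quoted example can be reproduced with a short C sketch. It rests on assumptions taken from the example rather than from the CFS source: a running task's key advances by 1 per millisecond at weight 1024 (scaled by 1024/weight otherwise), the runqueue clock advances by delta * 1024 / total_weight, and the newcomer only runs once every other key has passed its own key plus the granularity. p->wait_runtime is deliberately not modelled, and all function names are hypothetical.

```c
#include <stddef.h>

#define NICE_0_WEIGHT 1024.0

/* Advance of the runqueue clock over delta_ms (assumed formula). */
static double fair_clock_advance(double delta_ms, double total_weight)
{
    return delta_ms * NICE_0_WEIGHT / total_weight;
}

/* Milliseconds a task must run for its key to climb to threshold. */
static double ms_to_reach(double key, double threshold, double weight)
{
    if (key >= threshold)
        return 0.0;
    return (threshold - key) * weight / NICE_0_WEIGHT;
}

/*
 * Total run time consumed by the existing tasks before all of their
 * keys have reached newcomer_key + granularity.
 */
static double delay_before_newcomer_ms(const double *keys,
                                       const double *weights, size_t n,
                                       double newcomer_key,
                                       double granularity)
{
    double threshold = newcomer_key + granularity;
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
        total += ms_to_reach(keys[i], threshold, weights[i]);
    return total;
}
```

For the example's numbers (keys 1 and 0, both at weight 1024, newcomer key 1/2, granularity 5) the two tasks need 4.5ms and 5.5ms respectively, 10ms in total, and the weight of p3 never enters the calculation.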

From: William Lee Irwin III on 2 May 2007 13:50

On Tue, May 01, 2007 at 10:57:14PM -0400, Ting Yang wrote:
>> "A Proportional Share REsource Allocation Algorithm for Real-Time,
>> Time-Shared Systems", by Ion Stoica. You can find the paper here:
>> http://citeseer.ist.psu.edu/37752.html

On Wed, May 02, 2007 at 11:06:34PM +0530, Srivatsa Vaddagiri wrote:
> Good paper, thanks for the pointer.
> I briefly went through the paper, and my impression is that it expects
> each task to specify the length of each new request it initiates. Is
> that correct?
> If we have to apply EEVDF to SCHED_NORMAL task scheduling under CFS, how
> would we calculate that "length of each new request" (which is required
> before we can calculate its virtual deadline)?

l_i and w_i are both functions of the priority. You essentially arrange
l_i to express QoS wrt. latency, and w_i to express QoS wrt. bandwidth.

On Tue, May 01, 2007 at 10:57:14PM -0400, Ting Yang wrote:
>> EXAMPLE: assume the system runs at 1000 tick/second, i.e. 1ms a tick,
>> and the granularity of preemption for CFS is 5 virtual ticks (the
>> current setting). If, at time t=0, we start 2 tasks: p1 and p2, both
>> have nice value 0 (weight 1024), and rq->fair_clock is initialized to 0.
>> Now we have:
>> p1->fair_key = p2->fair_key = rq->fair_clock = 0.
>> CFS breaks the tie arbitrarily, say it executes p1. After 1 system tick
>> (1ms later) t=1, we have:
>> rq->fair_clock = 1/2, p1->fair_key = 1, p2->fair_key = 0.
>> Suppose, a new task p3 starts with nice value -10 at this moment, that
>> is p3->fair_key=1/2. In this case, CFS will not schedule p3 for
>> execution until the fair_keys of p1 and p2 go beyond 5+1/2 (which
>> translates to about 10ms later in this setting), _regardless_ of the
>> priority (weight) of p3.

On Wed, May 02, 2007 at 11:06:34PM +0530, Srivatsa Vaddagiri wrote:
> There is also p->wait_runtime, which is taken into account when
> calculating p->fair_key. So if p3 had been waiting on the runqueue for
> a long time before, it can get to run sooner than 10ms later.

Virtual time is time from the task's point of view, which it has spent
executing. ->wait_runtime is a device to subtract out time spent on the
runqueue but not running from what would otherwise be virtual time to
express lag, whether deliberately or coincidentally. ->wait_runtime
would not be useful for EEVDF AFAICT, though it may be interesting to
report.

-- wli
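
The role of l_i and w_i in the reply above can be made concrete with a small C sketch of the EEVDF selection rule: a request with virtual eligible time ve, length l and weight w gets virtual deadline ve + l/w, and among the requests that are already eligible the one with the earliest virtual deadline is picked. How l and w would be derived from a nice level is not specified in the thread, so the sketch takes them as plain inputs; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* One pending request per task (illustrative layout). */
struct eevdf_req {
    double ve;      /* virtual eligible time */
    double length;  /* request length l_i */
    double weight;  /* weight w_i */
};

/* Virtual deadline: vd = ve + l / w. */
static double eevdf_virtual_deadline(const struct eevdf_req *r)
{
    return r->ve + r->length / r->weight;
}

/*
 * Return the index of the eligible request (ve <= vnow) with the
 * earliest virtual deadline, or -1 if nothing is eligible yet.
 */
static int eevdf_pick(const struct eevdf_req *reqs, size_t n, double vnow)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].ve > vnow)
            continue;   /* not yet eligible */
        if (best < 0 ||
            eevdf_virtual_deadline(&reqs[i]) <
            eevdf_virtual_deadline(&reqs[best]))
            best = (int)i;
    }
    return best;
}
```

With equal weights, a task issuing shorter requests gets earlier deadlines and so lower latency, while the weight alone determines its long-run share; that is the latency/bandwidth split described in the reply.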

From: Ingo Molnar on 2 May 2007 14:20

* William Lee Irwin III <wli(a)holomorphy.com> wrote:

> > There is also p->wait_runtime, which is taken into account when
> > calculating p->fair_key. So if p3 had been waiting on the runqueue
> > for a long time before, it can get to run sooner than 10ms later.
>
> Virtual time is time from the task's point of view, which it has spent
> executing. ->wait_runtime is a device to subtract out time spent on
> the runqueue but not running from what would otherwise be virtual time
> to express lag, whether deliberately or coincidentally. [...]

CFS is in fact _built around_ the ->wait_runtime metric (which, as its
name suggests already, expresses the precise lag a task observes
relative to 'ideal' fair execution), so what exactly makes you suspect
that this property of the ->wait_runtime metric might be 'coincidental'?
;-)

Ingo
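
The notion of lag relative to 'ideal' fair execution can be written down as a short C sketch. It is the textbook definition (ideal weighted share of the elapsed time minus the time actually received), under the simplifying assumption that the set of runnable tasks and their weights stay fixed over the interval. It does not reproduce how CFS -v8 actually maintains ->wait_runtime, and the function names are hypothetical.

```c
/* Ideal service over an interval: the task's weighted share of CPU time. */
static double ideal_service_ms(double elapsed_ms, double weight,
                               double total_weight)
{
    return elapsed_ms * weight / total_weight;
}

/*
 * Lag = ideal service - received service.
 * Positive: the task is owed CPU time. Negative: it ran ahead of its share.
 */
static double lag_ms(double elapsed_ms, double weight, double total_weight,
                     double received_ms)
{
    return ideal_service_ms(elapsed_ms, weight, total_weight) - received_ms;
}
```

Applied to the earlier two-task example after the first millisecond, p1 (which ran the whole tick) has a lag of -0.5ms and p2 (which waited) has +0.5ms; the lags of all runnable tasks sum to zero.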