From: Peter Zijlstra on
On Fri, 2010-05-28 at 16:26 +0200, Frederic Weisbecker wrote:
> We have various tracepoints that tell us when a task is going to
> be enqueued in a runqueue: fork, wakeup, migrate.
>
> But they don't always provide us the level of information necessary
> to know what is actually in which runqueue, precisely because the
> migrate event is only fired if the task is queued on another
> cpu than its previous one. So we don't always know where a waking up
> task goes.
>
> And moreover we don't have events that tell us when a task goes to sleep,
> and even that wouldn't cover every case where a task is dequeued.
>
> So bring in these two new tracepoints to get information about the
> load of each runqueue.

NAK, aside from a few corner cases wakeup and sleep are the important
points.

The activate and deactivate functions are implementation details.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Peter Zijlstra <peterz(a)infradead.org> wrote:

> On Fri, 2010-05-28 at 16:26 +0200, Frederic Weisbecker wrote:
> > We have various tracepoints that tell us when a task is going to
> > be enqueued in a runqueue: fork, wakeup, migrate.
> >
> > But they don't always provide us the level of information necessary
> > to know what is actually in which runqueue, precisely because the
> > migrate event is only fired if the task is queued on another
> > cpu than its previous one. So we don't always know where a waking up
> > task goes.
> >
> > And moreover we don't have events that tell us when a task goes to sleep,
> > and even that wouldn't cover every case where a task is dequeued.
> >
> > So bring in these two new tracepoints to get information about the
> > load of each runqueue.
>
> NAK, aside from a few corner cases wakeup and sleep are the important
> points.
>
> The activate and deactivate functions are implementation details.

Frederic, can you show us a concrete example of where we don't know what is
going on due to inadequate instrumentation? Can we fix that by extending the
existing tracepoints?

Ingo
From: Peter Zijlstra on
On Mon, 2010-05-31 at 10:00 +0200, Ingo Molnar wrote:
> >
> > NAK, aside from a few corner cases wakeup and sleep are the important
> > points.
> >
> > The activate and deactivate functions are implementation details.
>
> Frederic, can you show us a concrete example of where we don't know what is
> going on due to inadequate instrumentation? Can we fix that by extending the
> existing tracepoints?

Right, so a few of those corner cases I mentioned above are things like
re-nice, PI-boosts, etc. Those use deactivate, modify task-state,
activate cycles. So if you want to see those, we can add an explicit
tracepoint for those actions.

An explicit nice/PI-boost tracepoint is much clearer than trying to
figure out wth the deactivate/activate cycle was for.
From: Peter Zijlstra on
On Mon, 2010-05-31 at 10:12 +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 10:00 +0200, Ingo Molnar wrote:
> > >
> > > NAK, aside from a few corner cases wakeup and sleep are the important
> > > points.
> > >
> > > The activate and deactivate functions are implementation details.
> >
> > Frederic, can you show us a concrete example of where we don't know what is
> > going on due to inadequate instrumentation? Can we fix that by extending the
> > existing tracepoints?
>
> Right, so a few of those corner cases I mentioned above are things like
> re-nice, PI-boosts, etc. Those use deactivate, modify task-state,
> activate cycles. So if you want to see those, we can add an explicit
> tracepoint for those actions.
>
> An explicit nice/PI-boost tracepoint is much clearer than trying to
> figure out wth the deactivate/activate cycle was for.

Another advantage of explicit tracepoints is that you'd see them even
for non-running tasks, because we only do the deactivate/activate thingy
for runnable tasks.
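
To illustrate the point, here is a minimal userspace sketch, not kernel code. It assumes a toy model in which changing a task's priority dequeues and requeues the task only when it is currently on a runqueue. Every name in it (struct task, struct trace_counts, model_set_prio) is hypothetical and exists only for this illustration; the counters stand in for trace hooks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; this is not the kernel's task_struct. */
struct task {
	int prio;
	bool on_rq;	/* true while the task sits on a runqueue */
};

/* Counts how often each hypothetical trace hook would have fired. */
struct trace_counts {
	int deactivate;
	int activate;
	int set_prio;
};

/*
 * Toy model of a priority change (re-nice or PI-boost).
 * The explicit set_prio event fires unconditionally, while the
 * deactivate/activate pair only fires for a task that is queued.
 */
static void model_set_prio(struct task *p, int prio, struct trace_counts *tc)
{
	bool queued;

	assert(p != NULL && tc != NULL);
	queued = p->on_rq;

	tc->set_prio++;			/* explicit, self-describing event */

	if (queued) {
		p->on_rq = false;	/* dequeue before touching the priority */
		tc->deactivate++;
	}

	p->prio = prio;

	if (queued) {
		p->on_rq = true;	/* requeue with the new priority */
		tc->activate++;
	}
}
```

In this model, calling model_set_prio() on a sleeping task bumps only the set_prio counter, so a trace built from the deactivate/activate hooks alone would miss the change entirely.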


From: Frederic Weisbecker on
On Mon, May 31, 2010 at 10:54:59AM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 10:12 +0200, Peter Zijlstra wrote:
> > On Mon, 2010-05-31 at 10:00 +0200, Ingo Molnar wrote:
> > > >
> > > > NAK, aside from a few corner cases wakeup and sleep are the important
> > > > points.
> > > >
> > > > The activate and deactivate functions are implementation details.
> > >
> > > Frederic, can you show us a concrete example of where we don't know what is
> > > going on due to inadequate instrumentation? Can we fix that by extending the
> > > existing tracepoints?
> >
> > Right, so a few of those corner cases I mentioned above are things like
> > re-nice, PI-boosts, etc. Those use deactivate, modify task-state,
> > activate cycles. So if you want to see those, we can add an explicit
> > tracepoint for those actions.
> >
> > An explicit nice/PI-boost tracepoint is much clearer than trying to
> > figure out wth the deactivate/activate cycle was for.
>
> Another advantage of explicit tracepoints is that you'd see them even
> for non-running tasks, because we only do the deactivate/activate thingy
> for runnable tasks.


Yeah. So I agree with you that activate/deactivate are too tied to the
implementation. They don't even make much sense on their own, as we
don't know the cause of the event: it could be a simple renice, or
it could be a sleep.

So agreed, this sucks.

For the corner cases like re-nice and PI-boost or so, we can indeed plug
some higher level tracepoints there.

But there is one more important problem these tracepoints were solving
that still needs something:

We don't know when a task goes to sleep. We have two wait tracepoints:
sched_wait_task(), to wait for a task to unschedule, and sched_process_wait(),
which is a hook for the waitid and wait4 syscalls. So we are missing all
the event waiting done from inside the kernel. But even with that, wait and
sleep don't mean the same thing. Sleeping doesn't always involve using the
waiting API.

I think we need a tracepoint like this:

diff --git a/kernel/sched.c b/kernel/sched.c
index 8c0b90d..5f67c04 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3628,8 +3628,10 @@ need_resched_nonpreemptible:
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
 		if (unlikely(signal_pending_state(prev->state, prev)))
 			prev->state = TASK_RUNNING;
-		else
+		else {
+			trace_sched_task_sleep(prev);
 			deactivate_task(rq, prev, DEQUEUE_SLEEP);
+		}
 		switch_count = &prev->nvcsw;
 	}


And if people need tracepoints in the event-waiting API, we can add them
later.


And concerning the task waking up: if it is not migrated, it means it stays
on its original cpu. This is something that can be dealt with in post-processing.
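
As a rough idea of what that post-processing could look like, here is a userspace sketch in C. It is not an existing tool. It assumes the trace has already been parsed into a stream of fork, wakeup, migrate and sleep events, and that a migrate event, when there is one, appears before the matching wakeup. The event layout and all names (struct sched_event, struct trace_model, model_init, model_apply) are hypothetical.

```c
#include <assert.h>

#define MAX_TASKS 64	/* arbitrary bounds for this sketch */
#define MAX_CPUS  8

enum ev_type { EV_FORK, EV_WAKEUP, EV_MIGRATE, EV_SLEEP };

/* Hypothetical parsed trace record. */
struct sched_event {
	enum ev_type type;
	int pid;	/* used directly as an index into the task table */
	int cpu;	/* destination cpu; only read for EV_FORK and EV_MIGRATE */
};

struct task_state {
	int cpu;	/* last known cpu, -1 if the task was never seen */
	int queued;	/* non-zero while the task is counted on a runqueue */
};

struct trace_model {
	struct task_state tasks[MAX_TASKS];
	int nr_running[MAX_CPUS];	/* reconstructed load per runqueue */
};

static void model_init(struct trace_model *m)
{
	int i;

	for (i = 0; i < MAX_TASKS; i++) {
		m->tasks[i].cpu = -1;
		m->tasks[i].queued = 0;
	}
	for (i = 0; i < MAX_CPUS; i++)
		m->nr_running[i] = 0;
}

static void model_apply(struct trace_model *m, const struct sched_event *ev)
{
	struct task_state *t;

	assert(ev->pid >= 0 && ev->pid < MAX_TASKS);
	t = &m->tasks[ev->pid];

	switch (ev->type) {
	case EV_FORK:
		assert(ev->cpu >= 0 && ev->cpu < MAX_CPUS);
		if (t->queued)
			break;	/* unexpected duplicate; ignore it */
		t->cpu = ev->cpu;
		t->queued = 1;
		m->nr_running[t->cpu]++;
		break;
	case EV_MIGRATE:
		assert(ev->cpu >= 0 && ev->cpu < MAX_CPUS);
		if (t->queued)
			m->nr_running[t->cpu]--;
		t->cpu = ev->cpu;
		if (t->queued)
			m->nr_running[t->cpu]++;
		break;
	case EV_WAKEUP:
		/* No migrate was seen since the sleep: the task stays on its last cpu. */
		if (!t->queued && t->cpu >= 0) {
			t->queued = 1;
			m->nr_running[t->cpu]++;
		}
		break;
	case EV_SLEEP:
		if (t->queued) {
			t->queued = 0;
			m->nr_running[t->cpu]--;
		}
		break;
	}
}
```

Feeding events in trace order through model_apply() keeps nr_running[] in sync with the reconstructed runqueues. Note that the EV_SLEEP case only works if the trace carries a sleep event, which is exactly what the tracepoint proposed above would provide.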
