perf events finer grained context instrumentation / context exclusion [Kernel]


From: Ingo Molnar on 10 Jun 2010 02:30

* Frederic Weisbecker <fweisbec(a)gmail.com> wrote:

> Here is the new version of per context exclusion, based on hooks on
> irq_enter/irq_exit. I haven't observed slowdowns but I haven't actually
> measured the impact.

One thing that would be nice to see in this discussion is a comparison of
before/after perf stat --repeat runs.

Something like:

perf stat --repeat 10 ./hackbench 5

Done with full stat, and then also done with hardirqs/softirqs excluded. (i.e.
task context stats only)

I.e. does the feature really give us the expected statistical stability in
results? Does it really exclude hardirq/softirq workloads, etc.?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Ingo Molnar on 10 Jun 2010 06:20

* Frederic Weisbecker <fweisbec(a)gmail.com> wrote:

> Performance counter stats for './hackbench 5' (10 runs):
>
> 1313640764 instructions # 0,241 IPC ( +- 1,393% ) (scaled from 100,05%)
> 214737441 branches ( +- 0,948% )
>
> 1293802776 instructions # 0,245 IPC ( +- 0,343% )
> 209495435 branches ( +- 0,392% )

Indeed it's about 4 times less noise, not bad.
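
As a rough check of the "about 4 times" figure, the sketch below parses the "+-" percentages from the four lines quoted above and divides the first block's noise by the second's. The names FIRST_RUN, SECOND_RUN and parse_noise are made up for this illustration, and the reading of the first block as the full run and the second as the run with exclusion is an assumption, since the quoted output is not labeled.

```python
import re

# Lines copied from the quoted perf stat output. The locale uses a comma
# as the decimal separator, so "1,393%" means 1.393 percent.
FIRST_RUN = [
    "1313640764 instructions # 0,241 IPC ( +- 1,393% ) (scaled from 100,05%)",
    "214737441 branches ( +- 0,948% )",
]
SECOND_RUN = [
    "1293802776 instructions # 0,245 IPC ( +- 0,343% )",
    "209495435 branches ( +- 0,392% )",
]

# Count, event name, then the first parenthesised "+- N%" group on the line.
NOISE_RE = re.compile(r"^(\d+)\s+(\S+).*?\(\s*\+-\s*([\d,.]+)%\s*\)")


def parse_noise(lines):
    """Map event name -> reported noise in percent (hypothetical helper)."""
    noise = {}
    for line in lines:
        match = NOISE_RE.match(line.strip())
        if match:
            noise[match.group(2)] = float(match.group(3).replace(",", "."))
    return noise


first = parse_noise(FIRST_RUN)
second = parse_noise(SECOND_RUN)

# How many times noisier the first block is than the second, per event.
ratios = {event: first[event] / second[event] for event in first}

for event, ratio in ratios.items():
    print(f"{event}: {first[event]}% -> {second[event]}% ({ratio:.2f}x less noise)")
```

The instructions ratio comes out at roughly 4.06, which matches the estimate; for branches the reduction is smaller, roughly 2.4.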

Cycles is fundamentally random.

> So yeah, the results look a bit better. Still not perfect:
>
> - we are still instrumenting the tiny parts between the true interrupt
> and irq_enter() (same for irq_exit() and the end). Same for softirqs.
>
> - random randomnesses...

Random randomness shouldn't occur for something like instructions or branches.

Could you try some 'must not be variable' workload, like:

taskset 1 ./hackbench 1

If the workload is pinned to a single CPU then it ought to not be variable at
all. (modulo things like hash chain lengths and slab caching details, but
those should not cause 0.4% kind of noise IMO)
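
For scripting such pinned runs, here is a minimal sketch of what "taskset 1" does, assuming Linux and Python 3. taskset reads its mask argument as hexadecimal by default, so mask 1 selects CPU 0 only. The helper names mask_to_cpus and run_pinned are invented for this example and are not part of perf or util-linux.

```python
import os
import subprocess


def mask_to_cpus(mask_hex):
    """Convert a taskset-style hex mask ("1", "0x5", "f") to a set of CPU ids."""
    mask = int(mask_hex, 16)
    return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}


def run_pinned(cmd, mask_hex="1"):
    """Run cmd with its CPU affinity restricted to the CPUs in mask_hex.

    Linux only: os.sched_setaffinity() is not available on other platforms.
    """
    cpus = mask_to_cpus(mask_hex)

    def pin():
        # Runs in the child between fork and exec; pid 0 means "this process".
        os.sched_setaffinity(0, cpus)

    return subprocess.run(cmd, preexec_fn=pin, check=True)


# Example (not executed here): roughly equivalent to "taskset 1 ./hackbench 1"
# run_pinned(["./hackbench", "1"], "1")
```

Pinning only removes cross-CPU scheduling as a source of variance; whatever noise remains has to come from elsewhere.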

Btw., we could try to record all branches of an execution (using BTS, of a
relatively short but static-length run), and see where the variance comes
from. I doubt the current BTS code is ready for that, but it would be 'the'
magic trace-from-hell that includes all execution of the task, recorded at the
hardware level.

Ingo
