From: Venkatesh Pallipadi on 25 May 2010 17:50

On Mon, May 24, 2010 at 11:35 PM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> On Mon, 2010-05-24 at 17:11 -0700, Venkatesh Pallipadi wrote:
>> +void account_system_vtime(struct task_struct *tsk)
>> +{
>> +	unsigned long flags;
>> +	int cpu;
>> +	u64 now;
>> +
>> +	local_irq_save(flags);
>> +	cpu = task_cpu(tsk);
>> +	now = sched_clock_cpu(cpu);
>> +	if (hardirq_count())
>> +		tsk->hi_time += now - per_cpu(irq_start_time, cpu);
>> +	else if (softirq_count())
>> +		tsk->si_time += now - per_cpu(irq_start_time, cpu);
>> +
>> +	per_cpu(irq_start_time, cpu) = now;
>> +	local_irq_restore(flags);
>> +}
>
> Right, so this gets called from irq_enter/exit() and __do_softirq().
>
> The reason I never pressed onwards with this (I had patches to add IRQ
> time accounting) is that it sucks terribly for anything falling back to
> jiffies -- maybe find some smart way to disable the whole call when
> there's no TSC available, preferably without adding conditionals, using
> alternatives maybe?
>
> I guess you mostly side-stepped that by adding an IRQ_TIME_ACCOUNTING
> config (but forgot to make it depend on X86_TSC).

Yes. Unfortunately, the TSC dependency cannot be settled at config time: even with X86_TSC, there is still support for tsc_disabled. So this should be a run-time conditional or an alternative.

> Another thing I dislike about this account_system_vtime() is that it's
> the same call for both IRQ and SoftIRQ, leaving us to add conditionals
> inside the call to figure out what context we got called from.
>
> [ Added the s390 and ppc guys who already use this stuff ]

We can probably add a parameter saying where it is being called from. Even with that, we would still need some conditionals to handle a hardirq overlapping a softirq. That part happens more or less naturally with the current code.
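To make the overlap case concrete, here is a minimal user-space sketch of how a single per-CPU start timestamp splits time between a softirq and a hardirq that interrupts it. Everything in it is a stand-in for illustration: task_sim, the depth counters, and the sim_* helpers are hypothetical names, and it assumes the accounting call runs before the context count changes on both entry and exit.

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins; not the kernel's data structures. */
struct task_sim {
	uint64_t hi_time;	/* time charged to hardirq context */
	uint64_t si_time;	/* time charged to softirq context */
};

static uint64_t irq_start_time;	/* per-CPU in the patch; one CPU here */
static int hardirq_depth;	/* stands in for hardirq_count() */
static int softirq_depth;	/* stands in for softirq_count() */

/* Same decision logic as the quoted patch, with the clock passed in. */
static void account_system_vtime_sim(struct task_sim *tsk, uint64_t now)
{
	if (hardirq_depth)
		tsk->hi_time += now - irq_start_time;
	else if (softirq_depth)
		tsk->si_time += now - irq_start_time;

	irq_start_time = now;
}

/* Assumption: accounting happens before the depth counter is updated. */
static void sim_hardirq_enter(struct task_sim *tsk, uint64_t now)
{
	account_system_vtime_sim(tsk, now);
	hardirq_depth++;
}

static void sim_hardirq_exit(struct task_sim *tsk, uint64_t now)
{
	account_system_vtime_sim(tsk, now);
	hardirq_depth--;
}

static void sim_softirq_enter(struct task_sim *tsk, uint64_t now)
{
	account_system_vtime_sim(tsk, now);
	softirq_depth++;
}

static void sim_softirq_exit(struct task_sim *tsk, uint64_t now)
{
	account_system_vtime_sim(tsk, now);
	softirq_depth--;
}
```

With a softirq running from t=100 to t=190 and a hardirq interrupting it from t=130 to t=150, the sketch charges 20 units to hi_time and 70 units to si_time. The hardirq slice is not double-counted because every call resets the shared start timestamp.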

> Anyway, once we have this, please also add it to sched_rt_avg_update()
> (which should really be called sched_!fair_avg_update()).

You mean take out the hi and si time from rt_delta?

> Also, did you measure the overhead of doing this? sched_clock_cpu() adds
> a cmpxchg64 on all systems that don't have a rock solid TSC (ie. most of
> today's machines).
>
> Another thing that would be real nice is if you could find a way to not
> make all of this x86 specific.

Yes. That's one of the reasons I based this on sched_clock. Maybe we can have a has_fast_sched_clock feature that archs can opt in to, which enables this with an alternative?

Overhead: my initial testing was on a reliable-TSC system, where sched_clock_cpu() takes <50 cycles, so there was no noticeable overhead there. I still have to run this on other platforms and will post the data once I do.

Thanks,
Venki
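
As a rough illustration of the run-time opt-in idea, the sketch below gates the accounting on a flag so that the clock is never read when the flag is clear. It is a user-space model under stated assumptions: has_fast_sched_clock is only the name floated in this thread, not an existing kernel symbol, and fake_sched_clock(), irq_acct_sim, and account_irq_time_sim() are hypothetical. It uses a plain conditional; the alternatives-based patching also mentioned in the thread, which would avoid the branch, is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: an architecture would set it once at boot if its
 * sched_clock() is cheap and reliable. */
static bool has_fast_sched_clock;

/* Fake clock for illustration; counts how often it is read. */
static uint64_t fake_clock_ns;
static unsigned int clock_reads;

static uint64_t fake_sched_clock(void)
{
	clock_reads++;
	return fake_clock_ns;
}

struct irq_acct_sim {
	uint64_t irq_time;	/* accumulated irq time */
	uint64_t start;		/* timestamp of the previous call */
};

/* Run-time conditional: when the flag is clear, the clock is not read
 * and nothing is accounted. */
static void account_irq_time_sim(struct irq_acct_sim *a)
{
	uint64_t now;

	if (!has_fast_sched_clock)
		return;

	now = fake_sched_clock();
	a->irq_time += now - a->start;
	a->start = now;
}
```

The cost on systems without a fast clock is then one predictable branch per call instead of a clock read.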