From: Divyesh Shah on 1 Mar 2010 10:50

On Sat, Feb 27, 2010 at 3:43 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:
>
> On Fri, 2010-02-26 at 18:03 -0800, Divyesh Shah wrote:
> > This can be used by applications to get finer granularity cputime usage on
> > platforms that use timestamp counters or HPET.
>
> I guess the patch looks good, I'm just not sure what HPET has got to do with
> anything.. the scheduler certainly doesn't use HPET for timekeeping, it's
> terribly slow to read.

Yes, you're right. Please ignore the HPET comment.

> Also, it would be good to get some more justification than 'some
> applications can use this', which is basically a truism for any patch
> that adds a user interface.

1) This should be useful in a shared or virtualized environment where
the shared management infrastructure needs to charge each application
as accurately as possible for CPU usage.
2) An application that services different users and does some
CPU-intensive work may want to measure the CPU time its serving thread
spends on each user.

I think applications like web servers, remote database servers, etc.
fit into the second category.

This really helps for units of work smaller than a jiffy, since such
threads could otherwise hide from jiffy-based accounting entirely.

Please let me know if you want me to send the patch again with the
corrected description and added justification.

> > Signed-off-by: Divyesh Shah <dpshah(a)google.com>
> > ---
> >
> >  fs/proc/array.c    |   40 ++++++++++++++++++++++++++++++++++++++++
> >  fs/proc/base.c     |    2 ++
> >  fs/proc/internal.h |    2 ++
> >  3 files changed, 44 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/proc/array.c b/fs/proc/array.c
> > index 13b5d07..54604b8 100644
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -547,3 +547,43 @@ int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
> >
> >  	return 0;
> >  }
> > +
> > +static int do_task_cputime(struct task_struct *task, char *buffer, int whole)
> > +{
> > +	int res;
> > +	unsigned long flags;
> > +	unsigned long long sum_exec_runtime = 0;
> > +	struct task_struct *t;
> > +
> > +	if (lock_task_sighand(task, &flags)) {
> > +		if (whole) {
> > +			t = task;
> > +			/*
> > +			 * Add up live thread sum_exec_runtime at the group
> > +			 * level.
> > +			 */
> > +			do {
> > +				sum_exec_runtime += t->se.sum_exec_runtime;
> > +				t = next_thread(t);
> > +			} while (t != task);
> > +			sum_exec_runtime += task->signal->sum_sched_runtime;
> > +		}
> > +		unlock_task_sighand(task, &flags);
> > +	}
> > +
> > +	if (!whole)
> > +		sum_exec_runtime = task->se.sum_exec_runtime;
> > +
> > +	res = sprintf(buffer, "%llu\n", sum_exec_runtime);
> > +	return res;
> > +}
> > +
> > +int proc_tid_cputime(struct task_struct *task, char *buffer)
> > +{
> > +	return do_task_cputime(task, buffer, 0);
> > +}
> > +
> > +int proc_tgid_cputime(struct task_struct *task, char *buffer)
> > +{
> > +	return do_task_cputime(task, buffer, 1);
> > +}
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index 58324c2..8fbc785 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -2595,6 +2595,7 @@ static const struct pid_entry tgid_base_stuff[] = {
> >  	REG("numa_maps",  S_IRUGO, proc_numa_maps_operations),
> >  #endif
> >  	REG("mem",        S_IRUSR|S_IWUSR, proc_mem_operations),
> > +	INF("cputime_ns", S_IRUGO, proc_tgid_cputime),
> >  	LNK("cwd",        proc_cwd_link),
> >  	LNK("root",       proc_root_link),
> >  	LNK("exe",        proc_exe_link),
> > @@ -2930,6 +2931,7 @@ static const struct pid_entry tid_base_stuff[] = {
> >  	REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
> >  #endif
> >  	REG("mem",       S_IRUSR|S_IWUSR, proc_mem_operations),
> > +	INF("cputime_ns", S_IRUGO, proc_tid_cputime),
> >  	LNK("cwd",       proc_cwd_link),
> >  	LNK("root",      proc_root_link),
> >  	LNK("exe",       proc_exe_link),
> > diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> > index 1f24a3e..f9e9799 100644
> > --- a/fs/proc/internal.h
> > +++ b/fs/proc/internal.h
> > @@ -51,6 +51,8 @@ extern int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
> >  				struct pid *pid, struct task_struct *task);
> >  extern int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
> >  				struct pid *pid, struct task_struct *task);
> > +extern int proc_tid_cputime(struct task_struct *, char *);
> > +extern int proc_tgid_cputime(struct task_struct *, char *);
> >  extern loff_t mem_lseek(struct file *file, loff_t offset, int orig);
> >
> >  extern const struct file_operations proc_maps_operations;
From: Divyesh Shah on 22 Mar 2010 15:00
On Mon, Mar 1, 2010 at 10:05 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> On Mon, 2010-03-01 at 07:44 -0800, Divyesh Shah wrote:
>> [...]
>> 1) This should be useful in a shared or virtualized environment where
>> the shared management infrastructure needs to charge each application
>> as accurately as possible for CPU usage.
>> 2) An application that services different users and does some
>> CPU-intensive work may want to measure the CPU time its serving thread
>> spends on each user.
>> [...]
>
> It's all should and may, anything concrete?

Both of these use cases are what we've been using (the shared
environment case) and plan on using this interface for at Google, so
they are indeed concrete. It should be useful to other users in the
Linux community with similar workloads.

> Also, doesn't muck like schedstat, task_delay_accounting, cpuacct-cgroup
> or some other stuff expose this number already?

I took a look at /proc/<pid>/schedstat. It does export this number
along with some other info, but it does not aggregate across threads
when the pid is a thread group leader, and it is also less accurate
than cputime_ns because it does not include the cputime accumulated
since the last tick. I have a patch that adds both of these, but I am
not sure whether that would break userspace tools that depend on the
semantics of the existing interface. Please advise whether it makes
sense to change the schedstat interface or to add this new one.
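For comparison, the schedstat alternative mentioned above can already
be read today. The sketch below relies on the documented per-task
/proc/<pid>/schedstat format (three fields: time spent on the cpu in
ns, runqueue wait time in ns, number of timeslices run); the helper
name is illustrative only.

#include <stdio.h>

/* Return the first schedstat field: the task's sum_exec_runtime in ns. */
static unsigned long long read_schedstat_runtime_ns(int pid)
{
	char path[64];
	unsigned long long run_ns = 0, wait_ns = 0;
	unsigned long timeslices = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/schedstat", pid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	/* Format: <run ns> <runqueue wait ns> <timeslices> */
	if (fscanf(f, "%llu %llu %lu", &run_ns, &wait_ns, &timeslices) != 3)
		run_ns = 0;
	fclose(f);
	return run_ns;
}

Unlike the proposed cputime_ns, this value is per-task even when pid
is a thread group leader, and it lags by up to one tick, which is
exactly the gap described in the reply above.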