From: Arjan van de Ven on 30 Jan 2010 19:50

On Sat, 30 Jan 2010 18:35:49 -0600
Shawn Bohrer <shawn.bohrer(a)gmail.com> wrote:

> I agree that we are currently depending on a bug in epoll. The epoll
> implementation currently rounds up to the next jiffie, so specifying a
> timeout of 1 ms really just wakes the process up at the next timer
> tick. I have a patch to fix epoll by converting it to use
> schedule_hrtimeout_range() that I'll gladly send, but I still need a
> way to achieve the same thing.

it's not going to help you; your expectation is incorrect.
you CANNOT get 1000 iterations per second if you do

<wait 1 msec>
<do a bunch of work>
<wait 1 msec>
etc in a loop

the more accurate (read: not rounding down) the implementation, the
more not-1000 you will get, because to hit 1000 the two actions
<wait 1 msec> <do a bunch of work> combined are not allowed to take
more than 1000 microseconds of wallclock time. Assuming "do a bunch
of work" takes 100 microseconds, for you to hit 1000 your 1 msec
sleep would have to fit in 900 microseconds... and sadly physics
doesn't work that way. (and that's even ignoring various OS, CPU
wakeup and scheduler contention overheads)

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
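To make the arithmetic concrete, here is a minimal sketch of the loop in
question; do_work() and its ~100 microsecond cost are illustrative
assumptions, not code from the thread (link with -lrt on older glibc for
clock_gettime()):

    /* Minimal sketch of the timing argument. do_work() and its ~100 us
     * cost are illustrative assumptions, not code from the thread. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static void do_work(void)
    {
        struct timespec t = { 0, 100 * 1000 };  /* stand-in for ~100 us of work */
        nanosleep(&t, NULL);
    }

    int main(void)
    {
        struct timespec start, now;
        double elapsed;
        long iters = 0;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (;;) {
            usleep(1000);   /* <wait 1 msec> */
            do_work();      /* <do a bunch of work> */
            iters++;

            clock_gettime(CLOCK_MONOTONIC, &now);
            elapsed = (now.tv_sec - start.tv_sec) +
                      (now.tv_nsec - start.tv_nsec) / 1e9;
            if (elapsed >= 1.0) {
                /* the period is ~1.1 ms, so this prints ~909, not 1000 */
                printf("%.2f iterations/sec\n", iters / elapsed);
                iters = 0;
                start = now;
            }
        }
    }

With a sleep that really lasts 1 msec, the printed rate converges on
roughly 909; only a sleep that undershoots, as the old jiffie-based
epoll_wait() path does, can reach 1000.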
From: Shawn Bohrer on 30 Jan 2010 19:50

On Sat, Jan 30, 2010 at 06:35:49PM -0600, Shawn Bohrer wrote:
> On Sat, Jan 30, 2010 at 04:11:14PM -0800, Arjan van de Ven wrote:
> > On Sat, 30 Jan 2010 17:45:51 -0600
> > Shawn Bohrer <shawn.bohrer(a)gmail.com> wrote:
> >
> > > Hello,
> > >
> > > Currently we have a workload that depends on around 50 processes
> > > that wake up 1000 times a second, do a small amount of work, and go
> > > back to sleep. This works great on RHEL 5 (2.6.18-164.6.1.el5), but
> > > on recent kernels we are unable to achieve 1000 iterations per
> > > second. Using the simple test application below on RHEL 5
> > > 2.6.18-164.6.1.el5 I can run 500 of these processes and still
> > > achieve 999.99 iterations per second. Running just 10 of these
> > > processes on the same machine with 2.6.32.6 produces results like:
> > > ]
> >
> > there's an issue with your expectation btw.
> > what your application does, in practice, is
> >
> > <wait 1 millisecond>
> > <do a bunch of work>
> > <wait 1 millisecond>
> > <do a bunch of work>
> > etc
> >
> > you would only be able to get close to 1000 per second if "bunch of
> > work" is nothing... but it isn't.
> > so let's assume "bunch of work" is 100 microseconds.. the basic
> > period of your program (ignoring any costs/overhead in the
> > implementation) is 1.1 milliseconds, which is approximately 909 per
> > second, not 1000!
> >
> > I suspect that the 1000 you get on RHEL5 is a bug in the RHEL5
> > kernel where it gives you a shorter delay than what you asked for,
> > since it's clearly not a correct number to get.
> >
> > (and yes, older kernels had such rounding bugs; current kernels go
> > to great lengths to give applications *exactly* the delay they are
> > asking for....)
>
> I agree that we are currently depending on a bug in epoll. The epoll
> implementation currently rounds up to the next jiffie, so specifying a
> timeout of 1 ms really just wakes the process up at the next timer
> tick. I have a patch to fix epoll by converting it to use
> schedule_hrtimeout_range() that I'll gladly send, but I still need a
> way to achieve the same thing.

I guess I should add that I think we could achieve the same effect by
adding a 1 ms (or less) periodic timerfd to our epoll set. However, it
still appears that newer kernels have a much larger scheduler delay, and
I still need a way to fix that in order for us to move to a newer
kernel.

--
Shawn
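A hedged sketch of that timerfd idea (timerfd_create() requires 2.6.25+
and matching glibc headers; error handling omitted for brevity):

    #include <sys/timerfd.h>
    #include <sys/epoll.h>
    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int epfd = epoll_create(1);
        int tfd  = timerfd_create(CLOCK_MONOTONIC, 0);

        /* 1 ms periodic timer; first expiry also in 1 ms */
        struct itimerspec its = {
            .it_interval = { .tv_sec = 0, .tv_nsec = 1000000 },
            .it_value    = { .tv_sec = 0, .tv_nsec = 1000000 },
        };
        timerfd_settime(tfd, 0, &its, NULL);

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = tfd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);

        for (;;) {
            struct epoll_event out;
            int n = epoll_wait(epfd, &out, 1, -1); /* block; the timer wakes us */
            if (n > 0 && out.data.fd == tfd) {
                uint64_t expirations;
                read(tfd, &expirations, sizeof(expirations)); /* drain the timer */
                /* small amount of work goes here */
            }
        }
    }

Unlike a 1 ms epoll timeout, the periodic timer does not restart after
each wakeup, so an overrun shows up in the expiration count instead of
silently stretching the period.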
From: Shawn Bohrer on 30 Jan 2010 22:50

On Sat, Jan 30, 2010 at 04:47:16PM -0800, Arjan van de Ven wrote:
> On Sat, 30 Jan 2010 18:35:49 -0600
> Shawn Bohrer <shawn.bohrer(a)gmail.com> wrote:
>
> > I agree that we are currently depending on a bug in epoll. The epoll
> > implementation currently rounds up to the next jiffie, so specifying
> > a timeout of 1 ms really just wakes the process up at the next timer
> > tick. I have a patch to fix epoll by converting it to use
> > schedule_hrtimeout_range() that I'll gladly send, but I still need a
> > way to achieve the same thing.
>
> it's not going to help you; your expectation is incorrect.
> you CANNOT get 1000 iterations per second if you do
>
> <wait 1 msec>
> <do a bunch of work>
> <wait 1 msec>
> etc in a loop
>
> the more accurate (read: not rounding down) the implementation, the
> more not-1000 you will get, because to hit 1000 the two actions

Of course that patch makes my situation worse, which was my point. We
are depending on the _current_ epoll_wait() implementation, which calls
schedule_timeout(1). You do agree that the current epoll_wait()
implementation sleeps less than 1 msec with HZ == 1000, correct?

So as long as:

    work + scheduling_overhead < 1 msec

we _should_ be able to achieve 1000 iterations per second. I also
realize that with multiple worker processes I need:

    (total_work + scheduling_overhead)/number_cpus < 1 msec

With the old kernel I can run 500 of these processes, and I'm hoping
that I'm simply missing the knob I need to tweak to achieve similar
performance on a recent kernel.

Thanks,
Shawn
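Illustrative numbers for the multi-process form (assumptions, not
measurements from this thread): if each of 500 workers spends 25 us on
work plus 5 us on scheduling overhead per tick, the machine must retire
500 * 30 us = 15 ms of CPU time every 1 ms, i.e. it needs at least 15
CPUs; on a 16-CPU box that just fits, while 35 us per worker (17.5 ms
per tick) no longer does, and the achieved rate falls below 1000.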
From: Arjan van de Ven on 31 Jan 2010 13:30

On Sat, 30 Jan 2010 21:47:18 -0600
Shawn Bohrer <shawn.bohrer(a)gmail.com> wrote:

> Of course that patch makes my situation worse, which was my point. We
> are depending on the _current_ epoll_wait() implementation, which
> calls schedule_timeout(1). You do agree that the current epoll_wait()
> implementation sleeps less than 1 msec with HZ == 1000, correct?

I agree with your hypothesis, but I wouldn't call the behavior
correct ;-)
First of all, jiffies-based timeouts are supposed to round *up*, not
down, and second.. it should really be just 1 msec.

> With the old kernel I can run 500 of these processes, and I'm hoping
> that I'm simply missing the knob I need to tweak to achieve similar
> performance on a recent kernel.

can you run powertop during your workload? maybe you're getting hit by
some C state exit latencies tilting the rounding over the top just too
many times...

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
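One hedged way to test the C-state theory (an illustration, not
something posted in the thread) is to pin the allowed exit latency with
a PM QoS request through /dev/cpu_dma_latency, available since 2.6.25;
the request stays in force only while the file descriptor is open:

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    int main(void)
    {
        int32_t latency_us = 0; /* request zero exit latency: shallow C-states only */
        int fd = open("/dev/cpu_dma_latency", O_WRONLY);

        if (fd >= 0)
            write(fd, &latency_us, sizeof(latency_us));

        /* keep fd open while the workload runs; closing it drops the request */
        pause();
        return 0;
    }

If the iteration rate recovers while this runs, C-state exit latency,
not the scheduler, is the likely culprit.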
From: Shawn Bohrer on 31 Jan 2010 16:00

On Sun, Jan 31, 2010 at 10:28:46AM -0800, Arjan van de Ven wrote:
> can you run powertop during your workload? maybe you're getting hit by
> some C state exit latencies tilting the rounding over the top just too
> many times...

Running 50 of the example processes and powertop on 2.6.32.7, with the
performance cpu governor:

  Cn                Avg residency       P-states (frequencies)
  C0 (cpu running)        (24.7%)
  polling           0.0ms ( 0.0%)
  C1 mwait          0.3ms ( 0.0%)
  C3 mwait          0.8ms (75.3%)

  Wakeups-from-idle per second : 980.1    interval: 10.0s
  no ACPI power usage estimate available

  Top causes for wakeups:
    76.2% (45066.9)  worker_process : sys_epoll_wait (process_timeout)
    22.0% (13039.2)  <kernel core> : hrtimer_start_range_ns (tick_sched_timer)
     1.5% (  892.7)  kipmi0 : schedule_timeout_interruptible (process_timeout)
     0.2% (  105.0)  <kernel core> : add_timer (smi_timeout)
     0.0% (   10.5)  <interrupt> : ata_piix
     0.0% (   10.1)  <kernel core> : ipmi_timeout (ipmi_timeout)

I also tried booting with processor.max_cstate=0, which causes powertop
to no longer show any cstate information, but I assumed that would keep
me fixed at C0. Booting with processor.max_cstate=0 didn't seem to make
any difference.

Thanks,
Shawn
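If the effect of max_cstate is in doubt, the cpuidle sysfs counters
(exposed since roughly 2.6.24; the path below is the usual layout, taken
as an assumption here) show directly whether deep states are still being
entered; run a sketch like this before and after the workload and
compare the counts:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char path[128], name[64];
        int state;

        for (state = 0; ; state++) {
            FILE *f;
            unsigned long long usage = 0;

            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name", state);
            f = fopen(path, "r");
            if (!f)
                break;                          /* no more states */
            if (!fgets(name, sizeof(name), f))
                name[0] = '\0';
            name[strcspn(name, "\n")] = '\0';   /* strip trailing newline */
            fclose(f);

            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu0/cpuidle/state%d/usage", state);
            f = fopen(path, "r");
            if (f) {
                fscanf(f, "%llu", &usage);      /* cumulative entry count */
                fclose(f);
            }
            printf("state%d %-10s entries=%llu\n", state, name, usage);
        }
        return 0;
    }

If the C3 entry count keeps climbing with processor.max_cstate=0, the
boot parameter is not doing what was hoped and the /dev/cpu_dma_latency
approach sketched above may be worth trying instead.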