Prev: DRM / radeon / KMS: Fix hibernation regression related to radeon PM (was: Re: [Regression, post-2.6.34] Hibernation broken on machines with radeon/KMS and r300)
Next: [PATCH] x86, UV: make kdump avoid stack dumps
From: Manfred Spraul on 25 Jun 2010 17:10 On 06/24/2010 09:22 PM, Luca Tettamanti wrote: > Argh, "seems" was indeed appropriate. Manfred your patch does > alleviate the problem but something is still wrong. I noticed (I'm > developing an ajax heavy web app) that sometimes an apache worker > hangs; I can reproduce the problem with ab (apache benchmark) and a > high concurrency level (I'm testing with 100 and 10k requests, and I > get only 2-5 dropped requests). This does not happen with 2.4.34. > Any idea on how I can debug this further? > > Are there still hung httpd processes? -- Manfred -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Luca Tettamanti on 26 Jun 2010 09:00 On Fri, Jun 25, 2010 at 11:09 PM, Manfred Spraul <manfred(a)colorfullife.com> wrote: > On 06/24/2010 09:22 PM, Luca Tettamanti wrote: >> >> Argh, "seems" was indeed appropriate. Manfred your patch does >> alleviate the problem but something is still wrong. I noticed (I'm >> developing an ajax heavy web app) that sometimes an apache worker >> hangs; I can reproduce the problem with ab (apache benchmark) and a >> high concurrency level (I'm testing with 100 and 10k requests, and I >> get only 2-5 dropped requests). This does not happen with 2.4.34. >> Any idea on how I can debug this further? >> >> > > Are there still hung httpd processes? They don't seem really hung as before, I see two different behaviours: * Near the end of the run ab is frozen for a few seconds, but in the end all requests are processed; however I see a few "length" errors, meaning that the received page does not match the expected content (I'm testing a static page): Concurrency Level: 100 Time taken for tests: 14.682 seconds Complete requests: 10000 Failed requests: 10 (Connect: 0, Receive: 0, Length: 10, Exceptions: 0) Write errors: 0 Total transferred: 79987946 bytes HTML transferred: 77190466 bytes Requests per second: 681.11 [#/sec] (mean) Time per request: 146.819 [ms] (mean) Time per request: 1.468 [ms] (mean, across all concurrent requests) Transfer rate: 5320.36 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 1 0.4 0 3 Processing: 3 29 428.9 15 14678 Waiting: 0 15 1.3 15 19 Total: 5 30 429.0 16 14682 ERROR: The median and mean for the initial connection time are more than twice the standard deviation apart. These results are NOT reliable. Percentage of the requests served within a certain time (ms) 50% 16 66% 17 75% 17 80% 17 90% 17 95% 17 98% 18 99% 18 100% 14682 (longest request) strace on apache shows: [pid 3787] restart_syscall(<... resuming interrupted call ...> <unfinished ...> [pid 3789] restart_syscall(<... resuming interrupted call ...> <unfinished ...> [pid 3788] restart_syscall(<... resuming interrupted call ...> <unfinished ...> [pid 3784] restart_syscall(<... resuming interrupted call ...> <unfinished ...> [pid 3783] restart_syscall(<... resuming interrupted call ...> <unfinished ...> [pid 3782] restart_syscall(<... resuming interrupted call ...> <unfinished ...> [pid 3239] restart_syscall(<... resuming interrupted call ...> <unfinished ...> [pid 3233] restart_syscall(<... resuming interrupted call ...> <unfinished ...> [pid 3238] restart_syscall(<... resuming interrupted call ...> <unfinished ...> [pid 3237] restart_syscall(<... resuming interrupted call ...> * Sometimes ab pauses at the end but not all requests are successful; ab says: apr_socket_recv: Connection timed out (110) Total of 9932 requests completed The problem seems more acute if there's other application running on the system (e.g. right now I have kde, firefox, thunderbird active and I see more errors; note that nothing is using much CPU time). I've taken a sysrq-t log during one of ab pauses, it's attached. Might be related: KDE (4.4) takes an usual amount of time to start up: the splash screen is fronzen for a while on the screen before the desktop is available (.xsession-error does not show anything interesting). Luca PS: I'll be offline until Monday 5th...
From: Manfred Spraul on 2 Jul 2010 12:20
Hi Luca, On 06/26/2010 02:52 PM, Luca Tettamanti wrote: > They don't seem really hung as before, I see two different behaviours: > * Near the end of the run ab is frozen for a few seconds, but in the > end all requests are processed; however I see a few "length" errors, > meaning that the received page does not match the expected content > (I'm testing a static page): > > Concurrency Level: 100 > Time taken for tests: 14.682 seconds > Complete requests: 10000 > Failed requests: 10 > (Connect: 0, Receive: 0, Length: 10, Exceptions: 0) > Write errors: 0 > Total transferred: 79987946 bytes > HTML transferred: 77190466 bytes > Requests per second: 681.11 [#/sec] (mean) > Time per request: 146.819 [ms] (mean) > Time per request: 1.468 [ms] (mean, across all concurrent requests) > Transfer rate: 5320.36 [Kbytes/sec] received > > Connection Times (ms) > min mean[+/-sd] median max > Connect: 0 1 0.4 0 3 > Processing: 3 29 428.9 15 14678 > Waiting: 0 15 1.3 15 19 > Total: 5 30 429.0 16 14682 > ERROR: The median and mean for the initial connection time are more > than twice the standard > deviation apart. These results are NOT reliable. > Percentage of the requests served within a certain time (ms) > 50% 16 > 66% 17 > 75% 17 > 80% 17 > 90% 17 > 95% 17 > 98% 18 > 99% 18 > 100% 14682 (longest request) > > strace on apache shows: > [pid 3787] restart_syscall(<... resuming interrupted call ...> <unfinished ...> > [pid 3789] restart_syscall(<... resuming interrupted call ...> <unfinished ...> > [pid 3788] restart_syscall(<... resuming interrupted call ...> <unfinished ...> > [pid 3784] restart_syscall(<... resuming interrupted call ...> <unfinished ...> > [pid 3783] restart_syscall(<... resuming interrupted call ...> <unfinished ...> > [pid 3782] restart_syscall(<... resuming interrupted call ...> <unfinished ...> > [pid 3239] restart_syscall(<... resuming interrupted call ...> <unfinished ...> > [pid 3233] restart_syscall(<... resuming interrupted call ...> <unfinished ...> > [pid 3238] restart_syscall(<... resuming interrupted call ...> <unfinished ...> > [pid 3237] restart_syscall(<... resuming interrupted call ...> > > > * Sometimes ab pauses at the end but not all requests are successful; ab says: > > apr_socket_recv: Connection timed out (110) > Total of 9932 requests completed > > The problem seems more acute if there's other application running on > the system (e.g. right now I have kde, firefox, thunderbird active and > I see more errors; note that nothing is using much CPU time). I've > taken a sysrq-t log during one of ab pauses, it's attached. > > On my system, #ab -n 100000 -c 1000 http://127.0.0.1/icons/poweredby.png Also produces hangs/failures. But this happens regardless of the kernel version: Stock 2.6.33, 2.6.33.5 from Fedora, 2.6.35-rc3, 2.6.35-rc3 with the semop changes reverted. Thus I'd propose to close the current bug. The IN_WAKEUP fix fixes the semop() and especially semtimedop() failures. -- Manfred -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |