From: Andres Freund on 8 Jul 2010 17:00

Hi all,

I recently got a dual-socket E5520 system (only one CPU attached right now; the problems were the same with both, though) where I regularly get errors like

[ 288.281073] INFO: rcu_sched_state detected stall on CPU 1 (t=5890 jiffies)
[ 288.281086] INFO: rcu_sched_state detected stall on CPU 5 (t=5890 jiffies)
[ 288.281087] sending NMI to all CPUs:
[ 288.281096] sending NMI to all CPUs:

After deactivating all power-saving mechanisms it seems to have gotten a bit more stable, but it still crashes pretty reliably under I/O load. Graphics-intensive work also seems able to trigger it reliably. The crashes also occurred with the cheap on-board Intel graphics card.

Without the RCU debugging that produces the messages above, I regularly get hangs or missing inputs, at times ending fatally (no sysrq, no keyboard reaction).

Normally I would try to do a bisect, but in this case I am in the unfortunate situation that with earlier kernels I get problems with other hardware (particularly the SAS controller, which currently holds the only disks). So I have no known-good version to start from.

Perhaps you have an idea?

The dmesg output of different, likely related crashes, lspci -v and my latest .config are attached.

As I am not sure what kernel code is actually causing the problem (the backtraces looked innocent enough on a short, clueless glance), I don't know who to explicitly CC.

As small additional data points: using latencytop I get latencies in the range of seconds for various things (creating md request, creating block layer request, radeon_fence_wait). The problems seem to have become more frequent after I enabled lockdep and RCU debugging, possibly simply making the race more likely?

Thanks,

Andres
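For anyone who wants to pull the stall reports out of a longer dmesg, here is a minimal sketch. It is not part of the original report: the function name parse_stalls and the hz parameter are hypothetical, and the regular expression only covers the message format quoted above. The kernel's CONFIG_HZ is not stated in this thread, so the conversion from jiffies to seconds is done only if the caller supplies a value.

```python
import re

# Matches stall lines in the format quoted in the report, e.g.
# [ 288.281073] INFO: rcu_sched_state detected stall on CPU 1 (t=5890 jiffies)
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: (?P<state>\w+) detected stall on CPU "
    r"(?P<cpu>\d+) \(t=(?P<jiffies>\d+) jiffies\)"
)


def parse_stalls(dmesg_text, hz=None):
    """Return one dict per RCU stall line found in dmesg_text.

    hz is the kernel's CONFIG_HZ. It is an assumption the caller must
    supply (it is not given in the thread); if it is None, the stall
    duration in seconds is left as None.
    """
    stalls = []
    for match in STALL_RE.finditer(dmesg_text):
        jiffies = int(match.group("jiffies"))
        stalls.append({
            "timestamp": float(match.group("ts")),
            "state": match.group("state"),
            "cpu": int(match.group("cpu")),
            "jiffies": jiffies,
            "seconds": jiffies / hz if hz else None,
        })
    return stalls
```

Calling parse_stalls(open("dmesg.txt").read(), hz=...) with the HZ value from the attached .config would give the stall duration per CPU; the file name is only an example.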
From: Andres Freund on 8 Jul 2010 17:40

Err,

> After deactivating all power saving mechanisms it seems to have gotten
> a bit more stable - it still crashes pretty reliably under
> io-load. Graphics-intensive work also seems able to trigger it
> reliably. The crashes also occurred with the cheap on-board intel
> graphics card.

It's not an Intel one, but Aspeed... I remembered the wrong system. Sorry.

Also, so that you don't have to read the full dmesg: it's 2.6.35-rc4 (I reproduced it with 2.6.32 onwards).

Andres
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andres Freund on 6 Aug 2010 16:20
On Thursday 08 July 2010 22:51:13 Andres Freund wrote:
> Hi all,
>
> I recently got a dual-socket E5520 (only one cpu attached right now,
> problems were the same with both though) system where I regularly get
> errors like

The errors (attached in the other message) still occur with 2.6.35. Is there anything I can do to help?

Andres