From: Danny Cox on 22 Mar 2010 16:50

Kernel Gurus,

A colleague of mine is experiencing severe denial of service multiple times a day. When a disk-intensive process is started (in our case, a Subversion checkout), we observe the load average spike from 6 to 10. All CPUs are idle, but the I/O wait time is in the thousands of milliseconds (1500 - 3000). If we wait long enough, the load average begins to drop, but it hovers around 5 for a couple of minutes. Afterward, it quickly drops toward 0.

The machine is fairly new, having been purchased in the December timeframe. It uses an ASUS P7P55D-LE with an Intel Core i7 860 and 8 GB of RAM. It is running Ubuntu 9.10 with all patches applied. The kernel is 2.6.31-20. It uses two WD 500 GB Caviar Green drives, with software RAID1 on 3 of the partitions: /, /boot, and /home.

During the hang time, almost nothing can be started. We've been using top, atop, vmstat, and the GNOME system monitor to see what's occurring. Our only hints are the high load average and the I/O wait times.

At this point, Google has been unable to provide answers, and my colleague is ready to perform physical violence on the system. I don't even know what I can or should measure next. Hints are welcome. A solution would be even better, if any of the above strikes a chord.

One other data point: we have 5+ other identical systems, none of which have this issue. My colleague notes that his system was fine for a couple of weeks. It is possible that he installed some package that causes this behavior. That's merely speculation, of course.

Please include me in the CC: header, as I'm not subscribed to linux-kernel. I'd like to be, but the volume is too much.

Thanks for your time!

P.S. Things we've tried:

* Moved the drives to another (identical) machine. The issue persists.
* Disabled one of the drives in the RAID. The issue persists.
--
Danny Cox
770-236-6148
Cisco Service Provider Video Technology Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
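
On the question of what to measure next, one option is to sample the CPU iowait share directly from /proc/stat while the Subversion checkout runs. The following is only a rough sketch, not something used in the original report: the function names (parse_cpu_line, iowait_percent, sample_iowait) are made up for illustration, and it assumes the usual field order of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal). Note that it reports the iowait percentage of CPU time, which is not the same thing as the per-request wait in milliseconds quoted above.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before_text, after_text):
    """Percentage of CPU time spent in iowait between two /proc/stat snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    # Only the first eight fields (user .. steal) are summed; the later guest
    # fields are already counted inside user/nice.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval_s=5.0, path="/proc/stat"):
    """Take two snapshots interval_s seconds apart and return the iowait share."""
    with open(path) as stat_file:
        first = stat_file.read()
    time.sleep(interval_s)
    with open(path) as stat_file:
        second = stat_file.read()
    return iowait_percent(first, second)
```

Calling sample_iowait() in a loop during a hang, on both the affected machine and one of the healthy identical systems, would give a like-for-like number to compare.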