From: Andi Kleen on 24 Mar 2010 20:40

On Thu, Mar 25, 2010 at 12:08:23AM +0100, Thomas Gleixner wrote:
> On Wed, 24 Mar 2010, Thomas Gleixner wrote:
>
> > On Wed, 24 Mar 2010, Andi Kleen wrote:
> >
> > > Prevent nested interrupts when the IRQ stack is near overflowing v2
> > >
> > > Interrupts can always nest when they don't run with IRQF_DISABLED.
> > >
> > > When a lot of interrupts hit the same vector on the same
> > > CPU nested interrupts can overflow the irq stack and cause hangs.
>
> That's utter nonsense. An interrupt storm on the same vector does not
> cause irq nesting. The irq code prevents reentering a handler and in

Sorry, it's the same CPU, not the same vector. Yes, the reference to the
same vector was misleading.

"Multiple vectors on a multi-port NIC pointing to the same CPU, all
hitting the irq stack until it overflows."

> case of MSI-X it just disables the IRQ when it comes again while the
> first irq on that vector is still in progress. So the maximum nesting
> is two up to handle_edge_irq() where it disables the IRQ and returns
> right away.

The real maximum nesting is all IRQs running with interrupts on that
point to the same CPU. Enough of them from multiple busy IRQ sources and
you go boom.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
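For readers following the thread: the idea of the patch under discussion is, roughly, to stop honouring "run this handler with interrupts enabled" once the per-CPU irq stack is close to full, so further vectors cannot nest on top of the current one. A minimal sketch of that concept follows; the helpers, sizes and threshold below are made-up illustrations, not the actual patch or existing kernel APIs.

#include <linux/interrupt.h>
#include <linux/irqflags.h>

/* Hypothetical helpers, not existing kernel APIs: return the current
 * stack pointer and the bottom of this CPU's irq stack.
 */
extern unsigned long irq_stack_sp(void);
extern unsigned long irq_stack_bottom(void);

#define IRQ_STACK_SIZE		(16 * 1024)		/* assumed size */
#define IRQ_STACK_LOW_MARK	(IRQ_STACK_SIZE / 8)	/* assumed threshold */

static bool irq_stack_nearly_full(void)
{
	return (irq_stack_sp() - irq_stack_bottom()) < IRQ_STACK_LOW_MARK;
}

/* Only re-enable interrupts for a handler that did not ask for
 * IRQF_DISABLED while there is still a comfortable amount of irq stack
 * left; otherwise keep interrupts masked so nothing can nest on top.
 */
static void maybe_reenable_irqs(struct irqaction *action)
{
	if (!(action->flags & IRQF_DISABLED) && !irq_stack_nearly_full())
		local_irq_enable();
}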
From: Andi Kleen on 25 Mar 2010 08:00

> Pretty much the only 'core' driver today which enables IRQs in the irq
> handlers and needs it is the old IDE layer. There are also a couple of

Thanks. I'm tempted to just ignore it in this case, but in theory it
might still have trouble if there are a lot of interrupts to the same
CPU.

I've only had a report on a very large system with a very high interrupt
rate on a very fast NIC though, so presumably it's not too common.

Anyway, this patch will fix the problem for all drivers that do not
explicitly enable interrupts, which is the overwhelming majority.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
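For context on "explicitly enable interrupts": at the time, a driver whose handler should never be nested could simply request its IRQ with IRQF_DISABLED, so the handler runs with interrupts masked on the local CPU. A rough illustration (the device name and handler body are invented for the example):

#include <linux/interrupt.h>

/* Illustrative only: a handler registered with IRQF_DISABLED runs with
 * local interrupts masked, so it cannot be nested by other vectors and
 * cannot pile additional frames onto the irq stack.
 */
static irqreturn_t mydev_interrupt(int irq, void *dev_id)
{
	/* short, non-sleeping work only */
	return IRQ_HANDLED;
}

static int mydev_setup_irq(unsigned int irq, void *dev)
{
	return request_irq(irq, mydev_interrupt, IRQF_DISABLED,
			   "mydev", dev);
}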
From: Andi Kleen on 25 Mar 2010 08:10

On Thu, Mar 25, 2010 at 12:16:17PM +0100, Thomas Gleixner wrote:
> > Pretty much the only 'core' driver today which enables IRQs in the irq
> > handlers and needs it is the old IDE layer. There are also a couple of
> > drivers which play games with disable/enable_irq in the IRQ paths for
> > other reasons (lack of irq threads when written and a hardware model that's
> > totally SMP unfriendly). 8390 is the obvious one here and it at least
> > would be far far saner using threaded IRQs and normal locking with IRQs
> > unmasked.
>
> Right, but that's not the problem here. We talk about a (hopefully)
> well written interrupt handler which runs for a very short
> time.

The NIC handlers can do quite some work under high traffic, even with
interrupt mitigation and NAPI.

> What's the point of running it with interrupts enabled ?

Other interrupts.

> Nothing, we just run into stack overflow problems. So what's better:
> an unreliable and ugly hackaround

I don't think that's an accurate description of the patch at all.
Besides, I believe it's reliable in all cases that matter.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
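The "well written interrupt handler which runs for a very short time" is the usual NAPI pattern: the hard-irq part masks device interrupts and schedules polling, and the real packet work happens later in softirq context. A sketch of that pattern, assuming an invented "mynic" driver and a hypothetical register layout; the point of the discussion is that even these short handlers fire continuously per vector under sustained traffic:

#include <linux/interrupt.h>
#include <linux/netdevice.h>
#include <linux/io.h>

/* Illustrative sketch of a per-vector MSI-X handler using NAPI.
 * The names and the private structure layout are made up.
 */
struct mynic_priv {
	struct napi_struct napi;
	void __iomem *regs;
};

static irqreturn_t mynic_msix_handler(int irq, void *data)
{
	struct mynic_priv *priv = data;

	/* Mask further interrupts on the device (hypothetical register),
	 * then hand the actual RX/TX work to NAPI polling.
	 */
	writel(0, priv->regs + 0x10);
	napi_schedule(&priv->napi);

	return IRQ_HANDLED;
}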
From: Andi Kleen on 25 Mar 2010 11:40

> > Well it's simply the current state of affairs today. I'm merely
> > attempting to make the current state slightly safer without breaking
> > anything in the process.
>
> Well, I'd agree if those stack overflows would be a massive reported
> problem.

At least the people who reported it to me thought it was a massive
problem @)

> Right now they happen with a weird test case which points out a
> trouble spot. Multi vector NICs under heavy load. So why not go there
> and change the handful of drivers to run their handlers with irqs
> disabled?

Ok, but afaik it's not that small a number: MSI-X support is getting
more and more widespread. It's pretty universal in the 10+GbitE space
and is starting to get deployed for block too.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
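To make the "multi vector NICs" case concrete: a typical MSI-X driver of that era allocates one vector per queue or port and registers a handler for each. Unless IRQ affinity is spread out, all of those vectors can be routed to the same CPU and stack up there under load. A rough sketch using the MSI-X API as it existed around 2.6.3x (vector count, names and the omitted error unwinding are illustrative):

#include <linux/pci.h>
#include <linux/interrupt.h>

#define MYNIC_NUM_VECTORS 8	/* illustrative queue/vector count */

/* Sketch of multi-vector MSI-X setup: each vector gets its own handler;
 * with default affinity they may all be delivered to one CPU.
 */
static int mynic_setup_msix(struct pci_dev *pdev, void *priv,
			    irq_handler_t handler)
{
	struct msix_entry entries[MYNIC_NUM_VECTORS];
	int i, err;

	for (i = 0; i < MYNIC_NUM_VECTORS; i++)
		entries[i].entry = i;

	err = pci_enable_msix(pdev, entries, MYNIC_NUM_VECTORS);
	if (err)
		return err;

	for (i = 0; i < MYNIC_NUM_VECTORS; i++) {
		err = request_irq(entries[i].vector, handler, 0,
				  "mynic", priv);
		if (err)
			break;	/* error unwinding omitted in this sketch */
	}
	return err;
}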
From: Andi Kleen on 25 Mar 2010 14:30
> I think the patch as posted solves a real problem, but also perpetuates a bad
> situation.
>
> At minimum we should print a (one-time) warning that some badness occurred.
> That would push us either in the direction of improving drivers, or towards
> improving the generic code.

What should a driver do to prevent that? I don't see what it could do
short of castrating itself (like refusing to use multiple ports).

As Linus says, the driver doesn't know if setting IRQF_DISABLED is safe.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
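For reference, the "one-time warning" being suggested would be easy to express with the kernel's WARN_ONCE(). A sketch of the idea, assuming some way to measure remaining irq stack space; the helper and the 2 KB threshold below are hypothetical, not existing APIs:

#include <linux/kernel.h>

/* Hypothetical helper, not an existing API: bytes of the current irq
 * stack that are still unused.
 */
extern unsigned long irq_stack_bytes_left(void);

/* Sketch of the suggested one-time warning: if the irq stack ever gets
 * close to full, complain once so the configuration that triggers it
 * gets noticed and reported.
 */
static void check_irq_stack_usage(void)
{
	unsigned long left = irq_stack_bytes_left();

	WARN_ONCE(left < 2048,
		  "irq stack nearly exhausted (%lu bytes left)\n", left);
}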