From: Andi Kleen on 16 Jul 2010 18:10

> And the thing is, if we just do NMI's correctly, and allow nesting,
> ALL THOSE PROBLEMS GO AWAY. And there is no reason what-so-ever to do
> stupid things elsewhere.

One issue I have with nesting NMIs is that you need a nesting limit,
otherwise you'll overflow the NMI stack.

We just got rid of nesting for normal interrupts because of this stack
overflow problem, which hit in real situations.

In some cases you can get quite high NMI frequencies, e.g. with
performance counters. Now the current performance counter handlers do
not nest by themselves of course, but they might nest with other
longer-running NMI users.

I think none of the current handlers are likely to nest for very long,
but there's more and more NMI code all the time, so it's definitely a
concern.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Linus Torvalds on 16 Jul 2010 18:30

On Fri, Jul 16, 2010 at 3:02 PM, Jeffrey Merkey <jeffmerkey(a)gmail.com> wrote:
>
> So Linus, my understanding of Intel's processor design is that the
> processor will NEVER signal a nested NMI until it sees an iret from
> the first NMI exception.

Wrong.

I like x86, but it has warts. The NMI blocking is one of them.

The NMI's will be blocked until the _next_ "iret", but it has no
nesting. So if you take a fault during the NMI (debug, page table
fixup, whatever), the iret in the fault handler will re-enable NMI's
even though we're still busy with the original NMI. There is no
nesting, or any way to say that "this is a NMI-releasing iret". They
could even do it still - make a new "iret that doesn't clear NMI" by
adding a segment override prefix to iret or whatever. But it's not
going to happen, and it's just one of those ugly special cases that
has various historical reasons (recursive faults during NMI sure as
hell didn't make sense back in the real-mode 8086 days).

So we have to handle it in software. Or not ever trap at all inside
the NMI handler.

The original patch - and the patch I detest - is to make the normal
fault paths use a "popf + ret" to emulate iret, but without the NMI
release.

Now, I could live with that if it's the only solution, but it _is_
pretty damn ugly.

If somebody shows that it's actually faster to do "popf + ret" when
returning to kernel space (a poor man's special-case iret), maybe it
would be worth it, but the really critical code sequence is actually
not "return to kernel space", but the "return to user space" case that
really wants the iret. And I just think it's disgusting to add extra
tests to that path.

The other alternative would be to just make the rule be "NMI can never
take traps". It's possible to do that, but quite frankly, it's a pain.
It's a pain for page faults due to the whole vmalloc thing, and it's a
pain if you ever want to debug an NMI in any way (or put a breakpoint
on anything that is accessed from an NMI, which could potentially be
quite a lot of things).

If it was just the debug issue, I'd say "neener neener, debuggers are
for wimps", but it's clearly not just about debug. It's a whole lot of
other things. Random percpu data structures used for tracing, kernel
pointer verification code, yadda yadda.

Linus

From: Linus Torvalds on 16 Jul 2010 18:40

On Fri, Jul 16, 2010 at 3:07 PM, Andi Kleen <andi(a)firstfloor.org> wrote:
>
> One issue I have with nesting NMIs is that you need
> a nesting limit, otherwise you'll overflow the NMI stack.

Have you actually looked at the suggestion that I (and now Mathieu)
posted code for?

The nesting is very limited. NMI's would nest just once, and when that
happens, the nested NMI would never use more than something like a
hundred bytes of stack (most of which is what the CPU pushes
directly). And there would be no device interrupts that nest, and
practically the faults that nest obviously aren't going to be complex
faults either (ie the page fault would be the simple case that never
calls 'handle_mm_fault()', but handles it all in
arch/x86/mm/fault.c).

IOW, there are absolutely _no_ issues with nesting. It's two levels
deep, and a much smaller stack footprint than our regular exception
nesting for those two levels too.

And your argument that there would be more and more NMI usage only
makes it more important that we handle NMI's without going crazy. Just
handle them cleanly instead of making them something totally special.

Linus

From: Mathieu Desnoyers on 16 Jul 2010 18:50

* Andi Kleen (andi(a)firstfloor.org) wrote:
> > And the thing is, if we just do NMI's correctly, and allow nesting,
> > ALL THOSE PROBLEMS GO AWAY. And there is no reason what-so-ever to do
> > stupid things elsewhere.
>
> One issue I have with nesting NMIs is that you need
> a nesting limit, otherwise you'll overflow the NMI stack.
>
> We just got rid of nesting for normal interrupts because
> of this stack overflow problem which hit in real situations.
>
> In some cases you can get quite high NMI frequencies, e.g. with
> performance counters. Now the current performance counter handlers
> do not nest by themselves of course, but they might nest
> with other longer running NMI users.
>
> I think none of the current handlers are likely to nest
> for very long, but there's more and more NMI code all the time,
> so it's definitely a concern.

We're not proposing to actually "nest" NMIs per se. We copy the stack
at the beginning of the NMI handler (and then use the copy) to permit
nesting of faults over NMI handlers. Following NMIs that would "try"
to nest over the NMI handler would see their regular execution
postponed until the end of the currently running NMI handler. It's OK
for these "nested" NMI handlers to use the bottom of the NMI stack
because the NMI handler on which they are trying to nest is only using
the stack copy. These "nested" handlers return to the original NMI
handler very early, just after setting a "pending nmi" flag. There is
more to it (e.g. handling NMI handler exit atomically with respect to
incoming NMIs); please refer to the last assembly code snippet I sent
to Linus a little earlier today for details.

Thanks,

Mathieu

>
> -Andi
>
> --
> ak(a)linux.intel.com -- Speaking for myself only.

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From: Andi Kleen on 16 Jul 2010 18:50

On Fri, Jul 16, 2010 at 03:26:32PM -0700, Linus Torvalds wrote:
> On Fri, Jul 16, 2010 at 3:07 PM, Andi Kleen <andi(a)firstfloor.org> wrote:
> >
> > One issue I have with nesting NMIs is that you need
> > a nesting limit, otherwise you'll overflow the NMI stack.
>
> Have you actually looked at the suggestion I (and now Mathieu)
> suggested code for?

Maybe I'm misunderstanding everything (and there have been a lot of
emails in the thread), but the case I was thinking of would be if the
second NMI faults too, and then another one comes in after the IRET
etc.

-Andi