From: Avi Kivity on 16 Jul 2010 08:40

On 07/14/2010 06:49 PM, Mathieu Desnoyers wrote:
> Implements an alternative iret with popf and return so trap and exception
> handlers can return to the NMI handler without issuing iret. iret would cause
> NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 has to
> copy the return instruction pointer to the top of the previous stack, issue a
> popf, loads the previous esp and issue a near return (ret).
>
> It allows placing dynamically patched static jumps in asm gotos, which will be
> used for optimized tracepoints, in NMI code since returning from a breakpoint
> would be valid. Accessing vmalloc'd memory, which allows executing module code
> or accessing vmapped or vmalloc'd areas from NMI context, would also be valid.
> This is very useful to tracers like LTTng.
>
> This patch makes all faults, traps and exception safe to be called from NMI
> context *except* single-stepping, which requires iret to restore the TF (trap
> flag) and jump to the return address in a single instruction. Sorry, no kprobes
> support in NMI handlers because of this limitation. This cannot be emulated
> with popf/lret, because lret would be single-stepped. It does not apply to
> "immediate values" because they do not use single-stepping. This code detects if
> the TF flag is set and uses the iret path for single-stepping, even if it
> reactivates NMIs prematurely.
>

You need to save/restore cr2 in addition, otherwise the following hits you

- page fault
- processor writes cr2, enters fault handler
- nmi
- page fault
- cr2 overwritten

I guess you would usually not notice the corruption since you'd just see
a spurious fault on the page the NMI handler touched, but if the first
fault happened in a kvm guest, then we'd corrupt the guest's cr2.

But the whole thing strikes me as overkill. If it's 8k per-cpu, what's
wrong with using a per-cpu pointer to a kmalloc() area?
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
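[Editor's sketch] The five-step sequence in the message above can be replayed in a small userspace model. This is only an illustration under stated assumptions: sim_cr2 is an ordinary variable standing in for the real CR2 register, the two fault addresses are made up, and none of the function names below exist in the kernel. It shows what the outer fault handler would read from CR2 with and without a save/restore around the NMI handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated CR2 (assumption: a plain variable, not the real register). */
static uintptr_t sim_cr2;

/* Hypothetical fault addresses, chosen only for the illustration. */
#define OUTER_FAULT_ADDR ((uintptr_t)0x1000)
#define NMI_FAULT_ADDR   ((uintptr_t)0x2000)

/* Model of the hardware side of a page fault: latch the address in CR2. */
static void sim_page_fault(uintptr_t addr)
{
	sim_cr2 = addr;
}

/* Model of an NMI handler that itself takes a page fault. */
static void sim_nmi_handler(bool save_restore_cr2)
{
	uintptr_t saved = sim_cr2;      /* value on NMI entry */

	sim_page_fault(NMI_FAULT_ADDR); /* nested fault overwrites CR2 */

	if (save_restore_cr2)
		sim_cr2 = saved;        /* put it back on NMI exit */
}

/*
 * The outer fault latches its address, then an NMI arrives before the
 * outer fault handler has read CR2. Returns the address that handler
 * ends up reading.
 */
uintptr_t outer_handler_reads(bool save_restore_cr2)
{
	sim_page_fault(OUTER_FAULT_ADDR);
	sim_nmi_handler(save_restore_cr2);
	return sim_cr2;
}

/* Self-check covering both scenarios; call it from main() to try it. */
void cr2_selfcheck(void)
{
	/* Without save/restore the outer handler sees the NMI's address. */
	assert(outer_handler_reads(false) == NMI_FAULT_ADDR);
	/* With save/restore it sees its own fault address again. */
	assert(outer_handler_reads(true) == OUTER_FAULT_ADDR);
}
```

In the model, the unprotected case hands the outer handler the address touched by the NMI handler, which matches the "spurious fault on the page the NMI handler touched" described above.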
From: Mathieu Desnoyers on 16 Jul 2010 10:50

* Avi Kivity (avi(a)redhat.com) wrote:
> On 07/14/2010 06:49 PM, Mathieu Desnoyers wrote:
>> Implements an alternative iret with popf and return so trap and exception
>> handlers can return to the NMI handler without issuing iret. iret would cause
>> NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 has to
>> copy the return instruction pointer to the top of the previous stack, issue a
>> popf, loads the previous esp and issue a near return (ret).
>>
>> It allows placing dynamically patched static jumps in asm gotos, which will be
>> used for optimized tracepoints, in NMI code since returning from a breakpoint
>> would be valid. Accessing vmalloc'd memory, which allows executing module code
>> or accessing vmapped or vmalloc'd areas from NMI context, would also be valid.
>> This is very useful to tracers like LTTng.
>>
>> This patch makes all faults, traps and exception safe to be called from NMI
>> context *except* single-stepping, which requires iret to restore the TF (trap
>> flag) and jump to the return address in a single instruction. Sorry, no kprobes
>> support in NMI handlers because of this limitation. This cannot be emulated
>> with popf/lret, because lret would be single-stepped. It does not apply to
>> "immediate values" because they do not use single-stepping. This code detects if
>> the TF flag is set and uses the iret path for single-stepping, even if it
>> reactivates NMIs prematurely.
>>
>
> You need to save/restore cr2 in addition, otherwise the following hits you
>
> - page fault
> - processor writes cr2, enters fault handler
> - nmi
> - page fault
> - cr2 overwritten
>
> I guess you would usually not notice the corruption since you'd just see
> a spurious fault on the page the NMI handler touched, but if the first
> fault happened in a kvm guest, then we'd corrupt the guest's cr2.

OK, just to make sure: you mean we'd have to save/restore the cr2 register
at the beginning/end of the NMI handler execution, right? Then shouldn't we
save/restore cr3 too?

> But the whole thing strikes me as overkill. If it's 8k per-cpu, what's
> wrong with using a per-cpu pointer to a kmalloc() area?

Well, it seems like all the kernel code calling "vmalloc_sync_all()" (which is
much more than perf) can potentially cause large latencies, which could be
squashed by allowing page faults in NMI handlers. This looks like a stronger
argument to me.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Andi Kleen on 16 Jul 2010 11:40

> Well, it seems like all the kernel code calling "vmalloc_sync_all()" (which is
> much more than perf) can potentially cause large latencies, which could be

You need to fix all the other code that walks task lists too, to avoid all those.

% gid for_each_process | wc -l

In fact the mm-struct walk is cheaper than a task-list walk because there
are always fewer mm-structs than tasks.

-Andi
From: Mathieu Desnoyers on 16 Jul 2010 11:50

* Andi Kleen (andi(a)firstfloor.org) wrote:
> > Well, it seems like all the kernel code calling "vmalloc_sync_all()" (which is
> > much more than perf) can potentially cause large latencies, which could be
>
> You need to fix all the other code that walks task lists too, to avoid all those.
>
> % gid for_each_process | wc -l

This can very well be done incrementally. And I agree, these should eventually
be targeted too, especially those which hold locks. We've already started
hearing about tasklist lock live-locks in the past year, so I think we're pretty
much at the point where it should be looked at.

Thanks,

Mathieu

>
> In fact the mm-struct walk is cheaper than a task-list walk because there
> are always fewer mm-structs than tasks.

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Avi Kivity on 16 Jul 2010 12:50

On 07/16/2010 05:49 PM, Mathieu Desnoyers wrote:
>
>> You need to save/restore cr2 in addition, otherwise the following hits you
>>
>> - page fault
>> - processor writes cr2, enters fault handler
>> - nmi
>> - page fault
>> - cr2 overwritten
>>
>> I guess you would usually not notice the corruption since you'd just see
>> a spurious fault on the page the NMI handler touched, but if the first
>> fault happened in a kvm guest, then we'd corrupt the guest's cr2.
>>
> OK, just to make sure: you mean we'd have to save/restore the cr2 register
> at the beginning/end of the NMI handler execution, right?

Yes.

> Then shouldn't we
> save/restore cr3 too?
>

No, faults should not change cr3.

>> But the whole thing strikes me as overkill. If it's 8k per-cpu, what's
>> wrong with using a per-cpu pointer to a kmalloc() area?
>>
> Well, it seems like all the kernel code calling "vmalloc_sync_all()" (which is
> much more than perf) can potentially cause large latencies, which could be
> squashed by allowing page faults in NMI handlers. This looks like a stronger
> argument to me.

Why is that kernel code calling vmalloc_sync_all()? If it is only NMI
which cannot take vmalloc faults, why bother? If not, why not?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
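[Editor's sketch] As background to the vmalloc_sync_all() question: on x86 kernels of this era, new vmalloc mappings are entered in a reference page table and reach each address space either lazily, when the page fault handler copies the missing entry, or eagerly, when vmalloc_sync_all() walks every address space. The userspace model below is a heavily simplified sketch of that trade-off, not kernel code. The table sizes and all sim_* names are assumptions made up for the illustration, and a boolean per slot stands in for a real page-directory entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MM    4 /* hypothetical number of address spaces */
#define NR_SLOTS 8 /* hypothetical number of top-level kernel entries */

/* Reference table (plays the role of the kernel's master page directory). */
static bool ref_pgd[NR_SLOTS];
/* Per-address-space copies of the kernel part of the page directory. */
static bool mm_pgd[NR_MM][NR_SLOTS];

/* A vmalloc-style mapping only updates the reference table. */
void sim_vmalloc_map(size_t slot)
{
	assert(slot < NR_SLOTS);
	ref_pgd[slot] = true;
}

/* True if this address space can touch the slot without faulting. */
bool sim_access_ok(size_t mm, size_t slot)
{
	assert(mm < NR_MM && slot < NR_SLOTS);
	return mm_pgd[mm][slot];
}

/*
 * Lazy path: on a fault, copy the single missing entry from the
 * reference table. Returns false if the reference table has no
 * mapping either, i.e. the access was genuinely bad.
 */
bool sim_vmalloc_fault(size_t mm, size_t slot)
{
	assert(mm < NR_MM && slot < NR_SLOTS);
	if (!ref_pgd[slot])
		return false;
	mm_pgd[mm][slot] = true;
	return true;
}

/*
 * Eager path: propagate the reference table to every address space.
 * Returns the number of entries visited, which grows with the number
 * of address spaces; this walk is the latency cost under discussion.
 */
size_t sim_vmalloc_sync_all(void)
{
	size_t visited = 0;

	for (size_t mm = 0; mm < NR_MM; mm++) {
		for (size_t slot = 0; slot < NR_SLOTS; slot++) {
			mm_pgd[mm][slot] = ref_pgd[slot];
			visited++;
		}
	}
	return visited;
}
```

In this model the lazy path does constant work per fault but requires that a fault be tolerable in the current context, whereas the eager path removes the need for the fault at the cost of a walk over every address space. That is the tension between the NMI restriction and the latency concern raised in the thread.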