From: Linus Torvalds on 15 Jul 2010 14:50

On Thu, Jul 15, 2010 at 11:43 AM, Linus Torvalds
<torvalds(a)linux-foundation.org> wrote:
>
> But maybe I'm missing something.

Hmm. Of course - one way of solving this might be to just make the
32-bit case switch stacks in software. That might be a good idea
regardless, and would not be complicated. We already do that for
sysenter, but the NMI case would be simpler because we don't need to
worry about being re-entered by NMI/DEBUG during the stack switch.

And since we have to play some games with moving the data on the stack
around _anyway_, doing the whole "switch stacks entirely rather than
just subtract a bit from the old stack" would be fairly logical.

So I think you may end up being right: we don't need to save the
original NMI stack pointer, because we can make sure that the
replacement stack (that we need anyway) is always deterministic.

                 Linus
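A minimal sketch of that software stack switch, assuming a
kernel-to-kernel 32-bit NMI (so the hardware pushed only eflags, cs and
eip) and a hypothetical fixed landing area named nmi_stack_copy_32; the
user-space case, which pushes ss/esp as well, is left out:

        pushl %eax                      # free up a scratch register
        movl %esp, %eax                 # %eax = old stack; frame at 4(%eax)
        movl $nmi_stack_copy_32, %esp   # unconditional switch to the fixed area
        # copy the three words of the kernel-kernel hardware frame
        pushl 4+8(%eax)                 # eflags
        pushl 4+4(%eax)                 # cs
        pushl 4+0(%eax)                 # eip
        movl 0(%eax), %eax              # restore %eax
        # ... the NMI handler then runs on the fixed stack and ends in iret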
From: Linus Torvalds on 15 Jul 2010 15:10

On Thu, Jul 15, 2010 at 11:31 AM, Mathieu Desnoyers
<mathieu.desnoyers(a)efficios.com> wrote:
>
> Hrm, we could probably get away with only keeping the nmi_stack_nested
> per-cpu variable. The nmi_stack_ptr could be known statically if we set
> it at a fixed offset from the bottom of stack rather than using an
> offset relative to the top (which can change depending if we are nested
> over the kernel or userspace). We just have to reserve enough space for
> the bottom of stack.

I thought about trying that, but I don't think it's true. At least not
for the 32-bit case.

The thing is, the 32-bit case will re-use the kernel stack if it happens
in kernel space, and will thus start from a random place (and won't push
all the information anyway). So a nested NMI really doesn't know where
the original NMI stack is to be found unless we save it off.

In the case of x86-64, I think the stack will always be at a fixed
address, and push a fixed amount of data (because we use the IST thing).
So there you could probably just use the flag, but you'd still have to
handle the 32-bit case, and quite frankly, I think it would be much
nicer if the logic could be shared for the 32-bit and 64-bit cases.

But maybe I'm missing something.

                 Linus
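For reference, what "the IST thing" buys the 64-bit case, schematically
(NMI_STACK here names the kernel's IST slot for the NMI vector):

        # On x86-64, with the NMI gate routed through an IST slot, the CPU
        # does the same thing whether it interrupts user or kernel code:
        #
        #   %rsp <- TSS.IST[NMI_STACK]      # fixed per-cpu stack top, every time
        #   push ss, rsp, eflags, cs, rip   # always the same five words
        #   jump to the nmi entry point
        #
        # so both the frame address and the frame layout are deterministic,
        # which is exactly what the 32-bit path lacks.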
From: H. Peter Anvin on 15 Jul 2010 16:50

On 07/15/2010 10:38 AM, Mathieu Desnoyers wrote:
>>
>> Of course, if there is some trap that re-enables interrupts even if
>> the trap happened in an interrupt-disabled region, then that would
>> change things, but that would be a bad bug regardless (and totally
>> independently of any NMI issues). So in that sense it's a "could
>> happen", but it's something that would be a totally separate bug.
>
> Yep, this kind of scenario would therefore be a bug that does not
> belong to the specific realm of NMIs.

Yes, the only specific issue here is NMI -> trap -> IRET -> [nested
NMI], which is what this whole thing is about.

        -hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
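Spelled out, the one problematic sequence is:

        #   NMI arrives          # hardware masks further NMIs
        #     -> #PF or #DB      # a fault or trap hits inside the NMI handler
        #        -> iret         # the trap handler returns with iret, which
        #                        # unmasks NMIs prematurely
        #   NMI arrives again    # a second NMI can now land while the first
        #                        # one is still running on the same stack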
From: Mathieu Desnoyers on 15 Jul 2010 18:10

Hi Linus,

I modified your code, intending to handle the fake NMI entry gracefully
given that NMIs are not necessarily disabled at the entry point. It uses
a "need fake NMI" flag rather than playing games with CS and faults.
When a fake NMI is needed, it simply jumps back to the beginning of the
regular NMI code.

NMI exit code and fake NMI entry are made reentrant with respect to NMI
handler interruption by testing, at the very beginning of the NMI
handler, if an NMI is nested over the whole nmi_atomic .. nmi_atomic_end
code region.

This code assumes NMIs have a separate stack.

This code is still utterly untested and might eat your Doritos, only
provided for general enjoyment.

Thanks,

Mathieu

#
# Two per-cpu variables: an "are we nested" flag (one byte),
# and a "do we need to execute a fake NMI" flag (one byte).
# The %rsp at which the stack copy is saved is at a fixed address, which
# leaves enough room at the bottom of the NMI stack for the "real" NMI
# entry stack. This assumes we have a separate NMI stack.
# The NMI stack copy top of stack is at nmi_stack_copy.
# The NMI stack copy "rip" is at nmi_stack_copy_rip, which is set to
# nmi_stack_copy-32.
#
nmi:
        # Test if nested over atomic code.
        cmpq $nmi_atomic,0(%rsp)
        jae nmi_addr_is_ae
        # Test if nested over general NMI code.
        cmpb $0,%__percpu_seg:nmi_stack_nesting
        jne nmi_nested_set_fake_and_return
        # create new stack
is_unnested_nmi:
        # Save some space for nested NMIs. The exception itself
        # will never use more space, but it might use less (since
        # it will be a kernel-kernel transition).

        # Save %rax on top of the stack (need to temporarily use it)
        pushq %rax
        movq %rsp, %rax
        movq $nmi_stack_copy,%rsp

        # copy the five words of stack info. rip starts at 8+0(%rax).
        pushq 8+32(%rax)        # ss
        pushq 8+24(%rax)        # rsp
        pushq 8+16(%rax)        # eflags
        pushq 8+8(%rax)         # cs
        pushq 8+0(%rax)         # rip

        movq 0(%rax),%rax       # restore %rax

set_nmi_nesting:
        # and set the nesting flags
        movb $0xff,%__percpu_seg:nmi_stack_nesting

regular_nmi_code:
        ...
        # regular NMI code goes here, and can take faults,
        # because this sequence now has proper nested-nmi
        # handling
        ...

nmi_atomic:
        # An NMI nesting over the whole nmi_atomic .. nmi_atomic_end
        # region will be handled specially. This includes the fake NMI
        # entry point.
        cmpb $0,%__percpu_seg:need_fake_nmi
        jne fake_nmi
        movb $0,%__percpu_seg:nmi_stack_nesting
        iret

        # This is the fake NMI entry point.
fake_nmi:
        movb $0x0,%__percpu_seg:need_fake_nmi
        jmp regular_nmi_code
nmi_atomic_end:

        # Make sure the address is in the nmi_atomic range and in the CS
        # segment.
nmi_addr_is_ae:
        cmpq $nmi_atomic_end,0(%rsp)
        jae is_unnested_nmi
        # The saved rip points to the final NMI iret. Check the CS
        # segment to make sure.
        cmpw $__KERNEL_CS,8(%rsp)
        jne is_unnested_nmi

        # This is the case when we hit just as we're supposed to do the
        # atomic code of a previous nmi. We run the NMI using the old
        # return address that is still on the stack, rather than copy the
        # new one that is bogus and points to where the nested NMI
        # interrupted the original NMI handler!
        # Easy: just set the stack pointer to point to the stack copy,
        # clear need_fake_nmi (because we are directly going to execute
        # the requested NMI) and jump to "nesting flag set" (which is
        # followed by regular nmi code execution).
        movq $nmi_stack_copy_rip,%rsp
        movb $0x0,%__percpu_seg:need_fake_nmi
        jmp set_nmi_nesting

        # This is the actual nested case. Make sure we branch to the fake
        # NMI handler after this handler is done.
nmi_nested_set_fake_and_return:
        movb $0xff,%__percpu_seg:need_fake_nmi
        popfq
        jmp *(%rsp)

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Linus Torvalds on 15 Jul 2010 18:20
On Thu, Jul 15, 2010 at 3:01 PM, Mathieu Desnoyers
<mathieu.desnoyers(a)efficios.com> wrote:
>
> ... NMI exit code and fake NMI entry are made reentrant with respect
> to NMI handler interruption by testing, at the very beginning of the
> NMI handler, if an NMI is nested over the whole nmi_atomic ..
> nmi_atomic_end code region.

That is totally bogus. The NMI can be nested by exceptions and function
calls - the whole _point_ of this thing. So testing "rip" for anything
else than the specific final "iret" is meaningless. You will be in an
NMI region regardless of what rip is.

> This code assumes NMIs have a separate stack.

It also needs to be made per-cpu (and the flags be per-cpu). Then you
could in fact possibly test the stack pointer for whether it is in the
NMI stack area, and use the value of %rsp itself as the flag. So you
could avoid the flag entirely. Because testing %rsp is valid - testing
%rip is not.

That would also avoid the race, because %rsp (as a flag) now gets
cleared atomically by the "iret". So that might actually solve things.

                 Linus
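A rough sketch of that %rsp test, with nmi_stack_bottom and
nmi_stack_top as made-up per-cpu bounds of the NMI stack area (with IST,
the interrupted context's %rsp sits at a known offset in the hardware
frame, so it can be read straight off the stack):

nmi:
        # Hardware frame: rip=0, cs=8, eflags=16, rsp=24, ss=32.
        pushq %rax                      # scratch; frame now starts at 8(%rsp)
        movq 8+24(%rsp), %rax           # %rsp of the interrupted context
        cmpq %__percpu_seg:nmi_stack_bottom, %rax
        jb unnested                     # below the NMI stack area
        cmpq %__percpu_seg:nmi_stack_top, %rax
        jae unnested                    # above the NMI stack area
        # Inside the NMI stack area: we interrupted an NMI that has not
        # yet executed its final "iret", so take the nested path.
        ...
unnested:
        popq %rax
        # ... normal unnested NMI processing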