From: Avi Kivity on 11 Jul 2010 08:30

On 07/06/2010 01:44 PM, Xiao Guangrong wrote:
> In the speculative path, we should check guest pte's reserved bits just as
> the real processor does
>
> Reported-by: Marcelo Tosatti<mtosatti(a)redhat.com>
> Signed-off-by: Xiao Guangrong<xiaoguangrong(a)cn.fujitsu.com>
> ---
>  arch/x86/kvm/mmu.c         |    3 +++
>  arch/x86/kvm/paging_tmpl.h |    3 ++-
>  2 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 104756b..3dcd55d 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2781,6 +2781,9 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>              break;
>      }
>
> +    if (is_rsvd_bits_set(vcpu, gentry, PT_PAGE_TABLE_LEVEL))
> +        gentry = 0;
> +
>

That only works if the gpte is for the same mode as the current vcpu mmu
mode. In some cases it is too strict (vcpu in pae mode writing a 32-bit
gpte), which is not too bad, in some cases it is too permissive (vcpu in
nonpae mode writing a pae gpte).

(once upon a time mixed modes were rare, only on OS setup, but with
nested virt they happen all the time).

>      mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
>      spin_lock(&vcpu->kvm->mmu_lock);
>      if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index dfb2720..19f0077 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -628,7 +628,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>          pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
>
>          if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa,&gpte,
> -                                  sizeof(pt_element_t)))
> +                                  sizeof(pt_element_t)) ||
> +            is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL))
>              return -EINVAL;
>

This is better done a few lines down where we check for
!is_present_gpte(), no?

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
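
The mode dependence described above can be illustrated with a minimal sketch in C. It assumes a simplified model: a 32-bit non-PAE 4K PTE has no reserved bits, while a 32-bit PAE PTE reserves bits MAXPHYADDR..62, plus bit 63 when EFER.NXE is clear. All toy_* names are hypothetical and this is not KVM's implementation of is_rsvd_bits_set().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified paging modes; not KVM data structures. */
enum toy_mode { TOY_NONPAE, TOY_PAE };

/* Mask with bits s..e (inclusive) set. */
static uint64_t toy_rsvd_bits(int s, int e)
{
    assert(0 <= s && s <= e && e <= 63);
    return (~0ULL >> (63 - (e - s))) << s;
}

/* Reserved-bit mask for a last-level PTE in the given (assumed) mode. */
static uint64_t toy_pte_rsvd_mask(enum toy_mode mode, bool nxe, int maxphyaddr)
{
    uint64_t mask;

    if (mode == TOY_NONPAE)
        return 0;               /* no reserved bits in this model */

    assert(maxphyaddr >= 32 && maxphyaddr <= 52);
    mask = toy_rsvd_bits(maxphyaddr, 62);
    if (!nxe)
        mask |= 1ULL << 63;     /* NX bit is reserved when NXE is clear */
    return mask;
}

/* The verdict depends on the mode the gpte is interpreted in. */
static bool toy_is_rsvd_bits_set(enum toy_mode mode, bool nxe,
                                 int maxphyaddr, uint64_t gpte)
{
    if (mode == TOY_NONPAE)
        gpte = (uint32_t)gpte;  /* only the low 32 bits form the entry */
    return (gpte & toy_pte_rsvd_mask(mode, nxe, maxphyaddr)) != 0;
}
```

In this model the same 64-bit value is rejected when checked as a PAE entry with NXE clear and accepted when checked as a non-PAE entry, so a check made in the writing vcpu's mode can give the wrong answer for a shadow page that belongs to another mode.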
From: Xiao Guangrong on 11 Jul 2010 22:50

Avi Kivity wrote:

>> +    if (is_rsvd_bits_set(vcpu, gentry, PT_PAGE_TABLE_LEVEL))
>> +        gentry = 0;
>> +
>>
>
> That only works if the gpte is for the same mode as the current vcpu mmu
> mode. In some cases it is too strict (vcpu in pae mode writing a 32-bit
> gpte), which is not too bad, in some cases it is too permissive (vcpu in
> nonpae mode writing a pae gpte).
>

Avi, thanks for your review.

Do you mean that one VM can have vcpus in different modes? For example,
both nonpae and pae vcpus running in one VM? I forgot to consider this
case.

> (once upon a time mixed modes were rare, only on OS setup, but with
> nested virt they happen all the time).

I'm afraid it still has a problem; it will cause access corruption:
1: if a nonpae vcpu writes a pae gpte, it will miss the NX bit
2: if a pae vcpu writes a nonpae gpte, it will add an NX bit that is beyond
   the gpte's width

How about only updating the shadow pages that have the same pae setting as
the writing vcpu? Like this:

@@ -3000,6 +3000,10 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
         while (npte--) {
             entry = *spte;
             mmu_pte_write_zap_pte(vcpu, sp, spte);
+
+            if (!!is_pae(vcpu) != sp->role.cr4_pae)
+                continue;
+
             if (gentry)
                 mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);

>
>>      mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
>>      spin_lock(&vcpu->kvm->mmu_lock);
>>      if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index dfb2720..19f0077 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -628,7 +628,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu,
>> struct kvm_mmu_page *sp,
>>          pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
>>
>>          if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa,&gpte,
>> -                                  sizeof(pt_element_t)))
>> +                                  sizeof(pt_element_t)) ||
>> +            is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL))
>>              return -EINVAL;
>>
>
> This is better done a few lines down where we check for
> !is_present_gpte(), no?

Yeah, that's a better way; it avoids zapping the whole shadow page if
reserved bits are set. Will fix it.

From: Avi Kivity on 12 Jul 2010 09:20

On 07/12/2010 05:37 AM, Xiao Guangrong wrote:
>
>>> +    if (is_rsvd_bits_set(vcpu, gentry, PT_PAGE_TABLE_LEVEL))
>>> +        gentry = 0;
>>> +
>>>
>> That only works if the gpte is for the same mode as the current vcpu mmu
>> mode. In some cases it is too strict (vcpu in pae mode writing a 32-bit
>> gpte), which is not too bad, in some cases it is too permissive (vcpu in
>> nonpae mode writing a pae gpte).
>>
> Avi, thanks for your review.
>
> Do you mean that one VM can have vcpus in different modes? For example,
> both nonpae and pae vcpus running in one VM? I forgot to consider this
> case.
>

Yes. This happens while the guest brings up other vcpus, and when using
nested virtualization.

>> (once upon a time mixed modes were rare, only on OS setup, but with
>> nested virt they happen all the time).
>>
> I'm afraid it still has a problem; it will cause access corruption:
> 1: if a nonpae vcpu writes a pae gpte, it will miss the NX bit
> 2: if a pae vcpu writes a nonpae gpte, it will add an NX bit that is beyond
>    the gpte's width
>
> How about only updating the shadow pages that have the same pae setting as
> the writing vcpu? Like this:
>
> @@ -3000,6 +3000,10 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>          while (npte--) {
>              entry = *spte;
>              mmu_pte_write_zap_pte(vcpu, sp, spte);
> +
> +            if (!!is_pae(vcpu) != sp->role.cr4_pae)
> +                continue;
> +
>

Not enough, one vcpu can have nx set while the other has it reset, etc.

From: Xiao Guangrong on 12 Jul 2010 22:10

Avi Kivity wrote:

>>
>> How about only updating the shadow pages that have the same pae setting
>> as the writing vcpu? Like this:
>>
>> @@ -3000,6 +3000,10 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu,
>> gpa_t gpa,
>>          while (npte--) {
>>              entry = *spte;
>>              mmu_pte_write_zap_pte(vcpu, sp, spte);
>> +
>> +            if (!!is_pae(vcpu) != sp->role.cr4_pae)
>> +                continue;
>> +
>>
>
> Not enough, one vcpu can have nx set while the other has it reset, etc.
>

Yeah, so we also need to check sp->role.nxe here.
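
The combined check being discussed (both the pae and the nx setting of the writing vcpu must match the shadow page's role) can be sketched as below. This is a toy model under stated assumptions: struct toy_vcpu, struct toy_role and toy_role_matches() are hypothetical stand-ins, not the kernel's types. Only the bit positions (CR4.PAE is bit 5, EFER.NXE is bit 11) are architectural facts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CR4_PAE  (1u << 5)   /* CR4.PAE is bit 5 on x86 */
#define TOY_EFER_NXE (1u << 11)  /* EFER.NXE is bit 11 on x86 */

/* Hypothetical stand-in for the vcpu state that matters here. */
struct toy_vcpu {
    uint32_t cr4;
    uint32_t efer;
};

/* Hypothetical stand-in for the relevant shadow page role bits. */
struct toy_role {
    unsigned cr4_pae : 1;
    unsigned nxe : 1;
};

/*
 * True only if the shadow page was built under the same pae and nx
 * settings as the vcpu performing the write. The raw register bits are
 * normalized to 0/1 first, because a masked value such as 0x20 never
 * compares equal to a one-bit field.
 */
static bool toy_role_matches(const struct toy_vcpu *vcpu, struct toy_role role)
{
    bool pae, nxe;

    assert(vcpu != NULL);
    pae = (vcpu->cr4 & TOY_CR4_PAE) != 0;
    nxe = (vcpu->efer & TOY_EFER_NXE) != 0;
    return pae == role.cr4_pae && nxe == role.nxe;
}
```

The normalization plays the same part as the `!!` in the proposed patch: without it, comparing a masked register value against a one-bit role field would report a mismatch whenever the bit is set.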

From: Xiao Guangrong on 13 Jul 2010 21:20

Marcelo Tosatti wrote:

>>              entry = *spte;
>>              mmu_pte_write_zap_pte(vcpu, sp, spte);
>> +
>> +            if (!!is_pae(vcpu) != sp->role.cr4_pae ||
>> +                  is_nx(vcpu) != sp->role.nxe)
>> +                continue;
>> +
>
> This breaks remote_flush assignment below.

Ah, oops, will fix.
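
The problem with the early `continue` can be shown with a small stand-alone sketch. It is a toy model, not the kernel loop: the toy_* functions are hypothetical, "zap" just means writing 0, and the flush decision is reduced to "did the entry change". The point is only that bookkeeping placed after a `continue` is skipped for every entry that takes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy shape: `continue` skips the flush bookkeeping at the bottom. */
static bool toy_update_buggy(uint64_t *spte, size_t n, bool role_ok,
                             uint64_t new_val)
{
    bool need_flush = false;

    assert(spte != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t old = spte[i];

        spte[i] = 0;                /* zap the old entry */
        if (!role_ok)
            continue;               /* the change check below never runs */
        if (new_val)
            spte[i] = new_val;
        if (old != spte[i])
            need_flush = true;
    }
    return need_flush;
}

/* One possible fix: fold the role test into the update condition. */
static bool toy_update_fixed(uint64_t *spte, size_t n, bool role_ok,
                             uint64_t new_val)
{
    bool need_flush = false;

    assert(spte != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t old = spte[i];

        spte[i] = 0;                /* zap the old entry */
        if (role_ok && new_val)
            spte[i] = new_val;
        if (old != spte[i])         /* always evaluated */
            need_flush = true;
    }
    return need_flush;
}
```

With a role mismatch, the buggy version zaps a live entry yet reports that no flush is needed, while the fixed version reports the change.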