From: Xiao Guangrong on 29 Jun 2010 05:10

Avi Kivity wrote:
> On 06/29/2010 10:35 AM, Xiao Guangrong wrote:
>>
>>> We have now
>>>
>>>     if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
>>>             continue;
>>>
>>> So we need to add a check, if sp->role.access doesn't match pt_access &
>>> pte_access, we need to get a new sp with the correct access (can only
>>> change read->write).
>>>
>>
>> Umm, we should update the spte at the gw->level, so we need get the child
>> sp, and compare its access at this point, just like this:
>>
>>     if (level == gw->level && is_shadow_present_pte(*sptep)) {
>>             child_sp = page_header(__pa(*sptep & PT64_BASE_ADDR_MASK));
>>
>>             if (child_sp->access != pt_access & pte_access & (dirty ? 1 :
>>                 ~ACC_WRITE_MASK)) {
>>                     /* Zap sptep */
>>                     ......
>>             }
>>     }
>>
>> So, why not use the new spte flag (SPTE_NO_DIRTY in my patch) to mark
>> this spte then we can see
>> this spte whether need updated directly? i think it more simpler ;-)
>>
>
> It's new state, and new state means more maintenance of that state and
> the need to consider the state in all relevant code paths.
>
> In terms of maintainability, changing walk_addr() is best, since it
> maintains the tight invariant that PT_PAGE_DIRECTORY_LEVEL sptes are
> always consistent with their gptes. Updating fetch() to allow for a
> relaxed invariant (spte may be read-only while gpte is write-dirty) is
> more complicated, but performs better. This is also consistent with
> what we do with PT_PAGE_TABLE_LEVEL gptes/sptes and with unsync pages.
>

Maybe you are right, I just thought it would be quicker to use the
SPTE_NO_DIRTY flag to judge whether the spte needs to be updated. I'll
modify this patch as you suggest.

> btw, how can the patch work?
>
>>
>> +    if (level == gw->level && !dirty &&
>> +        access & gw->pte_access & ACC_WRITE_MASK)
>> +            spte |= SPTE_NO_DIRTY;
>> +
>>      spte = __pa(sp->spt)
>>              | PT_PRESENT_MASK | PT_ACCESSED_MASK
>>              | PT_WRITABLE_MASK | PT_USER_MASK;
>>
>
> spte is immediately overwritten by the following assignment.
>

Ah, sorry, I missed it; spte |= SPTE_NO_DIRTY should come after the
following assignment.

> However, the other half of the patch can be adapted:
>
>>
>> +    if (*sptep & SPTE_NO_DIRTY) {
>> +            struct kvm_mmu_page *child;
>> +
>> +            WARN_ON(level != gw->level);
>> +            WARN_ON(!is_shadow_present_pte(*sptep));
>> +            if (dirty) {
>> +                    child = page_header(*sptep &
>> +                                        PT64_BASE_ADDR_MASK);
>> +                    mmu_page_remove_parent_pte(child, sptep);
>> +                    __set_spte(sptep, shadow_trap_nonpresent_pte);
>> +                    kvm_flush_remote_tlbs(vcpu->kvm);
>> +            }
>> +    }
>> +
>>      if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
>>              continue;
>>
>
> Simply replace (*spte & SPTE_NO_DIRTY) with a condition that checks
> whether sp->access is consistent with gw->pt(e)_access.
>

If the guest mapping is writable and it !dirty, we mark SPTE_NO_DIRTY flag in
the spte, when the next #PF occurs, we just need to check this flag and see
whether gpte's D bit is set, if it's true, we zap this spte and map to the
correct sp.

> Can you write a test case for qemu-kvm.git/kvm/test that demonstrates
> the problem and the fix? It will help ensure we don't regress in this
> area.
>

OK, but allow me to do it later :-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
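The ordering problem pointed out in this message (the flag is ORed into spte and then lost when spte is assigned) can be reproduced outside the kernel. The following is only a minimal user-space sketch: the PT_* bit positions follow the x86 page-table layout, but the bit chosen for SPTE_NO_DIRTY and both function names are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* x86 page-table entry bits. */
#define PT_PRESENT_MASK   (1ULL << 0)
#define PT_WRITABLE_MASK  (1ULL << 1)
#define PT_USER_MASK      (1ULL << 2)
#define PT_ACCESSED_MASK  (1ULL << 5)

/* Hypothetical bit position, picked only for this sketch. */
#define SPTE_NO_DIRTY     (1ULL << 9)

/* Order used in the quoted hunk: the flag is set, then overwritten. */
uint64_t build_spte_flag_first(uint64_t table_pa, int mark_no_dirty)
{
    uint64_t spte = 0;

    assert((table_pa & 0xfffULL) == 0);  /* table address is page aligned */
    if (mark_no_dirty)
        spte |= SPTE_NO_DIRTY;
    /* Plain assignment: whatever was in spte before is discarded. */
    spte = table_pa | PT_PRESENT_MASK | PT_ACCESSED_MASK
         | PT_WRITABLE_MASK | PT_USER_MASK;
    return spte;
}

/* Corrected order: build the entry first, then OR in the flag. */
uint64_t build_spte_flag_last(uint64_t table_pa, int mark_no_dirty)
{
    uint64_t spte;

    assert((table_pa & 0xfffULL) == 0);  /* table address is page aligned */
    spte = table_pa | PT_PRESENT_MASK | PT_ACCESSED_MASK
         | PT_WRITABLE_MASK | PT_USER_MASK;
    if (mark_no_dirty)
        spte |= SPTE_NO_DIRTY;
    return spte;
}
```

With mark_no_dirty set, only the second variant returns an entry that still carries the flag; with it clear, both variants return the same value.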
From: Avi Kivity on 29 Jun 2010 05:20

On 06/29/2010 12:04 PM, Xiao Guangrong wrote:
>
>> Simply replace (*spte & SPTE_NO_DIRTY) with a condition that checks
>> whether sp->access is consistent with gw->pt(e)_access.
>>
>
> If the guest mapping is writable and it !dirty, we mark SPTE_NO_DIRTY flag in
> the spte, when the next #PF occurs, we just need to check this flag and see
> whether gpte's D bit is set, if it's true, we zap this spte and map to the
> correct sp.
>

My point is, SPTE_NO_DIRTY is equivalent to an sp->role.access check
(the access check is a bit slower, but that shouldn't matter).

>> Can you write a test case for qemu-kvm.git/kvm/test that demonstrates
>> the problem and the fix? It will help ensure we don't regress in this
>> area.
>>
>
> OK, but allow me do it later :-)
>

Sure, but please do it soon.

--
error compiling committee.c: too many arguments to function
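The equivalence described here (no extra spte flag is needed, because the access recorded in the child sp can be compared with the access the guest walk would now grant) can be sketched in plain C. This is an illustration under assumptions: the ACC_* values are modeled on KVM's access masks, and the two function names are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Access bits modeled on KVM's ACC_* masks (assumed values for this sketch). */
#define ACC_EXEC_MASK   1u
#define ACC_WRITE_MASK  2u
#define ACC_USER_MASK   4u
#define ACC_ALL         (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)

/*
 * Access a direct sp should have for a large guest mapping: the
 * intersection of the upper-level and leaf guest permissions, with
 * write removed while the guest pte is still clean.
 */
unsigned expected_direct_access(unsigned pt_access, unsigned pte_access,
                                bool gpte_dirty)
{
    unsigned access = pt_access & pte_access;

    if (!gpte_dirty)
        access &= ~ACC_WRITE_MASK;
    return access;
}

/*
 * True when the existing child sp was created with a different access
 * than the guest walk now grants, i.e. the spte should be zapped and
 * pointed at a new sp.
 */
bool child_sp_is_stale(unsigned child_access, unsigned pt_access,
                       unsigned pte_access, bool gpte_dirty)
{
    assert((child_access & ~ACC_ALL) == 0);  /* only known access bits */
    return child_access !=
           expected_direct_access(pt_access, pte_access, gpte_dirty);
}
```

The expected value is computed before the comparison on purpose: in C, `!=` binds more tightly than `&`, so an expression of the form `a != b & c` compares first and masks afterwards.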
From: Xiao Guangrong on 29 Jun 2010 05:20

Avi Kivity wrote:
> On 06/29/2010 12:04 PM, Xiao Guangrong wrote:
>>
>>> Simply replace (*spte & SPTE_NO_DIRTY) with a condition that checks
>>> whether sp->access is consistent with gw->pt(e)_access.
>>>
>>
>> If the guest mapping is writable and it !dirty, we mark SPTE_NO_DIRTY
>> flag in
>> the spte, when the next #PF occurs, we just need to check this flag and
>> see whether
>> gpte's D bit is set, if it's true, we zap this spte and map to the
>> correct sp.
>>
>
> My point is, SPTE_NO_DIRTY is equivalent to an sp->role.access check
> (the access check is a bit slower, but that shouldn't matter).
>

I see.

>
>>> Can you write a test case for qemu-kvm.git/kvm/test that demonstrates
>>> the problem and the fix? It will help ensure we don't regress in this
>>> area.
>>>
>>
>> OK, but allow me do it later :-)
>>
>
> Sure, but please do it soon.

Sure, I will do it as soon as possible.
From: Xiao Guangrong on 29 Jun 2010 05:20

Avi Kivity wrote:
> On 06/29/2010 10:45 AM, Xiao Guangrong wrote:
>>
>>> - there was once talk that instead of folding pt_access and pte_access
>>> together into the leaf sp->role.access, each sp level would have its own
>>> access permissions. In this case we don't even have to get a new direct
>>> sp, only change the PT_DIRECTORY_LEVEL spte to add write permissions
>>> (all direct sp's would be writeable and permissions would be controlled
>>> at their parent_pte level). Of course that's a much bigger change than
>>> this bug fix.
>>>
>>
>> Yeah, i have considered this way, but it will change the shadow page's
>> mapping
>> way: it control the access at the upper level, but in the current
>> code, we allow
>> the upper level have the ALL_ACCESS and control the access right at
>> the last level.
>> It will break many things, such as write-protected...
>>
>
> spte's access bits have dual purpose, both to map guest protection and
> for host protection (like for shadowed pages, or ksm pages). So the
> last level sptes still need to consider host write protection.
>

Yeah, I see what you mean, thanks :-)
From: Marcelo Tosatti on 30 Jun 2010 16:50

On Wed, Jun 30, 2010 at 04:03:28PM +0800, Xiao Guangrong wrote:
> If the mapping is writable but the dirty flag is not set, we will find
> the read-only direct sp and setup the mapping, then if the write #PF
> occur, we will mark this mapping writable in the read-only direct sp,
> now, other real read-only mapping will happily write it without #PF.
>
> It may hurt guest's COW
>
> Fixed by re-install the mapping when write #PF occur.

Applied 1, 2 and 4, thanks.

> Signed-off-by: Xiao Guangrong <xiaoguangrong(a)cn.fujitsu.com>
> ---
>  arch/x86/kvm/paging_tmpl.h |   28 ++++++++++++++++++++++++++--
>  1 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 28c8493..f28f09d 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -325,8 +325,32 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  			break;
>  		}
>
> -		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
> -			continue;
> +		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
> +			struct kvm_mmu_page *child;
> +			unsigned direct_access;
> +
> +			if (level != gw->level)
> +				continue;

This will skip the check for the sp at level 1 when emulating 1GB pages
with 4k host pages (where there are direct sp's at level 2 and 1).

Should be > instead of !=.

> +
> +			/*
> +			 * For the direct sp, if the guest pte's dirty bit
> +			 * changed from clean to dirty, it will corrupt the
> +			 * sp's access: allow writable in the read-only sp,
> +			 * so we should update the spte at this point to get
> +			 * a new sp with the correct access.
> +			 */
> +			direct_access = gw->pt_access & gw->pte_access;
> +			if (!is_dirty_gpte(gw->ptes[gw->level - 1]))
> +				direct_access &= ~ACC_WRITE_MASK;
> +
> +			child = page_header(*sptep & PT64_BASE_ADDR_MASK);
> +			if (child->role.access == direct_access)
> +				continue;
> +
> +			mmu_page_remove_parent_pte(child, sptep);
> +			__set_spte(sptep, shadow_trap_nonpresent_pte);
> +			kvm_flush_remote_tlbs(vcpu->kvm);
> +		}
>
>  		if (is_large_pte(*sptep)) {
>  			rmap_remove(vcpu->kvm, sptep);
> --
> 1.6.1.2
>
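The review comment about `!=` versus `>` can be illustrated with a small predicate pair. This is a sketch under assumptions: the level numbering (1 = page table, 2 = page directory, 3 = PDPT, 4 = PML4) follows x86-64 paging, and both function names are hypothetical. Each function answers whether the access check is skipped for the spte at `level` when the guest mapping sits at `gw_level`.

```c
#include <assert.h>
#include <stdbool.h>

/* x86-64 paging levels: 1 = page table, 2 = page directory, 3 = PDPT, 4 = PML4. */
#define PT_PAGE_TABLE_LEVEL  1
#define PT_DIRECTORY_LEVEL   2
#define PT_PDPE_LEVEL        3
#define PT64_ROOT_LEVEL      4

/* Condition as posted: skip the check at every level except gw_level. */
bool skip_check_posted(int level, int gw_level)
{
    assert(level >= PT_PAGE_TABLE_LEVEL && level <= PT64_ROOT_LEVEL);
    return level != gw_level;
}

/* Condition as suggested in review: skip only levels above gw_level. */
bool skip_check_suggested(int level, int gw_level)
{
    assert(level >= PT_PAGE_TABLE_LEVEL && level <= PT64_ROOT_LEVEL);
    return level > gw_level;
}
```

For a 1GB guest page (gw_level 3) backed by 4k host pages, the spte at level 2 points to the direct sp at level 1. The posted condition skips the check there, while the suggested one performs it; both skip level 4 and both check level 3.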