Prev: BUG: unable to handle kernel paging request at 40000000 __alloc_memory_core_early+0x147/0x1d6
Next: Yet another 2.6.35 regression (AGP)?
From: Jeremy Fitzhardinge on 9 Jul 2010 11:10 I just noticed that the original mmu notifier change (cddb8a5c14a) adds calls to mmu_notifier_invalidate_range_start/end to apply_to_page_range(). This doesn't seem correct to me, since apply_to_page_range can perform arbitrary operations to the range of pages, not just invalidation of the pages. It seems to me that the appropriate mmu notifiers should be called either around the call to apply_to_page_range(), or from within the callback function. Andrea, what's the rationale for mmu_notifier_invalidate_range_start/end here? Thanks, J -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on 9 Jul 2010 11:20 On Fri, Jul 09, 2010 at 08:06:20AM -0700, Jeremy Fitzhardinge wrote: > I just noticed that the original mmu notifier change (cddb8a5c14a) adds > calls to mmu_notifier_invalidate_range_start/end to > apply_to_page_range(). This doesn't seem correct to me, since > apply_to_page_range can perform arbitrary operations to the range of > pages, not just invalidation of the pages. It seems to me that the > appropriate mmu notifiers should be called either around the call to > apply_to_page_range(), or from within the callback function. > > Andrea, what's the rationale for mmu_notifier_invalidate_range_start/end > here? As long as the secondary mappings are teardown in range_start and allowed to be established again only after range_end, all modifications will be picked up by the secondary mmu. Imagine secondary mmu like a tlb, that you only invalidate, then it'll be refilled later (after range_end). The exception is set_pte_at_notify that is called by ksm to establish a readonly secondary pte in KVM, KVM only calls get_user_pages(write=1) (never write=0 even for reads) so until that is optimized set_pte_at_notify allows guest to access readonly data without breaking the cow. set_pte_at_notify invokes a change_pte method, if not implemented it'll just fallback to the invalidate_page method that is backwards compatible, so no mmu notifier user is required to call change_pte (especially if the secondary page fault - kind of secondary tlb-miss software handler invokes get_user_pages with write=0 for reads, ->change_pte can only eliminate one minor fault so no big deal). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Jeremy Fitzhardinge on 9 Jul 2010 12:00 On 07/09/2010 08:12 AM, Andrea Arcangeli wrote: > On Fri, Jul 09, 2010 at 08:06:20AM -0700, Jeremy Fitzhardinge wrote: > >> I just noticed that the original mmu notifier change (cddb8a5c14a) adds >> calls to mmu_notifier_invalidate_range_start/end to >> apply_to_page_range(). This doesn't seem correct to me, since >> apply_to_page_range can perform arbitrary operations to the range of >> pages, not just invalidation of the pages. It seems to me that the >> appropriate mmu notifiers should be called either around the call to >> apply_to_page_range(), or from within the callback function. >> >> Andrea, what's the rationale for mmu_notifier_invalidate_range_start/end >> here? >> > As long as the secondary mappings are teardown in range_start and > allowed to be established again only after range_end, all > modifications will be picked up by the secondary mmu. Imagine > secondary mmu like a tlb, that you only invalidate, then it'll be > refilled later (after range_end). > Yes, but apply_to_page_range isn't necessarily making changes which requires that teardown/refill. The most common user is vmalloc, which is just using a side-effect of apply_to_page_range to make sure that all the intermediate levels of the pagetable are allocated over the vmalloced range. I think all the other users of it are within Xen code. In particular, we're using it in the gntdev driver, which also uses mmu notifiers to keep grant mappings in sync with process mappings, so the recursive mmu notifier call is causing problems. It seems to me that apply_to_page_range should be agnostic to its use, and its up to its callers to do any appropriate mmu notifier work. Would you be happy with a patch to remove those calls? Thanks, J -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on 9 Jul 2010 12:30 On Fri, Jul 09, 2010 at 08:51:39AM -0700, Jeremy Fitzhardinge wrote: > Yes, but apply_to_page_range isn't necessarily making changes which > requires that teardown/refill. The most common user is vmalloc, which > is just using a side-effect of apply_to_page_range to make sure that all > the intermediate levels of the pagetable are allocated over the > vmalloced range. I think all the other users of it are within Xen code. > In particular, we're using it in the gntdev driver, which also uses mmu > notifiers to keep grant mappings in sync with process mappings, so the > recursive mmu notifier call is causing problems. > > It seems to me that apply_to_page_range should be agnostic to its use, > and its up to its callers to do any appropriate mmu notifier work. > Would you be happy with a patch to remove those calls? mmu notifier only relevant for userland mappings, not kernel mappings. I don't know about the xen use, but for vmalloc certainly it can't be a problem to remove those two mmu notifier invalidates. Only bit that is worrysome is the mm == &init_mm pte_alloc_kernel|pte_alloc_map_lock. That seems to imply it may also be used to mangle over userland. But apparently all users are passing &init_mm as expected. I guess if you remove the mm parameter and you default to &init_mm definitely there will be no risk in removing the mmu notifier range_start/end invalidates. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on 9 Jul 2010 13:40
On Fri, Jul 09, 2010 at 10:30:48AM -0700, Jeremy Fitzhardinge wrote: > On 07/09/2010 09:22 AM, Andrea Arcangeli wrote: > > mmu notifier only relevant for userland mappings, not kernel > > mappings. I don't know about the xen use, but for vmalloc certainly it > > can't be a problem to remove those two mmu notifier invalidates. > > > > Only bit that is worrysome is the mm == &init_mm > > pte_alloc_kernel|pte_alloc_map_lock. That seems to imply it may also > > be used to mangle over userland. But apparently all users are passing > > &init_mm as expected. I guess if you remove the mm parameter and you > > default to &init_mm definitely there will be no risk in removing the > > mmu notifier range_start/end invalidates. > > > > No, we do have some users which use it on user memory. But those users > are using it as part of their own mmu notifier backend, so the recursive > calls are causing a problem. My point is that anyone using > apply_to_page_range should be making their own calls to mmu notifiers as > appropriate to whatever they're doing. Makes sense, it was hard in fact to see how it would cause any problem for you considering nobody registers mmu notifiers into mm_init... I've to say it's next to trivial for them to detect recursion and skip the inner superflous call if they run it under a lock. But be careful that pte_alloc/pmd_alloc and friends can block and break the mmu notifier locking. I'm not really sure if it apply_to_page_range is a sane interface to use inside mmu notifier methods considering it's supposedly a blocking operation, caller must be careful to use that inside a mmu notifier callback anyway... I'm not opposed to removing it, I've been wondering if it made any sense in the first place but then there was no point not to add it. Just calling apply_to_page_range in non blocking context doesn't look so good. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |