From: Peter Zijlstra on 1 Apr 2010 12:00

On Thu, 2010-04-01 at 17:50 +0200, Peter Zijlstra wrote:
> On Thu, 2010-04-01 at 17:42 +0200, Andrea Arcangeli wrote:
> > On Thu, Apr 01, 2010 at 01:43:14PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-04-01 at 13:27 +0200, Peter Zijlstra wrote:
> > > >
> > > > I've almost got a patch done that converts those two, still need to look
> > > > where that tasklist_lock muck happens.
> > >
> > > OK, so the below builds and boots, only need to track down that
> > > tasklist_lock nesting, but I got to run an errand first.
> >
> > You should have a look at my old patchset where Christoph already
> > implemented this (and not for decreasing latency but to allow
> > scheduling in mmu notifier handlers, only needed by XPMEM):
> >
> > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18/
> >
> > The ugliest part of it (that I think you missed below) is the breakage
> > of the RCU locking in the anon-vma which requires adding refcounting
> > to it. That was the worst part of the conversion as far as I can tell.
> >
> > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18/anon-vma
> >
> > I personally prefer read-write locks that Christoph used for both of
> > them, but I'm not against mutex either. Still the refcounting problem
> > should be the same as it's introduced by allowing the critical
> > sections under anon_vma->lock to schedule (no matter if it's mutex or
> > read-write sem).
>
> Right, so the problem with the rwsem is that, esp for very short hold
> times, they introduce more pain than they're worth. Also the rwsem
> doesn't do adaptive spinning nor allows for lock stealing, resulting in
> a much much heavier sync. object than the mutex is.
>
> You also seem to move the tlb_gather stuff around, we have patches in
> -rt that make tlb_gather preemptible, once i_mmap_lock is preemptible we
> can do in mainline too.

Another thing is mm->nr_ptes, that doesn't appear to be properly
serialized, __pte_alloc() does ++ under mm->page_table_lock, but
free_pte_range() does -- which afaict isn't always with page_table_lock
held, it does however always seem to have mmap_sem for writing.

However __pte_alloc() callers do not in fact hold mmap_sem for writing.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Peter Zijlstra on 1 Apr 2010 12:00

On Thu, 2010-04-01 at 18:51 +0300, Avi Kivity wrote:
> On 04/01/2010 06:42 PM, Andrea Arcangeli wrote:
> > On Thu, Apr 01, 2010 at 01:43:14PM +0200, Peter Zijlstra wrote:
> >
> >> On Thu, 2010-04-01 at 13:27 +0200, Peter Zijlstra wrote:
> >>
> >>> I've almost got a patch done that converts those two, still need to look
> >>> where that tasklist_lock muck happens.
> >>>
> >> OK, so the below builds and boots, only need to track down that
> >> tasklist_lock nesting, but I got to run an errand first.
> >>
> > You should have a look at my old patchset where Christoph already
> > implemented this (and not for decreasing latency but to allow
> > scheduling in mmu notifier handlers, only needed by XPMEM):
> >
> > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18/
> >
> > The ugliest part of it (that I think you missed below) is the breakage
> > of the RCU locking in the anon-vma which requires adding refcounting
> > to it. That was the worst part of the conversion as far as I can tell.
> >
> > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18/anon-vma
> >
>
> Can we use srcu now instead?

I would much rather we make call_rcu_preempt() available at all times.

From: Andrea Arcangeli on 1 Apr 2010 12:10

On Thu, Apr 01, 2010 at 05:56:02PM +0200, Peter Zijlstra wrote:
> Another thing is mm->nr_ptes, that doesn't appear to be properly
> serialized, __pte_alloc() does ++ under mm->page_table_lock, but
> free_pte_range() does -- which afaict isn't always with page_table_lock
> held, it does however always seem to have mmap_sem for writing.

Not saying this is necessarily safe, but how can that be relevant to the
spinlock->mutex/rwsem conversion? The only thing that breaks with that
conversion would be RCU (the anon_vma RCU breaks because it takes
rcu_read_lock, which disables preemption, and then takes the
anon_vma->lock; that falls apart because taking the anon_vma->lock will
imply a schedule), but nr_ptes is a write operation so it can't be
protected by RCU.

> However __pte_alloc() callers do not in fact hold mmap_sem for writing.

As long as the mmap_sem readers always also take the page_table_lock
we're safe.

From: Andrea Arcangeli on 1 Apr 2010 12:10

On Thu, Apr 01, 2010 at 06:51:17PM +0300, Avi Kivity wrote:
> Can we use srcu now instead?

We can always switch to srcu. Switching to srcu stops being a noop for
the mmu notifier invalidates only once these locks can schedule. At
that point, with srcu + mutex in the rmap locks, all the mmu notifier
invalidates can schedule, allowing XPMEM to make its invalidates
synchronous, and making it safe.

From: Paul E. McKenney on 1 Apr 2010 12:20

On Thu, Apr 01, 2010 at 05:56:46PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-04-01 at 18:51 +0300, Avi Kivity wrote:
> > On 04/01/2010 06:42 PM, Andrea Arcangeli wrote:
> > > On Thu, Apr 01, 2010 at 01:43:14PM +0200, Peter Zijlstra wrote:
> > >
> > >> On Thu, 2010-04-01 at 13:27 +0200, Peter Zijlstra wrote:
> > >>
> > >>> I've almost got a patch done that converts those two, still need to look
> > >>> where that tasklist_lock muck happens.
> > >>>
> > >> OK, so the below builds and boots, only need to track down that
> > >> tasklist_lock nesting, but I got to run an errand first.
> > >>
> > > You should have a look at my old patchset where Christoph already
> > > implemented this (and not for decreasing latency but to allow
> > > scheduling in mmu notifier handlers, only needed by XPMEM):
> > >
> > > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18/
> > >
> > > The ugliest part of it (that I think you missed below) is the breakage
> > > of the RCU locking in the anon-vma which requires adding refcounting
> > > to it. That was the worst part of the conversion as far as I can tell.
> > >
> > > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18/anon-vma
> > >
> >
> > Can we use srcu now instead?
>
> I would much rather we make call_rcu_preempt() available at all times.

Even in !CONFIG_PREEMPT kernels? Or am I missing your point?

Thanx, Paul