From: Peter Zijlstra on 1 Apr 2010 12:20 On Thu, 2010-04-01 at 17:42 +0200, Andrea Arcangeli wrote: > The ugliest part of it (that I think you missed below) is the breakage > of the RCU locking in the anon-vma which requires adding refcounting > to it. That was the worst part of the conversion as far as I can tell. > One thing we can do there is to mutex_trylock() if we get the lock, see if we've got the right object, if the trylock fails we can do the refcount thing and sleep. That would allow the fast-path to remain a single atomic. The only thing is growing that anon_vma struct, but KSM seems to already do that for the most common .configs. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on 1 Apr 2010 12:30 On Thu, Apr 01, 2010 at 06:12:34PM +0200, Peter Zijlstra wrote: > One thing we can do there is to mutex_trylock() if we get the lock, see > if we've got the right object, if the trylock fails we can do the > refcount thing and sleep. That would allow the fast-path to remain a > single atomic. But then how do you know which anon_vma_unlink has to decrease the refcount and which not? That info should need to be stored in the kernel stack, can't be stored in the vma. I guess it's feasible but passing that info around sounds more tricky than the trylock itself (adding params to those functions with int &refcount). > The only thing is growing that anon_vma struct, but KSM seems to already > do that for the most common .configs. Ok, from my initial review of memory compaction I see that it already adds its own recount and unifies it too with the ksm_refcount. So it's worth stop calling it ksm_refcount and to remove the #ifdef (which memory compaction didn't remove but added an defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)). If you do this change we can have your change applied first, and then Mel can adapt memory compaction to it sharing the same unconditional compiled-in refcount. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on 1 Apr 2010 12:40 On Thu, Apr 01, 2010 at 05:50:02PM +0200, Peter Zijlstra wrote: > You also seem to move the tlb_gather stuff around, we have patches in Well that patchset is years old, there are much more recent patches to move tlb_gather around so that it will happen after the mmu_notifier invalidate calls. That is only relevant to allow mmu notifier handlers to schedule. > -rt that make tlb_gather preemptible, once i_mmap_lock is preemptible we > can do in mainline too. Ok. However just moving it inside the mmu notifier range calls won't slowdown anything or reduce its effectiveness, either ways will be fine with XPMEM. This is only XPMEM material and tlb_gather is the trivial part of it. The hard part is to make those locks schedule capable, and I'm sure XPMEM developers will greatly appreciate such a change. I thought initially it had to be conditional to XPMEM=y but maintaining two different locks is a bit of a pain (especially for distros that wants to ship a single kernel) but as this effort started, it'd provide some minor latency improvement in that 1msec when a new virtual machine starts or when a new taks registers itself to GRU or XPMEM device drivers. I'm personally fine to make these either mutexes or rwsem unconditionally as you prefer. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on 1 Apr 2010 12:40 On Thu, Apr 01, 2010 at 01:43:14PM +0200, Peter Zijlstra wrote: > On Thu, 2010-04-01 at 13:27 +0200, Peter Zijlstra wrote: > > > > I've almost got a patch done that converts those two, still need to look > > where that tasklist_lock muck happens. > > OK, so the below builds and boots, only need to track down that > tasklist_lock nesting, but I got to run an errand first. You should have a look at my old patchset where Christoph already implemented this (and not for decreasing latency but to allow scheduling in mmu notifier handlers, only needed by XPMEM): http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18/ The ugliest part of it (that I think you missed below) is the breakage of the RCU locking in the anon-vma which requires adding refcounting to it. That was the worst part of the conversion as far as I can tell. http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18/anon-vma I personally prefer read-write locks that Christoph used for both of them, but I'm not against mutex either. Still the refcounting problem should be the same as it's introduced by allowing the critical sections under anon_vma->lock to schedule (no matter if it's mutex or read-write sem). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 1 Apr 2010 12:40 On Thu, 2010-04-01 at 18:07 +0200, Andrea Arcangeli wrote: > On Thu, Apr 01, 2010 at 05:56:02PM +0200, Peter Zijlstra wrote: > > Another thing is mm->nr_ptes, that doens't appear to be properly > > serialized, __pte_alloc() does ++ under mm->page_table_lock, but > > free_pte_range() does -- which afaict isn't always with page_table_lock > > held, it does however always seem to have mmap_sem for writing. > > Not saying this is necessarily safe, but how can be that relevant with > spinlock->mutex/rwsem conversion? Not directly, but I keep running into that BUG_ON() at the end up exit_mmap() with my conversion patch, and I though that maybe I widened the race window. But I guess I simply messed something up. > Only thing that breaks with that > conversion would be RCU (the very anon_vma rcu breaks because it > rcu_read_lock disabling preempt and then takes the anon_vma->lock, > that falls apart because taking the anon_vma->lock will imply a > schedule), but nr_ptes is a write operation so it can't be protected > by RCU. > > > However __pte_alloc() callers do not in fact hold mmap_sem for writing. > > As long as the mmap_sem readers always also take the page_table_lock > we're safe. Ah, I see so its: down_read(mmap_sem) + page_table_lock that's exclusive against down_write(mmap_sem), nifty, should be a comment somewhere. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
First
|
Prev
|
Next
|
Last
Pages: 1 2 3 4 5 Prev: [PATCH]slub: fix bad scope checking Next: [PATCH 0/2 v4] scsi: ftrace based scsi tracer |