Prev: [PATCH] block: remove 16 bytes of padding from struct request on 64bits
Next: ivtv: sizeof() => ARRAY_SIZE()
From: Nick Piggin on 18 Mar 2010 07:00 Hey, I've been looking at weird and wonderful ways to do scalable refcounting, for the vfs... Sadly, module refcounting doesn't fit my bill. But as far as I could see, it is racy. Can someone confirm that we do or do not have a race (and if so, whether this patch would solve it?) Race described in the comment below. Thanks, Nick Index: linux-2.6/include/linux/module.h =================================================================== --- linux-2.6.orig/include/linux/module.h +++ linux-2.6/include/linux/module.h @@ -365,7 +365,8 @@ struct module void (*exit)(void); struct module_ref { - int count; + unsigned int incs; + unsigned int decs; } __percpu *refptr; #endif @@ -459,9 +460,9 @@ static inline void __module_get(struct m { if (module) { preempt_disable(); - __this_cpu_inc(module->refptr->count); + __this_cpu_inc(module->refptr->incs); trace_module_get(module, _THIS_IP_, - __this_cpu_read(module->refptr->count)); + __this_cpu_read(module->refptr->incs)); preempt_enable(); } } @@ -474,11 +475,10 @@ static inline int try_module_get(struct preempt_disable(); if (likely(module_is_live(module))) { - __this_cpu_inc(module->refptr->count); + __this_cpu_inc(module->refptr->incs); trace_module_get(module, _THIS_IP_, - __this_cpu_read(module->refptr->count)); - } - else + __this_cpu_read(module->refptr->incs)); + } else ret = 0; preempt_enable(); Index: linux-2.6/kernel/module.c =================================================================== --- linux-2.6.orig/kernel/module.c +++ linux-2.6/kernel/module.c @@ -473,11 +473,13 @@ static void module_unload_init(struct mo int cpu; INIT_LIST_HEAD(&mod->modules_which_use_me); - for_each_possible_cpu(cpu) - per_cpu_ptr(mod->refptr, cpu)->count = 0; + for_each_possible_cpu(cpu) { + per_cpu_ptr(mod->refptr, cpu)->incs = 0; + per_cpu_ptr(mod->refptr, cpu)->decs = 0; + } /* Hold reference count during initialization. */ - __this_cpu_write(mod->refptr->count, 1); + __this_cpu_write(mod->refptr->incs, 1); /* Backwards compatibility macros put refcount during init. */ mod->waiter = current; } @@ -616,12 +618,28 @@ static int try_stop_module(struct module unsigned int module_refcount(struct module *mod) { - unsigned int total = 0; + unsigned int incs = 0, decs = 0; int cpu; for_each_possible_cpu(cpu) - total += per_cpu_ptr(mod->refptr, cpu)->count; - return total; + decs += per_cpu_ptr(mod->refptr, cpu)->decs; + /* + * ensure the incs are added up after the decs. + * module_put ensures incs are visible before decs with smp_wmb. + * + * This 2-count scheme avoids the situation where the refcount + * for CPU0 is read, then CPU0 increments the module refcount, + * then CPU1 drops that refcount, then the refcount for CPU1 is + * read. We would record a decrement but not its corresponding + * increment so we would see a low count (disaster). + * + * Rare situation? But module_refcount can be preempted, and we + * might be tallying up 4096+ CPUs. So it is not impossible. + */ + smp_rmb(); + for_each_possible_cpu(cpu) + incs += per_cpu_ptr(mod->refptr, cpu)->incs; + return incs - decs; } EXPORT_SYMBOL(module_refcount); @@ -798,10 +816,11 @@ void module_put(struct module *module) { if (module) { preempt_disable(); - __this_cpu_dec(module->refptr->count); + smp_wmb(); /* see comment in module_refcount */ + __this_cpu_inc(module->refptr->decs); trace_module_put(module, _RET_IP_, - __this_cpu_read(module->refptr->count)); + __this_cpu_read(module->refptr->decs)); /* Maybe they're waiting for us to drop reference? */ if (unlikely(!module_is_live(module))) wake_up_process(module->waiter); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Rusty Russell on 29 Mar 2010 05:20 On Thu, 18 Mar 2010 09:25:34 pm Nick Piggin wrote: > Hey, > > I've been looking at weird and wonderful ways to do scalable refcounting, > for the vfs... > > Sadly, module refcounting doesn't fit my bill. But as far as I could see, > it is racy. Other than for advisory purposes, the refcount is only checked against zero under stop_machine. For exactly this reason. Hope that helps, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Nick Piggin on 29 Mar 2010 13:00 On Mon, Mar 29, 2010 at 8:12 PM, Rusty Russell <rusty(a)rustcorp.com.au> wrote: > On Thu, 18 Mar 2010 09:25:34 pm Nick Piggin wrote: >> Hey, >> >> I've been looking at weird and wonderful ways to do scalable refcounting, >> for the vfs... >> >> Sadly, module refcounting doesn't fit my bill. But as far as I could see, >> it is racy. > > Other than for advisory purposes, the refcount is only checked against zero > under stop_machine. �For exactly this reason. There definitely looks to me like there is code that checks the refcount *without* stop_machine. module_refcount is an exported function, and you expect drivers to get this right (scsi_device_put for a trivial example), but it even looks like it is used in a racy way in kernel/module.c code. Either we need to take my patch, or audit t, and put a WARN_ON if it is called while not under stop_machine. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 1 Apr 2010 12:10 On Thu, 1 Apr 2010, Nick Piggin wrote: > > I think it can be done racelessly with my patch, which is not really too > much overhead. I think if this is considered too much, then we should > either fix code and preferably de-export and remove module_refcount from > drivers, or remove module removal completely. I doubt your patch matters too much, but I like it conceptually and it seems to be a nice basis for perhaps doing something clever in the long run. [ ie avoiding the stop_machine and instead perhaps doing some optimistic thing like "see if we seem to be unused right now, then unregister us, and see - after unregistering - that the usage counts haven't increased, and re-register if they have. ] So I'd like to apply it as a "good improvement, even if module unloading which is the only thing that _should_ care deeply should already be under stop-machine". But I'd like an ack or two first. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Rusty Russell on 5 Apr 2010 22:40 On Fri, 2 Apr 2010 02:25:59 am Linus Torvalds wrote: > > On Thu, 1 Apr 2010, Nick Piggin wrote: > > > > I think it can be done racelessly with my patch, which is not really too > > much overhead. I think if this is considered too much, then we should > > either fix code and preferably de-export and remove module_refcount from > > drivers, or remove module removal completely. > > I doubt your patch matters too much, but I like it conceptually and it > seems to be a nice basis for perhaps doing something clever in the long > run. > > [ ie avoiding the stop_machine and instead perhaps doing some optimistic > thing like "see if we seem to be unused right now, then unregister us, > and see - after unregistering - that the usage counts haven't increased, > and re-register if they have. ] I dislike that we can see spurious failure for some random try_module_get caller. But perhaps that's inherent in module removal: someone can miss out, and if you care, don't try to remove modules. And grepping for try_module_get() reveals a suspicious (growing) number of try_module_get(THIS_MODULE) which is almost always wrong. If we're not perfect, maybe we should aim for simple? > So I'd like to apply it as a "good improvement, even if module unloading > which is the only thing that _should_ care deeply should already be under > stop-machine". > > But I'd like an ack or two first. Yep. Acked-by: Rusty Russell <rusty(a)rustcorp.com.au> Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
|
Next
|
Last
Pages: 1 2 Prev: [PATCH] block: remove 16 bytes of padding from struct request on 64bits Next: ivtv: sizeof() => ARRAY_SIZE() |