Prev: [regression] Please add bug 15788 and 15969, usb sound / radeon kms broken after snapshot cycle
Next: [PATCH] staging: comedi - correct parameter gainlkup for DAQCard-6024E in driver ni_mio_cs.c
From: Linus Torvalds on 31 May 2010 16:30 On Mon, 31 May 2010, Linus Torvalds wrote: > > It's entirely possible that an interim fix (if we can't just fix the > locking) is to _not_ use "strong_try_module_get()" at all, but instead > just use "try_module_get()", and then after we've dropped the > module_mutex, but _before_ we call the "init" function for the module, we > wait for all the modules that this module depends on. > > IOW, we'd link to other modules _before_ they are necessarily initialized > (their symbol tables will be as initialized as they are going to be), but > then before we call our own initialization routines we make sure that the > modules we linked to have finished theirs. > > Doesn't that sound like the logical thing to do? And it wouldn't change > any locking. Ok, this is a two-patch series to do exactly that. It's totally untested, although I _have_ booted the result, and tested that module loading and unloading works. But I haven't tested the race condition, obviously. It looks sane, and quite frankly, I think it's a much better design than what we have now. I also note that the old code was broken for the case of not having CONFIG_MODULE_UNLOAD - I didn't fix it, but it should be easy to do (basically, we should do the whole module dependency list regardless of whether we can unload modules or not - we _do_ need it in order to get that whole case of waiting for them to load right). The first patch doesn't really change anything, it just cleans things up and introduces the two-way module usage list that the second patch needs in order to wait for modules that we use to initialize. Comments? Brandon - does this work for you? I may well have missed something. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 31 May 2010 23:50 On Tue, 1 Jun 2010, Rusty Russell wrote: > On Tue, 1 Jun 2010 05:45:37 am Linus Torvalds wrote: > > > > basically, we should do the whole module dependency list regardless > > of whether we can unload modules or not > > Why? Because the current non-CONFIG_MODULE_UNLOAD code is currently broken, and it was broken exactly because the code had two totally different paths and totally different logic. And one part simply missed the case. We'd be much better off having as much of the logic shared as possible. No? Your 2/2 actually fixed that, because it moved the broken wait_event_interruptible_timeout() out of the (non-shared) use_module() into the (shared) resolve_symbol_wait(). But even that seemed to be almost accidental, and seemed to be more about the fact that now the locking rules required it (if you wanted to wait without holding the lock), rather than anything else. In contrast, wouldn't it be nice if we just made the waiting be a separate event entirely? And have the !MODULE_UNLOAD case also able to share the /proc/module format, for example? You do realize - although maybe you don't - that right now, even with your 2/2 case, the !MODULE_UNLOAD case seems to get the module ref-counts wrong. I didn't look through it all, but it _seems_ to increment the ref-count for every symbol it resolves. What happens if the module load fails in the middle (say, because some symbols referred to module 'a' that was loaded successfully, and others refer to module 'b' that had some probelsm)? I think the !MODULE_UNLOAD case is _broken_ again. Why? Take a small guess. It's broken - even with your 2/2 - exactly because the !MODULE_UNLOAD case does something fundamentally different from the regular case. It increments the module counts for each symbol it looks up, exactly BECAUSE IT DOESN'T TRACK MODULE DEPENDENCIES! Which means that it then cannot clean up properly afterwards if an error happens in the middle. So I'd suggest we should just track those module dependencies, and share more of the code and the logic. Because it looks to me like not sharing it continually results in bugs. No? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 1 Jun 2010 00:40 On Mon, 31 May 2010, Linus Torvalds wrote: > > Which means that it then cannot clean up properly afterwards if an error > happens in the middle. Oh, I guess it doesn't matter. For the non-unload case, we don't even do any module counts, so the fixup ends up being a no-op. So with the error handling fixed (which your 2/2 did), I guess it's actually ok. I still think it would be nice to have /proc/modules show the module dependencies regardless of whether they can be unloaded or not, but that's really just an oddity, maybe not a "bug" per se. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 1 Jun 2010 14:00 On Tue, 1 Jun 2010, Linus Torvalds wrote: > > So I do agree that your later two-patch series is the way to go. Oh, btw, did we ever get that version tested by the only person who seems to see the problem? Brandon? I'd like to have a Tested-By: line too. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 1 Jun 2010 20:00
On Tue, 1 Jun 2010, Brandon Philips wrote: > > When I tested a Kernel with Rusty's modules branch pulled onto > 2.6.35-rc1 I got duplicate sysfs path errors: Hmm. Yeah, the module_mutex used to be held across the whole "find -> add" state, but isn't any more. > To fix this we need to make sure that we only have one copy of a module > going through load_module at a time. Patch suggestion follows which > boots without errors. I am sure there is a better way to do what it does > ;) I think Rusty may have made the lock a bit _too_ finegrained there, and didn't add it to some places that needed it. It looks, for example, like PATCH 1/2 actually drops the lock in places where it's needed ("find_module()" is documented to need it, but now load_module() didn't hold it at all when it did the find_module()). Rather than adding a new "module_loading" list, I think we should be able to just use the existing "modules" list, and just fix up the locking a bit. In fact, maybe we could just move the "look up existing module" a bit later - optimistically assuming that the module doesn't exist, and then just undoing the work if it turns out that we were wrong, just before adding ourselves to the list. A patch something like the appended (TOTALLY UNTESTED!) Linus --- kernel/module.c | 13 ++++++++----- 1 files changed, 8 insertions(+), 5 deletions(-) diff --git a/kernel/module.c b/kernel/module.c index a1f46a5..21f7ffa 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -2198,11 +2198,6 @@ static noinline struct module *load_module(void __user *umod, goto free_mod; } - if (find_module(mod->name)) { - err = -EEXIST; - goto free_mod; - } - mod->state = MODULE_STATE_COMING; /* Allow arches to frob section contents and sizes. */ @@ -2486,6 +2481,13 @@ static noinline struct module *load_module(void __user *umod, * The mutex protects against concurrent writers. */ mutex_lock(&module_mutex); + + if (find_module(mod->name)) { + err = -EEXIST; + /* This will also unlock the mutex */ + goto already_exists; + } + list_add_rcu(&mod->list, &modules); mutex_unlock(&module_mutex); @@ -2511,6 +2513,7 @@ static noinline struct module *load_module(void __user *umod, mutex_lock(&module_mutex); /* Unlink carefully: kallsyms could be walking list. */ list_del_rcu(&mod->list); + already_exists: mutex_unlock(&module_mutex); synchronize_sched(); module_arch_cleanup(mod); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |