From: Frank Mayhar on 1 Jul 2010 12:30

On Thu, 2010-07-01 at 17:54 +1000, Nick Piggin wrote:
> On Thu, Jul 01, 2010 at 12:36:18PM +1000, Dave Chinner wrote:
> > Seriously: use a new lock for high level inode operations you are
> > optimising - don't repurpose an existing lock with different usage
> > rules just because it's convenient.
>
> That's what scalability development is all about, I'm afraid. Just
> adding more and more locks is what makes things more complex, so
> you have to juggle around or change locks when possible. If a
> difficulty with locking pops up in future, I'd prefer to look at it
> then.

I must agree strongly with Nick here. Lock proliferation is a Bad Thing;
adding a new lock here just adds complexity without really improving
anything, since you still have to deal with lock ordering _and_ add a
new one to the mix. It also makes struct inode bigger for no real gain.

Since i_lock is already well understood and its use is pretty isolated,
it seems ideal to extend it to more general use while keeping the
isolation intact as much as possible. Hell, if the code had been
designed with this kind of scalability in mind, i_lock would be doing
all the locking directly related to struct inode, much as it is after
Nick's work.

My argument here is that it's not just convenient: it makes the
implementation simpler, since instead of dealing with two locks there's
just one.
--
Frank Mayhar <fmayhar(a)google.com>
Google, Inc.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrew Morton on 2 Jul 2010 22:10

On Wed, 30 Jun 2010 22:05:02 +1000 Nick Piggin <npiggin(a)suse.de> wrote:
> On Wed, Jun 30, 2010 at 05:27:02PM +1000, Dave Chinner wrote:
> > On Thu, Jun 24, 2010 at 01:02:41PM +1000, npiggin(a)suse.de wrote:
> > > Protect inode->i_count with i_lock, rather than having it atomic.
> > > Next step should also be to move things together (eg. the refcount
> > > increment into d_instantiate, which will remove a lock/unlock cycle
> > > on i_lock).
> > .....
> > > Index: linux-2.6/fs/inode.c
> > > ===================================================================
> > > --- linux-2.6.orig/fs/inode.c
> > > +++ linux-2.6/fs/inode.c
> > > @@ -33,14 +33,13 @@
> > >  * inode_hash_lock protects:
> > >  * inode hash table, i_hash
> > >  * inode->i_lock protects:
> > > - * i_state
> > > + * i_state, i_count
> > >  *
> > >  * Ordering:
> > >  * inode_lock
> > >  * sb_inode_list_lock
> > >  * inode->i_lock
> > > - * inode_lock
> > > - * inode_hash_lock
> > > + * inode_hash_lock
> > >  */
> >
> > I thought that the rule governing the use of inode->i_lock was that
> > it can be used anywhere as long as it is the innermost lock.
> >
> > Hmmm, no references in the code or documentation. Google gives a
> > pretty good reference:
> >
> > http://www.mail-archive.com/linux-ext4(a)vger.kernel.org/msg02584.html
> >
> > Perhaps a different/new lock needs to be used here?
>
> Well I just changed the order (and documented it to boot :)). It's
> pretty easy to verify that LOR is no problem. inode hash is only
> taken in a very few places so other code outside inode.c is fine to
> use i_lock as an innermost lock.

um, nesting a kernel-wide singleton lock inside a per-inode lock is
plain nutty.
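[Editorial note: the change under discussion, replacing the atomic_t i_count with a plain count guarded by the per-inode i_lock, can be modelled in userspace. This is a hedged sketch using pthread mutexes; the `_model` names are hypothetical and stand in for the kernel's spinlock-based code, not the actual patch.]

```c
/*
 * Userspace model (pthreads, NOT kernel code) of the idea in the
 * quoted patch: i_count becomes a plain int protected by the
 * per-inode i_lock, alongside i_state which i_lock already covers.
 */
#include <pthread.h>

struct inode_model {
	pthread_mutex_t i_lock;	/* stands in for spinlock_t i_lock */
	int i_count;		/* was atomic_t; now guarded by i_lock */
	int i_state;		/* already guarded by i_lock */
};

/* Take a reference, analogous to igrab(). */
static void igrab_model(struct inode_model *inode)
{
	pthread_mutex_lock(&inode->i_lock);
	inode->i_count++;
	pthread_mutex_unlock(&inode->i_lock);
}

/* Drop a reference; returns 1 when the last reference went away. */
static int iput_model(struct inode_model *inode)
{
	int last;

	pthread_mutex_lock(&inode->i_lock);
	last = (--inode->i_count == 0);
	pthread_mutex_unlock(&inode->i_lock);
	return last;
}
```

The point of the trade: an atomic_t gives a cheap lock-free increment, but a count under i_lock lets a refcount change and an i_state change be made atomic with respect to each other in one critical section.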
From: Nick Piggin on 2 Jul 2010 23:50

On Fri, Jul 02, 2010 at 07:03:55PM -0700, Andrew Morton wrote:
> On Wed, 30 Jun 2010 22:05:02 +1000 Nick Piggin <npiggin(a)suse.de> wrote:
> > On Wed, Jun 30, 2010 at 05:27:02PM +1000, Dave Chinner wrote:
> > > On Thu, Jun 24, 2010 at 01:02:41PM +1000, npiggin(a)suse.de wrote:
> > > > Protect inode->i_count with i_lock, rather than having it atomic.
> > > > Next step should also be to move things together (eg. the refcount
> > > > increment into d_instantiate, which will remove a lock/unlock cycle
> > > > on i_lock).
> > > .....
> > > > Index: linux-2.6/fs/inode.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/fs/inode.c
> > > > +++ linux-2.6/fs/inode.c
> > > > @@ -33,14 +33,13 @@
> > > >  * inode_hash_lock protects:
> > > >  * inode hash table, i_hash
> > > >  * inode->i_lock protects:
> > > > - * i_state
> > > > + * i_state, i_count
> > > >  *
> > > >  * Ordering:
> > > >  * inode_lock
> > > >  * sb_inode_list_lock
> > > >  * inode->i_lock
> > > > - * inode_lock
> > > > - * inode_hash_lock
> > > > + * inode_hash_lock
> > > >  */
> > >
> > > I thought that the rule governing the use of inode->i_lock was that
> > > it can be used anywhere as long as it is the innermost lock.
> > >
> > > Hmmm, no references in the code or documentation. Google gives a
> > > pretty good reference:
> > >
> > > http://www.mail-archive.com/linux-ext4(a)vger.kernel.org/msg02584.html
> > >
> > > Perhaps a different/new lock needs to be used here?
> >
> > Well I just changed the order (and documented it to boot :)). It's
> > pretty easy to verify that LOR is no problem. inode hash is only
> > taken in a very few places so other code outside inode.c is fine to
> > use i_lock as an innermost lock.
>
> um, nesting a kernel-wide singleton lock inside a per-inode lock is
> plain nutty.

I think it worked out OK. Because the kernel-wide locks are really
restricted in where they are to be used (ie. not in filesystems). So
they're really pretty constrained to the inode management subsystem.
So filesystems still get to really use i_lock as an innermost lock
for their purposes.

And filesystems get to take i_lock and stop all manipulation of inode
now. No changing of flags, moving it on/off lists etc behind its back.
It really is about locking the data rather than the code.

The final ordering outcome looks like this:

* inode->i_lock
* inode_list_lglock
* zone->inode_lru_lock
* wb->b_lock
* inode_hash_bucket lock

And it works like that because when you want to add or remove an inode
from various data structures, you take the i_lock and then take each
of these locks in turn, inside it. The alternative is to build a bigger
lock ordering graph, and take all the locks up-front before taking
i_lock. I did actually try that and it ended up being worse, so I went
this route.

I think taking a global lock in mark_inode_dirty is nutty (especially
when that global lock is shared with hash management, LRU scanning,
writeback, i_flags access... :) It's just a question of which is less
nutty.
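[Editorial note: the ordering Nick lists can be modelled like this. A hedged userspace sketch with pthread mutexes; the helper name is hypothetical and a single global mutex stands in for one hash-bucket lock.]

```c
/*
 * Userspace model (pthreads, NOT kernel code) of the documented
 * ordering: the per-inode i_lock is taken first, and the global
 * structure locks (here, one hash-bucket lock) nest inside it.
 */
#include <pthread.h>

struct inode_model {
	pthread_mutex_t i_lock;	/* per-inode lock, taken outermost */
	int hashed;		/* models membership in the inode hash */
};

/* Stands in for an inode_hash_bucket lock: global, taken innermost. */
static pthread_mutex_t hash_bucket_lock = PTHREAD_MUTEX_INITIALIZER;

static void insert_inode_hash_model(struct inode_model *inode)
{
	pthread_mutex_lock(&inode->i_lock);	/* lock the inode first... */
	pthread_mutex_lock(&hash_bucket_lock);	/* ...then the container */
	inode->hashed = 1;
	pthread_mutex_unlock(&hash_bucket_lock);
	pthread_mutex_unlock(&inode->i_lock);
}
```

The property this buys: while i_lock is held, nothing can change the inode's membership behind its back, which is the "locking the data rather than the code" point Nick makes above.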
From: Andrew Morton on 3 Jul 2010 00:40

On Sat, 3 Jul 2010 13:41:23 +1000 Nick Piggin <npiggin(a)suse.de> wrote:
> On Fri, Jul 02, 2010 at 07:03:55PM -0700, Andrew Morton wrote:
> > On Wed, 30 Jun 2010 22:05:02 +1000 Nick Piggin <npiggin(a)suse.de> wrote:
> > > On Wed, Jun 30, 2010 at 05:27:02PM +1000, Dave Chinner wrote:
> > > > On Thu, Jun 24, 2010 at 01:02:41PM +1000, npiggin(a)suse.de wrote:
> > > > > Protect inode->i_count with i_lock, rather than having it atomic.
> > > > > Next step should also be to move things together (eg. the refcount
> > > > > increment into d_instantiate, which will remove a lock/unlock
> > > > > cycle on i_lock).
> > > > .....
> > > > > Index: linux-2.6/fs/inode.c
> > > > > ===================================================================
> > > > > --- linux-2.6.orig/fs/inode.c
> > > > > +++ linux-2.6/fs/inode.c
> > > > > @@ -33,14 +33,13 @@
> > > > >  * inode_hash_lock protects:
> > > > >  * inode hash table, i_hash
> > > > >  * inode->i_lock protects:
> > > > > - * i_state
> > > > > + * i_state, i_count
> > > > >  *
> > > > >  * Ordering:
> > > > >  * inode_lock
> > > > >  * sb_inode_list_lock
> > > > >  * inode->i_lock
> > > > > - * inode_lock
> > > > > - * inode_hash_lock
> > > > > + * inode_hash_lock
> > > > >  */
> > > >
> > > > I thought that the rule governing the use of inode->i_lock was that
> > > > it can be used anywhere as long as it is the innermost lock.
> > > >
> > > > Hmmm, no references in the code or documentation. Google gives a
> > > > pretty good reference:
> > > >
> > > > http://www.mail-archive.com/linux-ext4(a)vger.kernel.org/msg02584.html
> > > >
> > > > Perhaps a different/new lock needs to be used here?
> > >
> > > Well I just changed the order (and documented it to boot :)). It's
> > > pretty easy to verify that LOR is no problem. inode hash is only
> > > taken in a very few places so other code outside inode.c is fine to
> > > use i_lock as an innermost lock.
> >
> > um, nesting a kernel-wide singleton lock inside a per-inode lock is
> > plain nutty.
>
> I think it worked out OK. Because the kernel-wide locks are really
> restricted in where they are to be used (ie. not in filesystems). So
> they're really pretty constrained to the inode management subsystem.
> So filesystems still get to really use i_lock as an innermost lock
> for their purposes.
>
> And filesystems get to take i_lock and stop all manipulation of inode
> now. No changing of flags, moving it on/off lists etc behind its back.
> It really is about locking the data rather than the code.
>
> The final ordering outcome looks like this:
>
> * inode->i_lock
> * inode_list_lglock
> * zone->inode_lru_lock
> * wb->b_lock
> * inode_hash_bucket lock

Apart from the conceptual vandalism, it means that any contention times
and cache transfer times on those singleton locks will increase the
worst-case hold times of our nice, fine-grained i_lock.

> And it works like that because when you want to add or remove an inode
> from various data structures, you take the i_lock

Well that would be wrong. i_lock protects things *within* its inode.
It's nonsensical to take i_lock when the inode is being added to or
removed from external containers, because i_lock doesn't protect those
containers!

> and then take each
> of these locks in turn, inside it. The alternative is to build a bigger
> lock ordering graph, and take all the locks up-front before taking
> i_lock. I did actually try that and it ended up being worse, so I went
> this route.
>
> I think taking a global lock in mark_inode_dirty is nutty (especially
> when that global lock is shared with hash management, LRU scanning,
> writeback, i_flags access... :) It's just a question of which is less
> nutty.

Yes, inode_lock is bad news.
From: Nick Piggin on 3 Jul 2010 01:10
On Fri, Jul 02, 2010 at 09:31:49PM -0700, Andrew Morton wrote:
> On Sat, 3 Jul 2010 13:41:23 +1000 Nick Piggin <npiggin(a)suse.de> wrote:
> > On Fri, Jul 02, 2010 at 07:03:55PM -0700, Andrew Morton wrote:
> > > um, nesting a kernel-wide singleton lock inside a per-inode lock is
> > > plain nutty.
> >
> > I think it worked out OK. Because the kernel-wide locks are really
> > restricted in where they are to be used (ie. not in filesystems). So
> > they're really pretty constrained to the inode management subsystem.
> > So filesystems still get to really use i_lock as an innermost lock
> > for their purposes.
> >
> > And filesystems get to take i_lock and stop all manipulation of inode
> > now. No changing of flags, moving it on/off lists etc behind its back.
> > It really is about locking the data rather than the code.
> >
> > The final ordering outcome looks like this:
> >
> > * inode->i_lock
> > * inode_list_lglock
> > * zone->inode_lru_lock
> > * wb->b_lock
> > * inode_hash_bucket lock
>
> Apart from the conceptual vandalism, it means that any contention times
> and cache transfer times on those singleton locks will increase
> worst-case hold times of our nice, fine-grained i_lock.

Yes, you are quite right about contention times. I'll answer in two
parts: first, why I don't think it should prove to be a big problem;
second, what we can do to improve it if it does.

A lot of things that previously required the much worse inode_lock can
now use i_lock (or the other fine-grained locks above). Also, the
contention only comes into play if we actually happen to hit the same
i_lock at the same time, so the throughput-oriented mainline should
generally be OK, and -rt has priority inheritance, which improves that
situation.

IF this proves to be a problem -- I doubt it will; in fact I think the
worst-case contention experienced by filesystems and the vfs will go
down significantly -- but if it is a problem, we can easily fix it up,
because all of those data structures above can be traversed using RCU
(the hash list already is). That would make all the locks really only
taken to put an inode on or off a list.

The other thing that can be trivially done is to introduce different
locks. I don't have a problem with that, but like any other data
structure, I would just like to wait and see where we have problems.
The same argument applies to a lot of places where we use a single lock
to protect different properties of an object. We use the page lock to
protect page membership on or off the LRU and pagecache lists, for
example, as well as various state transitions (eg. to writeback).

In summary, if there is a lock hold problem, it is easy to use RCU to
reduce lock widths, or to introduce a new per-inode lock to protect
different parts of the inode structure.

> > And it works like that because when you want to add or remove an inode
> > from various data structures, you take the i_lock
>
> Well that would be wrong. i_lock protects things *within* its inode.
> It's nonsensical to take i_lock when the inode is being added to or
> removed from external containers because i_lock doesn't protect those
> containers!

Membership in a data structure is a property of the item, conceptually
and literally when you're messing with list entries and such. There is
no conceptual vandalism that I can tell. And it just works better this
way when lifting inode_lock (and dcache_lock). Currently, we do things
like:

spin_lock(&inode_lock);
add_to_list1
add_to_list2
inode->blah = something
spin_unlock(&inode_lock);

If you lock list1, list2, and blah separately, it is no longer an
equivalent transformation: other code could find an inode that is on
list1 but not yet on list2, with blah not set. The real data object of
interest in all cases is the inode object. Other structures are just in
aid of finding inode objects that have a particular property.

So it makes a lot of sense to have a lock to rule the inode (as opposed
to now, where we have a lock to rule *all* inodes). It is possible that
locking can be reduced if some things are verified and carefully shown
not to matter; I just don't see the need yet, and it would make things
overly complicated, I think. Introducing any more complexity will sink
this patchset.

> > and then take each
> > of these locks in turn, inside it. The alternative is to build a bigger
> > lock ordering graph, and take all the locks up-front before taking
> > i_lock. I did actually try that and it ended up being worse, so I went
> > this route.
> >
> > I think taking a global lock in mark_inode_dirty is nutty (especially
> > when that global lock is shared with hash management, LRU scanning,
> > writeback, i_flags access... :) It's just a question of which is less
> > nutty.
>
> Yes, inode_lock is bad news.

Hopefully not for long :)
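[Editorial note: Nick's equivalence argument can be sketched concretely. A hedged userspace model with pthread mutexes; the names (`add_inode_model`, `on_list1`, etc.) are hypothetical stand-ins for his `add_to_list1`/`add_to_list2`/`inode->blah` example. The old global inode_lock section becomes an i_lock section with the per-list locks nested inside, so all three updates still appear atomic to any observer that takes i_lock.]

```c
/*
 * Userspace model (pthreads, NOT kernel code) of the transformation
 * Nick describes.  Because both list-membership flags and the field
 * update happen inside one i_lock section, an observer holding
 * i_lock can never see this inode on list1 but not on list2, which
 * is exactly what locking list1, list2, and blah separately (with
 * no outer lock) would fail to guarantee.
 */
#include <pthread.h>

/* Global per-list locks, nested inside i_lock per the ordering. */
static pthread_mutex_t list1_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t list2_lock = PTHREAD_MUTEX_INITIALIZER;

struct inode_model {
	pthread_mutex_t i_lock;
	int on_list1, on_list2, blah;
};

static void add_inode_model(struct inode_model *inode)
{
	pthread_mutex_lock(&inode->i_lock);	/* was: spin_lock(&inode_lock) */

	pthread_mutex_lock(&list1_lock);	/* was: add_to_list1 */
	inode->on_list1 = 1;
	pthread_mutex_unlock(&list1_lock);

	pthread_mutex_lock(&list2_lock);	/* was: add_to_list2 */
	inode->on_list2 = 1;
	pthread_mutex_unlock(&list2_lock);

	inode->blah = 1;			/* was: inode->blah = something */

	pthread_mutex_unlock(&inode->i_lock);	/* was: spin_unlock(&inode_lock) */
}
```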