From: Nick Piggin on 26 May 2010 12:20

On Tue, May 25, 2010 at 06:53:04PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner(a)redhat.com>
>
> The inode unused list is currently a global LRU. This does not match
> the other global filesystem cache - the dentry cache - which uses
> per-superblock LRU lists. Hence we have related filesystem object
> types using different LRU reclamation schemes.

Is this an improvement, I wonder? The dcache is using per-sb lists
because it specifically requires sb traversal.

What allocation/reclaim really wants (for good scalability and NUMA
characteristics) is per-zone lists for these things. It's easy to
convert a single list into per-zone lists. It is much harder to
convert per-sb lists into per-sb x per-zone lists.
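[Editor's sketch: the per-zone arrangement Nick describes - splitting the
single global unused-inode list into one LRU per memory zone - could look
roughly like the C below. All names here (inode_lru_zone, inode_lru_add,
the zone_idx parameter) are assumptions for illustration, not the kernel
interface of the time; i_list was the inode LRU linkage in that era.]

/*
 * Illustrative only: one unused-inode LRU per memory zone instead of a
 * single global list, so reclaim can target the zone under pressure.
 */
#include <linux/fs.h>
#include <linux/list.h>
#include <linux/mmzone.h>
#include <linux/spinlock.h>

struct inode_lru_zone {
	spinlock_t		lock;		/* protects the list below */
	struct list_head	unused;		/* unused inodes in this zone */
	unsigned long		nr_items;	/* for the shrinker's count pass */
};

static struct inode_lru_zone inode_lru[MAX_NR_ZONES];

/* Add an inode to the LRU of the zone its backing memory came from. */
static void inode_lru_add(struct inode *inode, int zone_idx)
{
	struct inode_lru_zone *lru = &inode_lru[zone_idx];

	spin_lock(&lru->lock);
	list_add(&inode->i_list, &lru->unused);
	lru->nr_items++;
	spin_unlock(&lru->lock);
}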
From: Dave Chinner on 26 May 2010 19:10

On Thu, May 27, 2010 at 02:17:33AM +1000, Nick Piggin wrote:
> On Tue, May 25, 2010 at 06:53:04PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner(a)redhat.com>
> >
> > The inode unused list is currently a global LRU. This does not match
> > the other global filesystem cache - the dentry cache - which uses
> > per-superblock LRU lists. Hence we have related filesystem object
> > types using different LRU reclamation schemes.
>
> Is this an improvement, I wonder? The dcache is using per-sb lists
> because it specifically requires sb traversal.

Right - I originally implemented the per-sb dentry lists for
scalability purposes, i.e. to avoid monopolising the dentry lock
during unmount looking for dentries on a specific sb and hanging the
system for several minutes.

However, the reason for doing this to the inode cache is not for
scalability, it's because we have a tight relationship between the
dentry and inode caches. That is, reclaim from the dentry LRU grows
the inode LRU. Like the registration of the shrinkers, this is kind
of an implicit, undocumented behaviour of the current shrinker
implementation.

What this patch series does is take that implicit relationship and
make it explicit. It also allows other filesystem caches to tie
into the relationship if they need to (e.g. the XFS inode cache).
What it _doesn't do_ is change the macro-level behaviour of the
shrinkers...

> What allocation/reclaim really wants (for good scalability and NUMA
> characteristics) is per-zone lists for these things. It's easy to
> convert a single list into per-zone lists.
>
> It is much harder to convert per-sb lists into per-sb x per-zone lists.

No it's not. Just convert the s_{dentry,inode}_lru lists on each
superblock and call the shrinker with a new zone mask field to pick
the correct LRU. That's no harder than converting a global LRU.
Anyway, you'd still have to do per-sb x per-zone lists for the dentry
LRUs, so changing the inode cache to per-sb makes no difference.

However, this is a moot point because we don't have per-zone shrinker
interfaces. That's an entirely separate discussion because of the
macro-level behavioural changes it implies...

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
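[Editor's sketch: Dave's argument amounts to hanging the LRU lists off each
superblock and letting one per-sb shrinker prune dentries first and inodes
second, with a zone argument as the hook for per-zone reclaim later. The
names below (shrink_sb_caches, prune_sb_dentries, prune_sb_inodes, zone_idx)
are placeholders for illustration, not the interface that was proposed or
merged.]

#include <linux/fs.h>
#include <linux/list.h>

/* Placeholders standing in for the real per-sb pruning logic. */
static int prune_sb_dentries(struct super_block *sb, int nr, int zone_idx)
{
	return 0;
}

static int prune_sb_inodes(struct super_block *sb, int nr, int zone_idx)
{
	return 0;
}

/*
 * Pruning dentries drops the last references to their inodes, which moves
 * those inodes onto the (per-sb) inode LRU; pruning that LRU immediately
 * afterwards makes the previously implicit dentry -> inode reclaim ordering
 * explicit.  A zone_idx (or zone mask) argument is one way per-zone reclaim
 * could be layered on top of per-sb lists.
 */
static int shrink_sb_caches(struct super_block *sb, int nr_to_scan, int zone_idx)
{
	int freed = 0;

	freed += prune_sb_dentries(sb, nr_to_scan, zone_idx);
	freed += prune_sb_inodes(sb, nr_to_scan, zone_idx);
	return freed;
}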
From: Nick Piggin on 26 May 2010 22:10

On Thu, May 27, 2010 at 09:01:29AM +1000, Dave Chinner wrote:
> On Thu, May 27, 2010 at 02:17:33AM +1000, Nick Piggin wrote:
> > On Tue, May 25, 2010 at 06:53:04PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner(a)redhat.com>
> > >
> > > The inode unused list is currently a global LRU. This does not match
> > > the other global filesystem cache - the dentry cache - which uses
> > > per-superblock LRU lists. Hence we have related filesystem object
> > > types using different LRU reclamation schemes.
> >
> > Is this an improvement, I wonder? The dcache is using per-sb lists
> > because it specifically requires sb traversal.
>
> Right - I originally implemented the per-sb dentry lists for
> scalability purposes, i.e. to avoid monopolising the dentry lock
> during unmount looking for dentries on a specific sb and hanging the
> system for several minutes.
>
> However, the reason for doing this to the inode cache is not for
> scalability, it's because we have a tight relationship between the
> dentry and inode caches. That is, reclaim from the dentry LRU grows
> the inode LRU. Like the registration of the shrinkers, this is kind
> of an implicit, undocumented behaviour of the current shrinker
> implementation.

Right, that's why I wonder whether it is an improvement. It would
be interesting to see some tests (showing at least parity).

> What this patch series does is take that implicit relationship and
> make it explicit. It also allows other filesystem caches to tie
> into the relationship if they need to (e.g. the XFS inode cache).
> What it _doesn't do_ is change the macro-level behaviour of the
> shrinkers...
>
> > What allocation/reclaim really wants (for good scalability and NUMA
> > characteristics) is per-zone lists for these things. It's easy to
> > convert a single list into per-zone lists.
> >
> > It is much harder to convert per-sb lists into per-sb x per-zone lists.
>
> No it's not. Just convert the s_{dentry,inode}_lru lists on each
> superblock and call the shrinker with a new zone mask field to pick
> the correct LRU. That's no harder than converting a global LRU.
> Anyway, you'd still have to do per-sb x per-zone lists for the dentry
> LRUs, so changing the inode cache to per-sb makes no difference.

Right, it just makes it harder to do. By much harder, I did mostly mean
the extra memory overhead. If there is *no* benefit from doing per-sb
icache then I would question whether we should.

> However, this is a moot point because we don't have per-zone shrinker
> interfaces. That's an entirely separate discussion because of the
> macro-level behavioural changes it implies...

Yep. I have some patches for it, but they're currently behind the
other fine-grained locking stuff. But it's something that really
needs to be implemented, IMO.
From: Dave Chinner on 27 May 2010 00:10

On Thu, May 27, 2010 at 12:04:45PM +1000, Nick Piggin wrote:
> On Thu, May 27, 2010 at 09:01:29AM +1000, Dave Chinner wrote:
> > On Thu, May 27, 2010 at 02:17:33AM +1000, Nick Piggin wrote:
> > > On Tue, May 25, 2010 at 06:53:04PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner(a)redhat.com>
> > > >
> > > > The inode unused list is currently a global LRU. This does not match
> > > > the other global filesystem cache - the dentry cache - which uses
> > > > per-superblock LRU lists. Hence we have related filesystem object
> > > > types using different LRU reclamation schemes.
> > >
> > > Is this an improvement, I wonder? The dcache is using per-sb lists
> > > because it specifically requires sb traversal.
> >
> > Right - I originally implemented the per-sb dentry lists for
> > scalability purposes, i.e. to avoid monopolising the dentry lock
> > during unmount looking for dentries on a specific sb and hanging the
> > system for several minutes.
> >
> > However, the reason for doing this to the inode cache is not for
> > scalability, it's because we have a tight relationship between the
> > dentry and inode caches. That is, reclaim from the dentry LRU grows
> > the inode LRU. Like the registration of the shrinkers, this is kind
> > of an implicit, undocumented behaviour of the current shrinker
> > implementation.
>
> Right, that's why I wonder whether it is an improvement. It would
> be interesting to see some tests (showing at least parity).

I've done some testing showing parity. They've been along the lines of:
- populate the cache with 1m dentries + inodes
- run 'time echo 2 > /proc/sys/vm/drop_caches'

I've used different methods of populating the caches to have them
non-sequential in the LRU (i.e. trigger fragmentation), have dirty
backing inodes (e.g. the VFS inode clean, the xfs inode dirty
because transactions haven't completed), etc.

The variation on the test is around +-10%, with the per-sb shrinkers
averaging about 5% lower time to reclaim. This is within the error
margin of the test, so it's not really a conclusive win, but it
certainly shows that it does not slow anything down. If you've got a
better way to test it, then I'm all ears...

> > What this patch series does is take that implicit relationship and
> > make it explicit. It also allows other filesystem caches to tie
> > into the relationship if they need to (e.g. the XFS inode cache).
> > What it _doesn't do_ is change the macro-level behaviour of the
> > shrinkers...
> >
> > > What allocation/reclaim really wants (for good scalability and NUMA
> > > characteristics) is per-zone lists for these things. It's easy to
> > > convert a single list into per-zone lists.
> > >
> > > It is much harder to convert per-sb lists into per-sb x per-zone lists.
> >
> > No it's not. Just convert the s_{dentry,inode}_lru lists on each
> > superblock and call the shrinker with a new zone mask field to pick
> > the correct LRU. That's no harder than converting a global LRU.
> > Anyway, you'd still have to do per-sb x per-zone lists for the dentry
> > LRUs, so changing the inode cache to per-sb makes no difference.
>
> Right, it just makes it harder to do. By much harder, I did mostly mean
> the extra memory overhead.

You've still got to allocate that extra memory on the per-sb dentry
LRUs, so it's not really a valid argument. IOWs, if it's too much
memory for per-sb inode LRUs, then it's too much memory for the
per-sb dentry LRUs as well...

> If there is *no* benefit from doing per-sb
> icache then I would question whether we should.

The same vague questions wondering about the benefit of per-sb
dentry LRUs were raised when I first proposed them years ago, and
look where we are now. Besides, focussing on whether this one patch
is a benefit or not is really missing the point, because it's the
benefits of this patchset as a whole that need to be considered...

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
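[Editor's sketch: the reclaim timing test Dave describes above is just a
timed write of "2" (reclaim dentries and inodes) to
/proc/sys/vm/drop_caches. A minimal userspace harness, assuming root and
caches that have already been populated, might look like this.]

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	struct timespec start, end;
	int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return EXIT_FAILURE;
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	if (write(fd, "2\n", 2) != 2) {	/* "2" == drop dentries and inodes */
		perror("write");
		close(fd);
		return EXIT_FAILURE;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	close(fd);

	printf("drop_caches took %.3f seconds\n",
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9);
	return EXIT_SUCCESS;
}

[Running this on the same populated cache before and after the patch series
is the kind of parity comparison being discussed.]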
From: Nick Piggin on 27 May 2010 00:30

On Thu, May 27, 2010 at 02:02:10PM +1000, Dave Chinner wrote:
> On Thu, May 27, 2010 at 12:04:45PM +1000, Nick Piggin wrote:
> > On Thu, May 27, 2010 at 09:01:29AM +1000, Dave Chinner wrote:
> > > On Thu, May 27, 2010 at 02:17:33AM +1000, Nick Piggin wrote:
> > > > On Tue, May 25, 2010 at 06:53:04PM +1000, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner(a)redhat.com>
> > > > >
> > > > > The inode unused list is currently a global LRU. This does not match
> > > > > the other global filesystem cache - the dentry cache - which uses
> > > > > per-superblock LRU lists. Hence we have related filesystem object
> > > > > types using different LRU reclamation schemes.
> > > >
> > > > Is this an improvement, I wonder? The dcache is using per-sb lists
> > > > because it specifically requires sb traversal.
> > >
> > > Right - I originally implemented the per-sb dentry lists for
> > > scalability purposes, i.e. to avoid monopolising the dentry lock
> > > during unmount looking for dentries on a specific sb and hanging the
> > > system for several minutes.
> > >
> > > However, the reason for doing this to the inode cache is not for
> > > scalability, it's because we have a tight relationship between the
> > > dentry and inode caches. That is, reclaim from the dentry LRU grows
> > > the inode LRU. Like the registration of the shrinkers, this is kind
> > > of an implicit, undocumented behaviour of the current shrinker
> > > implementation.
> >
> > Right, that's why I wonder whether it is an improvement. It would
> > be interesting to see some tests (showing at least parity).
>
> I've done some testing showing parity. They've been along the lines of:
> - populate the cache with 1m dentries + inodes
> - run 'time echo 2 > /proc/sys/vm/drop_caches'
>
> I've used different methods of populating the caches to have them
> non-sequential in the LRU (i.e. trigger fragmentation), have dirty
> backing inodes (e.g. the VFS inode clean, the xfs inode dirty
> because transactions haven't completed), etc.
>
> The variation on the test is around +-10%, with the per-sb shrinkers
> averaging about 5% lower time to reclaim. This is within the error
> margin of the test, so it's not really a conclusive win, but it
> certainly shows that it does not slow anything down. If you've got a
> better way to test it, then I'm all ears...

I guess the problem is that the inode LRU cache isn't very useful as
long as there are dentries in the way (which is most of the time,
isn't it?). I think nfsd will exercise them better? Don't know of any
other cases.

> > Right, it just makes it harder to do. By much harder, I did mostly mean
> > the extra memory overhead.
>
> You've still got to allocate that extra memory on the per-sb dentry
> LRUs, so it's not really a valid argument.

Well, it would be per-zone, per-sb lists, but I don't think that makes
it an invalid point.

> IOWs, if it's too much
> memory for per-sb inode LRUs, then it's too much memory for the
> per-sb dentry LRUs as well...

It's not about how much is too much, it's about more cost or memory
usage for what benefit? I guess it isn't a lot more memory, though.

> > If there is *no* benefit from doing per-sb
> > icache then I would question whether we should.
>
> The same vague questions wondering about the benefit of per-sb
> dentry LRUs were raised when I first proposed them years ago, and
> look where we are now.

To be fair, that is because there were specific needs to do per-sb
pruning. This isn't the case with the icache.

> Besides, focussing on whether this one patch
> is a benefit or not is really missing the point, because it's the
> benefits of this patchset as a whole that need to be considered...

I would indeed like to focus on the benefits of the patchset as a
whole. Leaving aside the XFS changes, it would be interesting to have
at least a few numbers for dcache/icache heavy workloads.
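[Editor's sketch: a dcache/icache-heavy workload of the kind being asked for
can be approximated by creating and stat()ing a large number of empty files
before running a reclaim or lookup benchmark. The scratch path and file
count below are arbitrary assumptions.]

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define NR_FILES	1000000		/* roughly 1m dentries + inodes */
#define SCRATCH_DIR	"/mnt/scratch/populate"

int main(void)
{
	char path[256];
	struct stat st;
	long i;

	if (mkdir(SCRATCH_DIR, 0755) && errno != EEXIST) {
		perror("mkdir");
		return EXIT_FAILURE;
	}

	for (i = 0; i < NR_FILES; i++) {
		int fd;

		snprintf(path, sizeof(path), SCRATCH_DIR "/f%ld", i);
		fd = open(path, O_CREAT | O_WRONLY, 0644);
		if (fd < 0) {
			perror("open");
			return EXIT_FAILURE;
		}
		close(fd);
		stat(path, &st);	/* pulls the dentry and inode into cache */
	}
	return EXIT_SUCCESS;
}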