From: Dave Chinner
Per-superblock shrinkers are not baked well enough for 2.6.36. However, we
still need fixes for the XFS shrinker lockdep problems caused by the global
mount list lock, and for other problems, before 2.6.35 is released. The
lockdep issues look like:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.35-rc5-dgc+ #34
-------------------------------------------------------
kswapd0/471 is trying to acquire lock:
(&(&ip->i_lock)->mr_lock){++++-.}, at: [<ffffffff81316feb>] xfs_ilock+0x10b/0x190

but task is already holding lock:
(&xfs_mount_list_lock){++++.-}, at: [<ffffffff81350fd6>] xfs_reclaim_inode_shrink+0xd6/0x150

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&xfs_mount_list_lock){++++.-}:
[<ffffffff810b4ad6>] lock_acquire+0xa6/0x160
[<ffffffff817a4dd5>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff8106db62>] __wake_up+0x32/0x70
[<ffffffff811196db>] wakeup_kswapd+0xab/0xb0
[<ffffffff811132cd>] __alloc_pages_nodemask+0x27d/0x760
[<ffffffff81145c72>] kmem_getpages+0x62/0x160
[<ffffffff81146cdf>] fallback_alloc+0x18f/0x260
[<ffffffff81146a6b>] ____cache_alloc_node+0x9b/0x180
[<ffffffff811473bb>] kmem_cache_alloc+0x16b/0x1e0
[<ffffffff81340d54>] kmem_zone_alloc+0x94/0xe0
[<ffffffff813173a9>] xfs_inode_alloc+0x29/0x1b0
[<ffffffff8131781c>] xfs_iget+0x2ec/0x7a0
[<ffffffff8133a697>] xfs_trans_iget+0x27/0x60
[<ffffffff8131a60a>] xfs_ialloc+0xca/0x790
[<ffffffff8133b37f>] xfs_dir_ialloc+0xaf/0x340
[<ffffffff8133c38c>] xfs_create+0x3dc/0x710
[<ffffffff8134d277>] xfs_vn_mknod+0xa7/0x1c0
[<ffffffff8134d3c0>] xfs_vn_create+0x10/0x20
[<ffffffff8115ab2c>] vfs_create+0xac/0xd0
[<ffffffff8115b6bc>] do_last+0x51c/0x620
[<ffffffff8115dbd4>] do_filp_open+0x224/0x640
[<ffffffff8114d969>] do_sys_open+0x69/0x140
[<ffffffff8114da80>] sys_open+0x20/0x30
[<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b

-> #0 (&(&ip->i_lock)->mr_lock){++++-.}:
[<ffffffff810b47a3>] __lock_acquire+0x11c3/0x1450
[<ffffffff810b4ad6>] lock_acquire+0xa6/0x160
[<ffffffff810a2035>] down_write_nested+0x65/0xb0
[<ffffffff81316feb>] xfs_ilock+0x10b/0x190
[<ffffffff8135023d>] xfs_reclaim_inode+0x9d/0x250
[<ffffffff81350d4b>] xfs_inode_ag_walk+0x8b/0x150
[<ffffffff81350e9b>] xfs_inode_ag_iterator+0x8b/0xf0
[<ffffffff8135100c>] xfs_reclaim_inode_shrink+0x10c/0x150
[<ffffffff81119be5>] shrink_slab+0x135/0x1a0
[<ffffffff8111bac1>] balance_pgdat+0x421/0x6a0
[<ffffffff8111be5d>] kswapd+0x11d/0x320
[<ffffffff8109cdb6>] kthread+0x96/0xa0
[<ffffffff81035de4>] kernel_thread_helper+0x4/0x10

other info that might help us debug this:

2 locks held by kswapd0/471:
#0: (shrinker_rwsem){++++..}, at: [<ffffffff81119aed>] shrink_slab+0x3d/0x1a0
#1: (&xfs_mount_list_lock){++++.-}, at: [<ffffffff81350fd6>] xfs_reclaim_inode_shrink+0xd6/0x150

There are also a few variations of this report, as these paths are reached
from different locations under different workloads.

There are also scanning overhead problems caused by the global shrinker as seen
in https://bugzilla.kernel.org/show_bug.cgi?id=16348. This is not helped by
every shrinker call potentially traversing multiple filesystems to find one
with reclaimable inodes.
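
As a rough userspace model of that overhead (not the XFS code; every demo_*
name below is made up for illustration), a single global callback has to walk
the mount list on each call, visiting clean filesystems before it reaches one
with work to do. In the real code the walk happens with xfs_mount_list_lock
held, which is where the lockdep report above comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a mounted filesystem; not struct xfs_mount. */
struct demo_mount {
    struct demo_mount *next;   /* link on the global mount list */
    int reclaimable;           /* inodes waiting to be reclaimed */
};

/* What one shrinker invocation cost and achieved. */
struct demo_walk_stats {
    int mounts_visited;
    int inodes_reclaimed;
};

/*
 * Global-shrinker model: one callback serves every mount, so each call
 * walks the list from the head, even when most filesystems are clean.
 */
struct demo_walk_stats demo_global_shrink(struct demo_mount *head,
                                          int nr_to_scan)
{
    struct demo_walk_stats st = {0, 0};

    assert(nr_to_scan >= 0);
    for (struct demo_mount *mp = head; mp != NULL && nr_to_scan > 0;
         mp = mp->next) {
        int n = mp->reclaimable < nr_to_scan ? mp->reclaimable : nr_to_scan;

        st.mounts_visited++;        /* paid even if n == 0 */
        mp->reclaimable -= n;
        nr_to_scan -= n;
        st.inodes_reclaimed += n;
    }
    return st;
}
```

With two clean mounts ahead of a dirty one, every call visits all three
before any inode is reclaimed.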

The context-based shrinker solution is very simple and doesn't have any effect
outside XFS. For XFS, it allows us to avoid the locking needed by a global
list, as well as remove the repeated scanning of clean filesystems on every
shrinker call. In combination with the tagging of the per-AG index to track
AGs with reclaimable inodes, all the unnecessary AG scanning is removed and
the overhead is minimised. Hence kswapd CPU usage and reclaim progress are no
longer a problem.
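
A minimal userspace sketch of the idea, assuming the callback is handed a
pointer to its own shrinker structure so it can find the filesystem that
embeds it. All demo_* names are hypothetical, and a plain bitmask stands in
for the tagged per-AG index; this is an illustration of the technique, not
the code in the patch set.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical shrinker: the callback receives its own context pointer. */
struct demo_shrinker {
    int (*shrink)(struct demo_shrinker *self, int nr_to_scan);
};

#define DEMO_AG_COUNT 4

/* Hypothetical per-filesystem state with an embedded shrinker. */
struct demo_fs {
    int ag_reclaimable[DEMO_AG_COUNT];
    unsigned ag_reclaim_tag;   /* bit i set => AG i has reclaimable inodes */
    int ags_scanned;           /* instrumentation for the example only */
    struct demo_shrinker shrinker;
};

/* Record reclaimable inodes in an AG and tag that AG. */
void demo_mark_reclaimable(struct demo_fs *fs, int ag, int count)
{
    assert(ag >= 0 && ag < DEMO_AG_COUNT);
    fs->ag_reclaimable[ag] += count;
    if (fs->ag_reclaimable[ag] > 0)
        fs->ag_reclaim_tag |= 1u << ag;
}

/* Per-filesystem callback: no global list, no global lock. */
int demo_fs_shrink(struct demo_shrinker *self, int nr_to_scan)
{
    struct demo_fs *fs = demo_container_of(self, struct demo_fs, shrinker);
    int remaining = 0;

    for (int ag = 0; ag < DEMO_AG_COUNT; ag++) {
        int n;

        if (!(fs->ag_reclaim_tag & (1u << ag)))
            continue;               /* untagged AG: never scanned */
        fs->ags_scanned++;
        n = fs->ag_reclaimable[ag] < nr_to_scan
                ? fs->ag_reclaimable[ag] : nr_to_scan;
        fs->ag_reclaimable[ag] -= n;
        nr_to_scan -= n;
        if (fs->ag_reclaimable[ag] == 0)
            fs->ag_reclaim_tag &= ~(1u << ag);  /* AG is clean again */
        remaining += fs->ag_reclaimable[ag];
    }
    return remaining;               /* inodes still reclaimable */
}

/* Zero the state and hook up the embedded shrinker. */
void demo_fs_init(struct demo_fs *fs)
{
    *fs = (struct demo_fs){0};
    fs->shrinker.shrink = demo_fs_shrink;
}
```

Because the callback starts from its own filesystem, clean filesystems are
never visited, and within a filesystem only the tagged AGs are scanned.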

The patch set is also available at:

git://git.kernel.org/pub/scm/git/linux/dgc/xfsdev shrinker
