From: Wu Fengguang on
Andrew,

The basic way of avoiding pageout() is to make the flusher sync inodes in the
right order. Oldest dirty inodes contains oldest pages. The smaller inode it
is, the more correlation between inode dirty time and its pages' dirty time.
So for small dirty inodes, syncing in the order of inode dirty time is able to
avoid pageout(). If pageout() is still triggered frequently in this case, the
30s dirty expire time may be too long and could be shrinked adaptively; or it
may be a stressed memcg list whose dirty inodes/pages are more hard to track.

For a large dirty inode, it may flush lots of newly dirtied pages _after_
syncing the expired pages. This is the normal case for a single-stream
sequential dirtier, where older pages are in lower offsets. In this case we
shall not insist on syncing the whole large dirty inode before considering the
other small dirty inodes. This risks wasting time syncing 1GB freshly dirtied
pages before syncing the other N*1MB expired dirty pages who are approaching
the end of the LRU list and hence pageout().

For a large dirty inode, it may also flush lots of newly dirtied pages _before_
hitting the desired old ones, in which case it helps for pageout() to do some
clustered writeback, and/or set mapping->writeback_index to help the flusher
focus on old pages.

For a large dirty inode, it may also have intermixed old and new dirty pages.
In this case we need to make sure the inode is queued for IO before some of
its pages hit pageout(). Adaptive dirty expire time helps here.

OK, end of the vapour ideas. As for this patchset, it fixes the current
kupdate/background writeback priority:

- the kupdate/background writeback shall include newly expired inodes at each
queue_io() time, as the large inodes left over from previous writeback rounds
are likely to have less density of old pages.

- the background writeback shall consider expired inodes first, just like the
kupdate writeback

Thanks,
Fengguang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/