From: Wu Fengguang on 22 Jul 2010 05:00

> Some insight on how the other writeback changes that are being floated
> around might affect the number of dirty pages reclaim encounters would also
> be helpful.

Here is an interesting related problem about the wait_on_page_writeback()
call inside shrink_page_list():

	http://lkml.org/lkml/2010/4/4/86

The problem is that wait_on_page_writeback() is called too early in the
direct reclaim path, which blocks many random/unrelated processes when
some slow (USB stick) writeback is on the way.

A simple dd can easily create a big range of dirty pages in the LRU
list, so priority can easily go below (DEF_PRIORITY - 2) on a typical
desktop, which triggers the lumpy reclaim mode and hence
wait_on_page_writeback().

I proposed this patch at the time, which was confirmed to solve the problem:

--- linux-next.orig/mm/vmscan.c	2010-06-24 14:32:03.000000000 +0800
+++ linux-next/mm/vmscan.c	2010-07-22 16:12:34.000000000 +0800
@@ -1650,7 +1650,7 @@ static void set_lumpy_reclaim_mode(int p
 	 */
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
 		sc->lumpy_reclaim_mode = 1;
-	else if (sc->order && priority < DEF_PRIORITY - 2)
+	else if (sc->order && priority < DEF_PRIORITY / 2)
 		sc->lumpy_reclaim_mode = 1;
 	else
 		sc->lumpy_reclaim_mode = 0;

However, KOSAKI and Minchan raised concerns about raising the bar.
I guess this new patch is more problem oriented and acceptable:

--- linux-next.orig/mm/vmscan.c	2010-07-22 16:36:58.000000000 +0800
+++ linux-next/mm/vmscan.c	2010-07-22 16:39:57.000000000 +0800
@@ -1217,7 +1217,8 @@ static unsigned long shrink_inactive_lis
 		count_vm_events(PGDEACTIVATE, nr_active);

 		nr_freed += shrink_page_list(&page_list, sc,
-						PAGEOUT_IO_SYNC);
+					priority < DEF_PRIORITY / 3 ?
+					PAGEOUT_IO_SYNC : PAGEOUT_IO_ASYNC);
 	}

 	nr_reclaimed += nr_freed;

Thanks,
Fengguang
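For reference, a minimal userspace sketch (not kernel code; it assumes
DEF_PRIORITY = 12, as in kernels of that era) of the arithmetic behind these
thresholds: at priority p one reclaim pass scans roughly lru_size >> p pages,
so each threshold corresponds to how large the per-pass scan window has
become before sync lumpy reclaim is allowed for order > 0 allocations.

/*
 * Userspace sketch only (not kernel code): map reclaim priority to the
 * fraction of the LRU scanned per pass, assuming DEF_PRIORITY = 12 as
 * in 2.6.3x kernels, and show which priorities enable lumpy reclaim
 * under the old (DEF_PRIORITY - 2) and proposed (DEF_PRIORITY / 2)
 * thresholds.
 */
#include <stdio.h>

#define DEF_PRIORITY	12

int main(void)
{
	for (int priority = DEF_PRIORITY; priority >= 0; priority--) {
		double scanned = 100.0 / (1UL << priority);	/* % of LRU per pass */
		int lumpy_old = priority < DEF_PRIORITY - 2;	/* priority <= 9 */
		int lumpy_new = priority < DEF_PRIORITY / 2;	/* priority <= 5 */

		printf("priority %2d: scans %8.4f%% of LRU  lumpy(old)=%d  lumpy(new)=%d\n",
		       priority, scanned, lumpy_old, lumpy_new);
	}
	return 0;
}

With the old threshold, sync write&wait can be entered while a single pass
still covers only ~0.2% of the LRU; the proposed one defers it until a pass
covering ~3% has failed to make enough progress.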
From: Wu Fengguang on 22 Jul 2010 05:10

Sorry, please ignore this hack, it's nonsense..

>
> --- linux-next.orig/mm/vmscan.c	2010-07-22 16:36:58.000000000 +0800
> +++ linux-next/mm/vmscan.c	2010-07-22 16:39:57.000000000 +0800
> @@ -1217,7 +1217,8 @@ static unsigned long shrink_inactive_lis
>  		count_vm_events(PGDEACTIVATE, nr_active);
>
>  		nr_freed += shrink_page_list(&page_list, sc,
> -						PAGEOUT_IO_SYNC);
> +					priority < DEF_PRIORITY / 3 ?
> +					PAGEOUT_IO_SYNC : PAGEOUT_IO_ASYNC);
>  	}
>
>  	nr_reclaimed += nr_freed;

Thanks,
Fengguang
From: Wu Fengguang on 22 Jul 2010 05:30

> I guess this new patch is more problem oriented and acceptable:
>
> --- linux-next.orig/mm/vmscan.c	2010-07-22 16:36:58.000000000 +0800
> +++ linux-next/mm/vmscan.c	2010-07-22 16:39:57.000000000 +0800
> @@ -1217,7 +1217,8 @@ static unsigned long shrink_inactive_lis
>  		count_vm_events(PGDEACTIVATE, nr_active);
>
>  		nr_freed += shrink_page_list(&page_list, sc,
> -						PAGEOUT_IO_SYNC);
> +					priority < DEF_PRIORITY / 3 ?
> +					PAGEOUT_IO_SYNC : PAGEOUT_IO_ASYNC);
>  	}
>
>  	nr_reclaimed += nr_freed;

This one looks better:
---
vmscan: raise the bar to PAGEOUT_IO_SYNC stalls

Fix the "system goes totally unresponsive with many dirty/writeback pages"
problem:

	http://lkml.org/lkml/2010/4/4/86

The root cause is that wait_on_page_writeback() is called too early in
the direct reclaim path, which blocks many random/unrelated processes
when some slow (USB stick) writeback is on the way.

A simple dd can easily create a big range of dirty pages in the LRU
list, so priority can easily go below (DEF_PRIORITY - 2) on a typical
desktop, which triggers the lumpy reclaim mode and hence
wait_on_page_writeback().

In Andreas' case, 512MB/1024 = 512KB, which is way too low compared to
the 22MB of writeback and 190MB of dirty pages. There can easily be a
continuous range of 512KB dirty/writeback pages in the LRU, which will
trigger the wait logic.

To make it worse, when there are 50MB of writeback pages and USB 1.1 is
writing them out at 1MB/s, wait_on_page_writeback() may be stuck for up
to 50 seconds.

So only enter sync write&wait when priority goes below DEF_PRIORITY/3,
i.e. 6.25% of the LRU. As the default dirty throttle ratio is 20%, sync
write&wait will hardly be triggered by pure dirty pages.

Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
---
 mm/vmscan.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-next.orig/mm/vmscan.c	2010-07-22 16:36:58.000000000 +0800
+++ linux-next/mm/vmscan.c	2010-07-22 17:03:47.000000000 +0800
@@ -1206,7 +1206,7 @@ static unsigned long shrink_inactive_lis
 	 * but that should be acceptable to the caller
 	 */
 	if (nr_freed < nr_taken && !current_is_kswapd() &&
-	    sc->lumpy_reclaim_mode) {
+	    sc->lumpy_reclaim_mode && priority < DEF_PRIORITY / 3) {
 		congestion_wait(BLK_RW_ASYNC, HZ/10);

 		/*
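A rough back-of-the-envelope check of the figures in that changelog (a
userspace sketch only; the 512MB LRU, 22MB/190MB and ~1MB/s numbers are taken
from Andreas' report, and DEF_PRIORITY = 12 is assumed):

/*
 * Back-of-the-envelope check of the changelog numbers above; userspace
 * sketch only, with figures taken from Andreas' report and DEF_PRIORITY
 * assumed to be 12.
 */
#include <stdio.h>

#define DEF_PRIORITY	12

int main(void)
{
	double lru_kb = 512 * 1024;		/* ~512MB of LRU pages */
	int priority = DEF_PRIORITY - 2;	/* 10: old sync-reclaim threshold */

	/* one reclaim pass scans roughly lru_size >> priority: 512MB >> 10 = 512KB */
	printf("scan window at priority %d: %.0f KB\n",
	       priority, lru_kb / (1 << priority));

	/* such a small window is easily filled by the reported dirty/writeback pages */
	printf("dirty + writeback in the report: %d MB\n", 190 + 22);

	/* 50MB of writeback drained by a USB 1.1 stick at ~1MB/s */
	printf("worst-case wait behind slow writeback: ~%d s\n", 50 / 1);

	/* proposed trigger: priority < DEF_PRIORITY/3, i.e. after a 1/16 = 6.25%
	 * pass, well below the 20% default dirty throttle ratio */
	printf("sync trigger scan fraction: %.2f%%\n", 100.0 / 16);
	return 0;
}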
From: Minchan Kim on 22 Jul 2010 11:40

Hi, Wu.
Thanks for CCing me.
AFAIR, we discussed this by private mail and haven't reached a conclusion yet.
Let's start from the beginning.

On Thu, Jul 22, 2010 at 05:21:55PM +0800, Wu Fengguang wrote:
> > I guess this new patch is more problem oriented and acceptable:
> >
> > --- linux-next.orig/mm/vmscan.c	2010-07-22 16:36:58.000000000 +0800
> > +++ linux-next/mm/vmscan.c	2010-07-22 16:39:57.000000000 +0800
> > @@ -1217,7 +1217,8 @@ static unsigned long shrink_inactive_lis
> >  		count_vm_events(PGDEACTIVATE, nr_active);
> >
> >  		nr_freed += shrink_page_list(&page_list, sc,
> > -						PAGEOUT_IO_SYNC);
> > +					priority < DEF_PRIORITY / 3 ?
> > +					PAGEOUT_IO_SYNC : PAGEOUT_IO_ASYNC);
> >  	}
> >
> >  	nr_reclaimed += nr_freed;
>
> This one looks better:
> ---
> vmscan: raise the bar to PAGEOUT_IO_SYNC stalls
>
> Fix the "system goes totally unresponsive with many dirty/writeback pages"
> problem:
>
> 	http://lkml.org/lkml/2010/4/4/86
>
> The root cause is that wait_on_page_writeback() is called too early in
> the direct reclaim path, which blocks many random/unrelated processes
> when some slow (USB stick) writeback is on the way.
>
> A simple dd can easily create a big range of dirty pages in the LRU
> list, so priority can easily go below (DEF_PRIORITY - 2) on a typical
> desktop, which triggers the lumpy reclaim mode and hence
> wait_on_page_writeback().

I see an OOM message and the order is zero. How does lumpy reclaim come
into play here? For lumpy reclaim to be used, we have to meet both
priority < 10 and sc->order > 0.

Please clarify the problem.

>
> In Andreas' case, 512MB/1024 = 512KB, which is way too low compared to
> the 22MB of writeback and 190MB of dirty pages. There can easily be a

What are the 22MB and 190MB? It would be better to explain them in more
detail. I think the description should be a clear summary of the problem
on its own, without needing the above link.

Thanks for taking up this problem again. :)

--
Kind regards,
Minchan Kim
From: Wu Fengguang on 23 Jul 2010 04:40

Hi Mel,

On Thu, Jul 22, 2010 at 05:42:09PM +0800, Mel Gorman wrote:
> On Thu, Jul 22, 2010 at 04:52:10PM +0800, Wu Fengguang wrote:
> > > Some insight on how the other writeback changes that are being floated
> > > around might affect the number of dirty pages reclaim encounters would also
> > > be helpful.
> >
> > Here is an interesting related problem about the wait_on_page_writeback()
> > call inside shrink_page_list():
> >
> > 	http://lkml.org/lkml/2010/4/4/86

I guess you've got the answers from the above thread; anyway, here are
brief answers to your questions.

> >
> > The problem is that wait_on_page_writeback() is called too early in the
> > direct reclaim path, which blocks many random/unrelated processes when
> > some slow (USB stick) writeback is on the way.
> >
> > A simple dd can easily create a big range of dirty pages in the LRU
> > list, so priority can easily go below (DEF_PRIORITY - 2) on a typical
> > desktop, which triggers the lumpy reclaim mode and hence
> > wait_on_page_writeback().
>
> Lumpy reclaim is for high-order allocations. A simple dd should not be
> triggering it regularly unless there was a lot of forking going on at the
> same time.

dd can create the dirty file fast enough that no other processes are
injecting pages into the LRU lists besides dd itself. So it creates a
large range of hard-to-reclaim LRU pages, which will trigger this code:

+	else if (sc->order && priority < DEF_PRIORITY - 2)
+		lumpy_reclaim = 1;

> Also, how would a random or unrelated process get blocked on
> writeback unless they were also doing high-order allocations? What was the
> source of the high-order allocations?

sc->order is 1 on fork().

Thanks,
Fengguang
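On that last point, the usual explanation (an assumption here, based on
typical x86 configurations of that era where the kernel stack is 8KB, i.e.
two pages) is that fork() must satisfy an order-1 allocation for the new
task's kernel stack, which is why direct reclaim entered from fork() carries
sc->order == 1. A trivial sketch of the arithmetic:

/*
 * Illustrative arithmetic only: why an 8KB kernel stack (assumed; two
 * 4KB pages, as on x86 kernels of that era) means fork() needs an
 * order-1 page allocation.
 */
#include <stdio.h>

int main(void)
{
	unsigned long page_size = 4096;			/* 4KB pages */
	unsigned long thread_size = 2 * page_size;	/* assumed 8KB kernel stack */
	int order = 0;

	while ((page_size << order) < thread_size)
		order++;

	printf("kernel stack allocation order: %d\n", order);	/* prints 1 */
	return 0;
}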