From: Wu Fengguang on
> Some insight on how the other writeback changes that are being floated
> around might affect the number of dirty pages reclaim encounters would also
> be helpful.

Here is an interesting related problem about the wait_on_page_writeback() call
inside shrink_page_list():

http://lkml.org/lkml/2010/4/4/86

The problem is that wait_on_page_writeback() is called too early in the
direct reclaim path, which blocks many random, unrelated processes when
some slow (e.g. USB stick) writeback is under way.

A simple dd can easily create a big range of dirty pages in the LRU
list, so on a typical desktop the priority can easily go below
(DEF_PRIORITY - 2), which triggers the lumpy reclaim mode and hence
wait_on_page_writeback().
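For illustration, here is a user-space C sketch (not kernel code) of the two
mechanisms involved: the scan window that shrinks as priority drops (each pass
scans roughly lru_size >> priority pages), and the pre-patch lumpy trigger in
set_lumpy_reclaim_mode(). The constants DEF_PRIORITY = 12 and
PAGE_ALLOC_COSTLY_ORDER = 3 match 2.6.3x-era kernels.

```c
#include <assert.h>

#define DEF_PRIORITY            12
#define PAGE_ALLOC_COSTLY_ORDER 3

/* At priority p, each LRU list is scanned in chunks of lru_size >> p. */
static unsigned long scan_window(unsigned long lru_pages, int priority)
{
	return lru_pages >> priority;
}

/* Mirrors the pre-patch condition: lumpy mode for any order > 0
 * allocation once priority has fallen below DEF_PRIORITY - 2. */
static int lumpy_reclaim_mode(int order, int priority)
{
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		return 1;
	if (order && priority < DEF_PRIORITY - 2)
		return 1;
	return 0;
}
```

So with a 512MB LRU the first pass at priority 12 scans only 128KB worth of
pages, and any order > 0 allocation switches to lumpy mode once priority
reaches 9.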

I proposed this patch at the time, which was confirmed to solve the problem:

--- linux-next.orig/mm/vmscan.c 2010-06-24 14:32:03.000000000 +0800
+++ linux-next/mm/vmscan.c 2010-07-22 16:12:34.000000000 +0800
@@ -1650,7 +1650,7 @@ static void set_lumpy_reclaim_mode(int p
*/
if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
sc->lumpy_reclaim_mode = 1;
- else if (sc->order && priority < DEF_PRIORITY - 2)
+ else if (sc->order && priority < DEF_PRIORITY / 2)
sc->lumpy_reclaim_mode = 1;
else
sc->lumpy_reclaim_mode = 0;


However, KOSAKI and Minchan raised concerns about raising the bar that much.
I guess this new patch is more problem-oriented and may be more acceptable:

--- linux-next.orig/mm/vmscan.c 2010-07-22 16:36:58.000000000 +0800
+++ linux-next/mm/vmscan.c 2010-07-22 16:39:57.000000000 +0800
@@ -1217,7 +1217,8 @@ static unsigned long shrink_inactive_lis
count_vm_events(PGDEACTIVATE, nr_active);

nr_freed += shrink_page_list(&page_list, sc,
- PAGEOUT_IO_SYNC);
+ priority < DEF_PRIORITY / 3 ?
+ PAGEOUT_IO_SYNC : PAGEOUT_IO_ASYNC);
}

nr_reclaimed += nr_freed;

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
Sorry, please ignore this hack; it's nonsense.

>
> --- linux-next.orig/mm/vmscan.c 2010-07-22 16:36:58.000000000 +0800
> +++ linux-next/mm/vmscan.c 2010-07-22 16:39:57.000000000 +0800
> @@ -1217,7 +1217,8 @@ static unsigned long shrink_inactive_lis
> count_vm_events(PGDEACTIVATE, nr_active);
>
> nr_freed += shrink_page_list(&page_list, sc,
> - PAGEOUT_IO_SYNC);
> + priority < DEF_PRIORITY / 3 ?
> + PAGEOUT_IO_SYNC : PAGEOUT_IO_ASYNC);
> }
>
> nr_reclaimed += nr_freed;

Thanks,
Fengguang

From: Wu Fengguang on
> I guess this new patch is more problem oriented and acceptable:
>
> --- linux-next.orig/mm/vmscan.c 2010-07-22 16:36:58.000000000 +0800
> +++ linux-next/mm/vmscan.c 2010-07-22 16:39:57.000000000 +0800
> @@ -1217,7 +1217,8 @@ static unsigned long shrink_inactive_lis
> count_vm_events(PGDEACTIVATE, nr_active);
>
> nr_freed += shrink_page_list(&page_list, sc,
> - PAGEOUT_IO_SYNC);
> + priority < DEF_PRIORITY / 3 ?
> + PAGEOUT_IO_SYNC : PAGEOUT_IO_ASYNC);
> }
>
> nr_reclaimed += nr_freed;

This one looks better:
---
vmscan: raise the bar to PAGEOUT_IO_SYNC stalls

Fix "system goes totally unresponsive with many dirty/writeback pages"
problem:

http://lkml.org/lkml/2010/4/4/86

The root cause is that wait_on_page_writeback() is called too early in the
direct reclaim path, which blocks many random, unrelated processes when
some slow (e.g. USB stick) writeback is under way.

A simple dd can easily create a big range of dirty pages in the LRU
list, so on a typical desktop the priority can easily go below
(DEF_PRIORITY - 2), which triggers the lumpy reclaim mode and hence
wait_on_page_writeback().

In Andreas' case, the scan window at that priority is 512MB/1024 = 512KB,
which is way too small compared to the 22MB of writeback and 190MB of
dirty pages. There can easily be a contiguous run of 512KB of
dirty/writeback pages in the LRU, which will trigger the wait logic.

To make it worse, when there are 50MB of writeback pages and USB 1.1 is
writing them out at 1MB/s, wait_on_page_writeback() may stall for up to
50 seconds.
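Checking that figure with a hypothetical helper that just restates the
arithmetic (50MB of in-flight writeback drained at roughly 1MB/s):

```c
#include <assert.h>

/* Worst-case time a reclaimer can spend in wait_on_page_writeback(),
 * assuming it ends up waiting behind all in-flight writeback. */
static unsigned int worst_case_stall_seconds(unsigned int writeback_mb,
					     unsigned int device_mb_per_sec)
{
	return writeback_mb / device_mb_per_sec;
}
```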

So enter the sync write&wait only when priority goes below DEF_PRIORITY/3,
i.e. 6.25% of the LRU. Since the default dirty throttle ratio is 20%, the
sync write&wait will hardly be triggered by dirty pages alone.
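Numerically, with DEF_PRIORITY = 12 the threshold DEF_PRIORITY / 3 = 4 means
sync write&wait is only allowed at priority 3 and below, i.e. after a pass at
priority 4 has already scanned lru >> 4 = 6.25% of the LRU. A user-space
sketch of the check:

```c
#include <assert.h>

#define DEF_PRIORITY 12

/* Mirrors the proposed check: allow PAGEOUT_IO_SYNC stalls only at
 * very low priority. */
static int sync_writeback_allowed(int priority)
{
	return priority < DEF_PRIORITY / 3;
}

/* Fraction of the LRU scanned in one pass at a given priority, in
 * basis points (1/100 of a percent); 625 means 6.25%. */
static unsigned long scan_fraction_bp(int priority)
{
	return 10000UL >> priority;
}
```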

Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-next.orig/mm/vmscan.c 2010-07-22 16:36:58.000000000 +0800
+++ linux-next/mm/vmscan.c 2010-07-22 17:03:47.000000000 +0800
@@ -1206,7 +1206,7 @@ static unsigned long shrink_inactive_lis
* but that should be acceptable to the caller
*/
if (nr_freed < nr_taken && !current_is_kswapd() &&
- sc->lumpy_reclaim_mode) {
+ sc->lumpy_reclaim_mode && priority < DEF_PRIORITY / 3) {
congestion_wait(BLK_RW_ASYNC, HZ/10);

/*
From: Minchan Kim on
Hi, Wu.
Thanks for Cc'ing me.

AFAIR, we discussed this by private mail but didn't reach a conclusion.
Let's start from the beginning.

On Thu, Jul 22, 2010 at 05:21:55PM +0800, Wu Fengguang wrote:
> > I guess this new patch is more problem oriented and acceptable:
> >
> > --- linux-next.orig/mm/vmscan.c 2010-07-22 16:36:58.000000000 +0800
> > +++ linux-next/mm/vmscan.c 2010-07-22 16:39:57.000000000 +0800
> > @@ -1217,7 +1217,8 @@ static unsigned long shrink_inactive_lis
> > count_vm_events(PGDEACTIVATE, nr_active);
> >
> > nr_freed += shrink_page_list(&page_list, sc,
> > - PAGEOUT_IO_SYNC);
> > + priority < DEF_PRIORITY / 3 ?
> > + PAGEOUT_IO_SYNC : PAGEOUT_IO_ASYNC);
> > }
> >
> > nr_reclaimed += nr_freed;
>
> This one looks better:
> ---
> vmscan: raise the bar to PAGEOUT_IO_SYNC stalls
>
> Fix "system goes totally unresponsive with many dirty/writeback pages"
> problem:
>
> http://lkml.org/lkml/2010/4/4/86
>
> The root cause is, wait_on_page_writeback() is called too early in the
> direct reclaim path, which blocks many random/unrelated processes when
> some slow (USB stick) writeback is on the way.
>
> A simple dd can easily create a big range of dirty pages in the LRU
> list. Therefore priority can easily go below (DEF_PRIORITY - 2) in a
> typical desktop, which triggers the lumpy reclaim mode and hence
> wait_on_page_writeback().

I see an OOM message where the order is zero.
How does lumpy reclaim come into play then?
For lumpy reclaim to kick in, we have to meet priority < 10 and sc->order > 0.

Please clarify the problem.

>
> In Andreas' case, 512MB/1024 = 512KB, this is way too low comparing to
> the 22MB writeback and 190MB dirty pages. There can easily be a

What are the 22MB and 190MB figures?
It would be better to explain them in more detail.
I think the description should stand on its own as a summary of the
problem, without needing the above link.

Thanks for bringing up this problem again. :)
--
Kind regards,
Minchan Kim
From: Wu Fengguang on
Hi Mel,

On Thu, Jul 22, 2010 at 05:42:09PM +0800, Mel Gorman wrote:
> On Thu, Jul 22, 2010 at 04:52:10PM +0800, Wu Fengguang wrote:
> > > Some insight on how the other writeback changes that are being floated
> > > around might affect the number of dirty pages reclaim encounters would also
> > > be helpful.
> >
> > Here is an interesting related problem about the wait_on_page_writeback() call
> > inside shrink_page_list():
> >
> > http://lkml.org/lkml/2010/4/4/86

I guess you've got the answers from the above thread; anyway, here are
brief answers to your questions.

> >
> > The problem is, wait_on_page_writeback() is called too early in the
> > direct reclaim path, which blocks many random/unrelated processes when
> > some slow (USB stick) writeback is on the way.
> >
> > A simple dd can easily create a big range of dirty pages in the LRU
> > list. Therefore priority can easily go below (DEF_PRIORITY - 2) in a
> > typical desktop, which triggers the lumpy reclaim mode and hence
> > wait_on_page_writeback().
> >
>
> Lumpy reclaim is for high-order allocations. A simple dd should not be
> triggering it regularly unless there was a lot of forking going on at the
> same time.

dd can create the dirty file fast enough that no other processes are
injecting pages into the LRU lists besides dd itself. So it creates a
large range of hard-to-reclaim LRU pages, which will trigger this code:

+ else if (sc->order && priority < DEF_PRIORITY - 2)
+ lumpy_reclaim = 1;


> Also, how would a random or unrelated process get blocked on
> writeback unless they were also doing high-order allocations? What was the
> source of the high-order allocations?

sc->order is 1 on fork().
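The order-1 requirement comes from the child's kernel stack: on x86 with 4KB
pages, THREAD_SIZE is 8KB of physically contiguous memory, i.e. an order-1
page allocation, so under memory pressure fork() enters direct reclaim with
sc->order == 1 (values assumed for 2.6.3x-era x86; other architectures
differ). A simplified version of the kernel's get_order() illustrates the
mapping from allocation size to order:

```c
#include <assert.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
#define THREAD_SIZE (2 * PAGE_SIZE)	/* assumed x86 default */

/* Smallest order whose block covers size bytes (simplified get_order()). */
static int get_order_of(unsigned long size)
{
	int order = 0;
	unsigned long block = PAGE_SIZE;

	while (block < size) {
		block <<= 1;
		order++;
	}
	return order;
}
```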

Thanks,
Fengguang