Prev: staging/lirc: fix non-CONFIG_MODULES build horkage
Next: [PATCH] Staging: dt3155: remove useless dt3155_major parameter
From: Jan Kara on 29 Jul 2010 12:30 On Thu 29-07-10 19:51:44, Wu Fengguang wrote: > The periodic/background writeback can run forever. So when any > sync work is enqueued, increase bdi->sync_works to notify the > active non-sync works to exit. Non-sync works queued after sync > works won't be affected. Hmm, wouldn't it be simpler logic to just make for_kupdate and for_background work always yield when there's some other work to do (as they are livelockable from the definition of the target they have) and make sure any other work isn't livelockable? The only downside is that non-livelockable work cannot be "fair" in the sense that we cannot switch inodes after writing MAX_WRITEBACK_PAGES. I even had a patch for this but it's already outdated by now. But I can refresh it if we decide this is the way to go. Honza > > Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com> > --- > fs/fs-writeback.c | 13 +++++++++++++ > include/linux/backing-dev.h | 6 ++++++ > mm/backing-dev.c | 1 + > 3 files changed, 20 insertions(+) > > --- linux-next.orig/fs/fs-writeback.c 2010-07-29 17:13:23.000000000 +0800 > +++ linux-next/fs/fs-writeback.c 2010-07-29 17:13:49.000000000 +0800 > @@ -80,6 +80,8 @@ static void bdi_queue_work(struct backin > > spin_lock(&bdi->wb_lock); > list_add_tail(&work->list, &bdi->work_list); > + if (work->for_sync) > + atomic_inc(&bdi->wb.sync_works); > spin_unlock(&bdi->wb_lock); > > /* > @@ -633,6 +635,14 @@ static long wb_writeback(struct bdi_writ > break; > > /* > + * background/periodic works can run forever, need to abort > + * on seeing any pending sync work, to prevent livelock it. > + */ > + if (atomic_read(&wb->sync_works) && > + (work->for_background || work->for_kupdate)) > + break; > + > + /* > * For background writeout, stop when we are below the > * background dirty threshold > */ > @@ -765,6 +775,9 @@ long wb_do_writeback(struct bdi_writebac > > wrote += wb_writeback(wb, work); > > + if (work->for_sync) > + atomic_dec(&wb->sync_works); > + > /* > * Notify the caller of completion if this is a synchronous > * work item, otherwise just free it. > --- linux-next.orig/include/linux/backing-dev.h 2010-07-29 17:13:23.000000000 +0800 > +++ linux-next/include/linux/backing-dev.h 2010-07-29 17:13:31.000000000 +0800 > @@ -50,6 +50,12 @@ struct bdi_writeback { > > unsigned long last_old_flush; /* last old data flush */ > > + /* > + * sync works queued, background works shall abort on seeing this, > + * to prevent livelocking the sync works > + */ > + atomic_t sync_works; > + > struct task_struct *task; /* writeback task */ > struct list_head b_dirty; /* dirty inodes */ > struct list_head b_io; /* parked for writeback */ > --- linux-next.orig/mm/backing-dev.c 2010-07-29 17:13:23.000000000 +0800 > +++ linux-next/mm/backing-dev.c 2010-07-29 17:13:31.000000000 +0800 > @@ -257,6 +257,7 @@ static void bdi_wb_init(struct bdi_write > > wb->bdi = bdi; > wb->last_old_flush = jiffies; > + atomic_set(&wb->sync_works, 0); > INIT_LIST_HEAD(&wb->b_dirty); > INIT_LIST_HEAD(&wb->b_io); > INIT_LIST_HEAD(&wb->b_more_io); > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo(a)vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jan Kara <jack(a)suse.cz> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on 30 Jul 2010 00:10 On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote: > On Thu 29-07-10 19:51:44, Wu Fengguang wrote: > > The periodic/background writeback can run forever. So when any > > sync work is enqueued, increase bdi->sync_works to notify the > > active non-sync works to exit. Non-sync works queued after sync > > works won't be affected. > Hmm, wouldn't it be simpler logic to just make for_kupdate and > for_background work always yield when there's some other work to do (as > they are livelockable from the definition of the target they have) and > make sure any other work isn't livelockable? Good idea! > The only downside is that > non-livelockable work cannot be "fair" in the sense that we cannot switch > inodes after writing MAX_WRITEBACK_PAGES. Cannot switch indoes _before_ finish with the current MAX_WRITEBACK_PAGES batch? > I even had a patch for this but it's already outdated by now. But I > can refresh it if we decide this is the way to go. I'm very interested in your old patch, would you post it? Let's see which one is easier to work with :) Thanks, Fengguang > > Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com> > > --- > > fs/fs-writeback.c | 13 +++++++++++++ > > include/linux/backing-dev.h | 6 ++++++ > > mm/backing-dev.c | 1 + > > 3 files changed, 20 insertions(+) > > > > --- linux-next.orig/fs/fs-writeback.c 2010-07-29 17:13:23.000000000 +0800 > > +++ linux-next/fs/fs-writeback.c 2010-07-29 17:13:49.000000000 +0800 > > @@ -80,6 +80,8 @@ static void bdi_queue_work(struct backin > > > > spin_lock(&bdi->wb_lock); > > list_add_tail(&work->list, &bdi->work_list); > > + if (work->for_sync) > > + atomic_inc(&bdi->wb.sync_works); > > spin_unlock(&bdi->wb_lock); > > > > /* > > @@ -633,6 +635,14 @@ static long wb_writeback(struct bdi_writ > > break; > > > > /* > > + * background/periodic works can run forever, need to abort > > + * on seeing any pending sync work, to prevent livelock it. > > + */ > > + if (atomic_read(&wb->sync_works) && > > + (work->for_background || work->for_kupdate)) > > + break; > > + > > + /* > > * For background writeout, stop when we are below the > > * background dirty threshold > > */ > > @@ -765,6 +775,9 @@ long wb_do_writeback(struct bdi_writebac > > > > wrote += wb_writeback(wb, work); > > > > + if (work->for_sync) > > + atomic_dec(&wb->sync_works); > > + > > /* > > * Notify the caller of completion if this is a synchronous > > * work item, otherwise just free it. > > --- linux-next.orig/include/linux/backing-dev.h 2010-07-29 17:13:23.000000000 +0800 > > +++ linux-next/include/linux/backing-dev.h 2010-07-29 17:13:31.000000000 +0800 > > @@ -50,6 +50,12 @@ struct bdi_writeback { > > > > unsigned long last_old_flush; /* last old data flush */ > > > > + /* > > + * sync works queued, background works shall abort on seeing this, > > + * to prevent livelocking the sync works > > + */ > > + atomic_t sync_works; > > + > > struct task_struct *task; /* writeback task */ > > struct list_head b_dirty; /* dirty inodes */ > > struct list_head b_io; /* parked for writeback */ > > --- linux-next.orig/mm/backing-dev.c 2010-07-29 17:13:23.000000000 +0800 > > +++ linux-next/mm/backing-dev.c 2010-07-29 17:13:31.000000000 +0800 > > @@ -257,6 +257,7 @@ static void bdi_wb_init(struct bdi_write > > > > wb->bdi = bdi; > > wb->last_old_flush = jiffies; > > + atomic_set(&wb->sync_works, 0); > > INIT_LIST_HEAD(&wb->b_dirty); > > INIT_LIST_HEAD(&wb->b_io); > > INIT_LIST_HEAD(&wb->b_more_io); > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > > the body of a message to majordomo(a)vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > Jan Kara <jack(a)suse.cz> > SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Jan Kara on 2 Aug 2010 17:00 On Fri 30-07-10 12:03:06, Wu Fengguang wrote: > On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote: > > On Thu 29-07-10 19:51:44, Wu Fengguang wrote: > > > The periodic/background writeback can run forever. So when any > > > sync work is enqueued, increase bdi->sync_works to notify the > > > active non-sync works to exit. Non-sync works queued after sync > > > works won't be affected. > > Hmm, wouldn't it be simpler logic to just make for_kupdate and > > for_background work always yield when there's some other work to do (as > > they are livelockable from the definition of the target they have) and > > make sure any other work isn't livelockable? > > Good idea! > > > The only downside is that > > non-livelockable work cannot be "fair" in the sense that we cannot switch > > inodes after writing MAX_WRITEBACK_PAGES. > > Cannot switch indoes _before_ finish with the current > MAX_WRITEBACK_PAGES batch? Well, even after writing all those MAX_WRITEBACK_PAGES. Because what you want to do in a non-livelockable work is: take inode, write it, never look at it again for this work. Because if you later return to the inode, it can have newer dirty pages and thus you cannot really avoid livelock. Of course, this all assumes .nr_to_write isn't set to something small. That avoids the livelock as well. > > I even had a patch for this but it's already outdated by now. But I > > can refresh it if we decide this is the way to go. > > I'm very interested in your old patch, would you post it? Let's see > which one is easier to work with :) OK, attached is the patch. I've rebased it against 2.6.35. Honza -- Jan Kara <jack(a)suse.cz> SUSE Labs, CR
From: Wu Fengguang on 2 Aug 2010 23:10 On Tue, Aug 03, 2010 at 04:51:52AM +0800, Jan Kara wrote: > On Fri 30-07-10 12:03:06, Wu Fengguang wrote: > > On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote: > > > On Thu 29-07-10 19:51:44, Wu Fengguang wrote: > > > > The periodic/background writeback can run forever. So when any > > > > sync work is enqueued, increase bdi->sync_works to notify the > > > > active non-sync works to exit. Non-sync works queued after sync > > > > works won't be affected. > > > Hmm, wouldn't it be simpler logic to just make for_kupdate and > > > for_background work always yield when there's some other work to do (as > > > they are livelockable from the definition of the target they have) and > > > make sure any other work isn't livelockable? > > > > Good idea! > > > > > The only downside is that > > > non-livelockable work cannot be "fair" in the sense that we cannot switch > > > inodes after writing MAX_WRITEBACK_PAGES. > > > > Cannot switch indoes _before_ finish with the current > > MAX_WRITEBACK_PAGES batch? > Well, even after writing all those MAX_WRITEBACK_PAGES. Because what you > want to do in a non-livelockable work is: take inode, write it, never look at > it again for this work. Because if you later return to the inode, it can > have newer dirty pages and thus you cannot really avoid livelock. Of > course, this all assumes .nr_to_write isn't set to something small. That > avoids the livelock as well. I do have a poor man's solution that can handle this case. https://kerneltrap.org/mailarchive/linux-fsdevel/2009/10/7/6476473/thread It may do more extra works, but will stop livelock in theory. A related question is, what if some for_reclaim works get enqueued? Shall we postpone the sync work as well? The global sync is not likely to hit the dirty pages in a small memcg, or may take long time. It seems not a high priority task though. > > > I even had a patch for this but it's already outdated by now. But I > > > can refresh it if we decide this is the way to go. > > > > I'm very interested in your old patch, would you post it? Let's see > > which one is easier to work with :) > OK, attached is the patch. I've rebased it against 2.6.35. > Honza > -- > Jan Kara <jack(a)suse.cz> > SUSE Labs, CR > From a6df0d4db148f983fe756df4791409db28dff459 Mon Sep 17 00:00:00 2001 > From: Jan Kara <jack(a)suse.cz> > Date: Mon, 2 Aug 2010 22:30:25 +0200 > Subject: [PATCH] mm: Stop background writeback if there is other work queued for the thread > > Background writeback and kupdate-style writeback are easily livelockable > (from a definition of their target). This is inconvenient because it can > make sync(1) stall forever waiting on its queued work to be finished. > Fix the problem by interrupting background and kupdate writeback if there > is some other work to do. We can return to them after completing all the > queued work. > > Signed-off-by: Jan Kara <jack(a)suse.cz> > --- > fs/fs-writeback.c | 8 ++++++++ > 1 files changed, 8 insertions(+), 0 deletions(-) > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index d5be169..542471e 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb, > break; > > /* > + * Background writeout and kupdate-style writeback are > + * easily livelockable. Stop them if there is other work > + * to do so that e.g. sync can proceed. > + */ > + if ((work->for_background || work->for_kupdate) && > + !list_empty(&wb->bdi->work_list)) > + break; > + /* I like it. It's much simpler. Reviewed-by: Wu Fengguang <fengguang.wu(a)intel.com> Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Jan Kara on 3 Aug 2010 07:00
On Tue 03-08-10 11:01:25, Wu Fengguang wrote: > On Tue, Aug 03, 2010 at 04:51:52AM +0800, Jan Kara wrote: > > On Fri 30-07-10 12:03:06, Wu Fengguang wrote: > > > On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote: > > > > On Thu 29-07-10 19:51:44, Wu Fengguang wrote: > > > > > The periodic/background writeback can run forever. So when any > > > > > sync work is enqueued, increase bdi->sync_works to notify the > > > > > active non-sync works to exit. Non-sync works queued after sync > > > > > works won't be affected. > > > > Hmm, wouldn't it be simpler logic to just make for_kupdate and > > > > for_background work always yield when there's some other work to do (as > > > > they are livelockable from the definition of the target they have) and > > > > make sure any other work isn't livelockable? > > > > > > Good idea! > > > > > > > The only downside is that > > > > non-livelockable work cannot be "fair" in the sense that we cannot switch > > > > inodes after writing MAX_WRITEBACK_PAGES. > > > > > > Cannot switch indoes _before_ finish with the current > > > MAX_WRITEBACK_PAGES batch? > > Well, even after writing all those MAX_WRITEBACK_PAGES. Because what you > > want to do in a non-livelockable work is: take inode, write it, never look at > > it again for this work. Because if you later return to the inode, it can > > have newer dirty pages and thus you cannot really avoid livelock. Of > > course, this all assumes .nr_to_write isn't set to something small. That > > avoids the livelock as well. > > I do have a poor man's solution that can handle this case. > https://kerneltrap.org/mailarchive/linux-fsdevel/2009/10/7/6476473/thread > It may do more extra works, but will stop livelock in theory. So I don't think sync work on it's own is a problem. There we can just give up any fairness and just go inode by inode. IMHO it's much simpler that way. The remaining types of work we have are "for_reclaim" and then ones triggered by filesystems to get rid of delayed allocated data. These cases can easily have well defined and low nr_to_write so they wouldn't be livelockable either. What do you think? > A related question is, what if some for_reclaim works get enqueued? > Shall we postpone the sync work as well? The global sync is not likely > to hit the dirty pages in a small memcg, or may take long time. It > seems not a high priority task though. I see some incentive to do this but the simple thing with for_background and for_kupdate work is that they are essentially state-less and so they can be easily (and automatically) restarted. It would be really hard to implement something like this for sync and still avoid livelocks. > > > > I even had a patch for this but it's already outdated by now. But I > > > > can refresh it if we decide this is the way to go. > > > > > > I'm very interested in your old patch, would you post it? Let's see > > > which one is easier to work with :) > > OK, attached is the patch. I've rebased it against 2.6.35. > > Honza > > -- > > Jan Kara <jack(a)suse.cz> > > SUSE Labs, CR > > > From a6df0d4db148f983fe756df4791409db28dff459 Mon Sep 17 00:00:00 2001 > > From: Jan Kara <jack(a)suse.cz> > > Date: Mon, 2 Aug 2010 22:30:25 +0200 > > Subject: [PATCH] mm: Stop background writeback if there is other work queued for the thread > > > > Background writeback and kupdate-style writeback are easily livelockable > > (from a definition of their target). This is inconvenient because it can > > make sync(1) stall forever waiting on its queued work to be finished. > > Fix the problem by interrupting background and kupdate writeback if there > > is some other work to do. We can return to them after completing all the > > queued work. > > > > Signed-off-by: Jan Kara <jack(a)suse.cz> > > --- > > fs/fs-writeback.c | 8 ++++++++ > > 1 files changed, 8 insertions(+), 0 deletions(-) > > > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > > index d5be169..542471e 100644 > > --- a/fs/fs-writeback.c > > +++ b/fs/fs-writeback.c > > @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb, > > break; > > > > /* > > + * Background writeout and kupdate-style writeback are > > + * easily livelockable. Stop them if there is other work > > + * to do so that e.g. sync can proceed. > > + */ > > + if ((work->for_background || work->for_kupdate) && > > + !list_empty(&wb->bdi->work_list)) > > + break; > > + /* > > I like it. It's much simpler. > > Reviewed-by: Wu Fengguang <fengguang.wu(a)intel.com> Thanks. I think I'll try to get this merged via Jens' tree in this merge window. Honza -- Jan Kara <jack(a)suse.cz> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |