Prev: X25: Fix x25_create errors for bad protocol and ENOBUFS
Next: Remove unused macro, VM_MIN_READAHEAD.
From: KOSAKI Motohiro on 16 Feb 2010 00:40

> On Tue, 16 Feb 2010, KAMEZAWA Hiroyuki wrote:
>
> > > If memory has been depleted in lowmem zones even with the protection
> > > afforded to it by /proc/sys/vm/lowmem_reserve_ratio, it is unlikely that
> > > killing current users will help. The memory is either reclaimable (or
> > > migratable) already, in which case we should not invoke the oom killer at
> > > all, or it is pinned by an application for I/O. Killing such an
> > > application may leave the hardware in an unspecified state and there is
> > > no guarantee that it will be able to make a timely exit.
> > >
> > > Lowmem allocations are now failed in oom conditions so that the task can
> > > perhaps recover or try again later. Killing current is an unnecessary
> > > result for simply making a GFP_DMA or GFP_DMA32 page allocation and no
> > > lowmem allocations use the now-deprecated __GFP_NOFAIL bit so retrying is
> > > unnecessary.
> > >
> > > Previously, the heuristic provided some protection for those tasks with
> > > CAP_SYS_RAWIO, but this is no longer necessary since we will not be
> > > killing tasks for the purposes of ISA allocations.
> > >
> > > high_zoneidx is gfp_zone(gfp_flags), meaning that ZONE_NORMAL will be the
> > > default for all allocations that are not __GFP_DMA, __GFP_DMA32,
> > > __GFP_HIGHMEM, and __GFP_MOVABLE on kernels configured to support those
> > > flags. Testing for high_zoneidx being less than ZONE_NORMAL will only
> > > return true for allocations that have either __GFP_DMA or __GFP_DMA32.
> > >
> > > Acked-by: Rik van Riel <riel(a)redhat.com>
> > > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
> > > Signed-off-by: David Rientjes <rientjes(a)google.com>
> > > ---
> > >  mm/page_alloc.c |    3 +++
> > >  1 files changed, 3 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1914,6 +1914,9 @@ rebalance:
> > >       * running out of options and have to consider going OOM
> > >       */
> > >      if (!did_some_progress) {
> > > +        /* The oom killer won't necessarily free lowmem */
> > > +        if (high_zoneidx < ZONE_NORMAL)
> > > +            goto nopage;
> > >          if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> > >              if (oom_killer_disabled)
> > >                  goto nopage;
> >
> > WARN_ON((high_zoneidx < ZONE_NORMAL) && (gfp_mask & __GFP_NOFAIL))
> > plz.
> >
>
> As I already explained when you first brought this up, the possibility of
> not invoking the oom killer is not unique to GFP_DMA, it is also possible
> for GFP_NOFS. Since __GFP_NOFAIL is deprecated and there are no current
> users of GFP_DMA | __GFP_NOFAIL, that warning is completely unnecessary.
> We're not adding any additional __GFP_NOFAIL allocations.

No current user? I don't think so.

    int bio_integrity_prep(struct bio *bio)
    {
    (snip)
        buf = kmalloc(len, GFP_NOIO | __GFP_NOFAIL | q->bounce_gfp);

and

    void blk_queue_bounce_limit(struct request_queue *q, u64 dma_mask)
    {
    (snip)
        if (dma) {
            init_emergency_isa_pool();
            q->bounce_gfp = GFP_NOIO | GFP_DMA;
            q->limits.bounce_pfn = b_pfn;
        }

I don't like rumor based discussion, I like fact based one.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
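Editor's note: the two snippets above combine into a mask that carries both a DMA zone flag and __GFP_NOFAIL. A minimal userspace sketch of that composition follows. The X_ bit values, struct x_queue, and the helper names are hypothetical stand-ins chosen for illustration; they are not the kernel's definitions, and only the way the flags are OR-ed together mirrors the quoted code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only (not the kernel's). */
#define X_GFP_DMA    0x01u
#define X_GFP_WAIT   0x10u
#define X_GFP_NOFAIL 0x800u
#define X_GFP_NOIO   (X_GFP_WAIT)

/* Hypothetical stand-in for the one request_queue field that matters here. */
struct x_queue {
    unsigned int bounce_gfp;
};

/* Models the "if (dma)" branch quoted from blk_queue_bounce_limit(). */
static void x_set_isa_bounce(struct x_queue *q)
{
    assert(q);
    q->bounce_gfp = X_GFP_NOIO | X_GFP_DMA;
}

/* Models the mask passed to kmalloc() in the quoted bio_integrity_prep(). */
static unsigned int x_integrity_gfp(const struct x_queue *q)
{
    assert(q);
    return X_GFP_NOIO | X_GFP_NOFAIL | q->bounce_gfp;
}

/* True when a mask asks for DMA-zone memory and also forbids failure. */
static bool x_is_lowmem_nofail(unsigned int gfp_mask)
{
    return (gfp_mask & X_GFP_DMA) && (gfp_mask & X_GFP_NOFAIL);
}
```

In this model, x_is_lowmem_nofail(x_integrity_gfp(&q)) is false for a queue with no bounce flags and becomes true once x_set_isa_bounce(&q) has run, which is the combination the proposed WARN_ON would catch.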
From: Nick Piggin on 16 Feb 2010 01:50

On Mon, Feb 15, 2010 at 04:10:15PM -0800, David Rientjes wrote:
> On Tue, 16 Feb 2010, KAMEZAWA Hiroyuki wrote:
>
> > > If memory has been depleted in lowmem zones even with the protection
> > > afforded to it by /proc/sys/vm/lowmem_reserve_ratio, it is unlikely that
> > > killing current users will help. The memory is either reclaimable (or
> > > migratable) already, in which case we should not invoke the oom killer at
> > > all, or it is pinned by an application for I/O. Killing such an
> > > application may leave the hardware in an unspecified state and there is
> > > no guarantee that it will be able to make a timely exit.
> > >
> > > Lowmem allocations are now failed in oom conditions so that the task can
> > > perhaps recover or try again later. Killing current is an unnecessary
> > > result for simply making a GFP_DMA or GFP_DMA32 page allocation and no
> > > lowmem allocations use the now-deprecated __GFP_NOFAIL bit so retrying is
> > > unnecessary.
> > >
> > > Previously, the heuristic provided some protection for those tasks with
> > > CAP_SYS_RAWIO, but this is no longer necessary since we will not be
> > > killing tasks for the purposes of ISA allocations.
> > >
> > > high_zoneidx is gfp_zone(gfp_flags), meaning that ZONE_NORMAL will be the
> > > default for all allocations that are not __GFP_DMA, __GFP_DMA32,
> > > __GFP_HIGHMEM, and __GFP_MOVABLE on kernels configured to support those
> > > flags. Testing for high_zoneidx being less than ZONE_NORMAL will only
> > > return true for allocations that have either __GFP_DMA or __GFP_DMA32.
> > >
> > > Acked-by: Rik van Riel <riel(a)redhat.com>
> > > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
> > > Signed-off-by: David Rientjes <rientjes(a)google.com>
> > > ---
> > >  mm/page_alloc.c |    3 +++
> > >  1 files changed, 3 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1914,6 +1914,9 @@ rebalance:
> > >       * running out of options and have to consider going OOM
> > >       */
> > >      if (!did_some_progress) {
> > > +        /* The oom killer won't necessarily free lowmem */
> > > +        if (high_zoneidx < ZONE_NORMAL)
> > > +            goto nopage;
> > >          if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> > >              if (oom_killer_disabled)
> > >                  goto nopage;
> >
> > WARN_ON((high_zoneidx < ZONE_NORMAL) && (gfp_mask & __GFP_NOFAIL))
> > plz.
> >
>
> As I already explained when you first brought this up, the possibility of
> not invoking the oom killer is not unique to GFP_DMA, it is also possible
> for GFP_NOFS. Since __GFP_NOFAIL is deprecated and there are no current
> users of GFP_DMA | __GFP_NOFAIL, that warning is completely unnecessary.
> We're not adding any additional __GFP_NOFAIL allocations.

Completely agree with this request. Actually, I think even better you
should just add && !(gfp_mask & __GFP_NOFAIL). Deprecated doesn't mean
it is OK to break the API (callers *will* oops or corrupt memory if
__GFP_NOFAIL returns NULL).
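Editor's note: the difference between the patch as posted and the suggested extra "&& !(gfp_mask & __GFP_NOFAIL)" test can be sketched as a small decision function. This is a simplified userspace model, not the kernel code: the X_ names and the bit value are hypothetical, and everything the real slow path does after this check (the __GFP_FS and __GFP_NORETRY tests, the oom killer call itself) is collapsed into a single "carry on" result.

```c
#include <assert.h>

/* Hypothetical zone ordering and flag value, for illustration only. */
enum x_zone { X_ZONE_DMA, X_ZONE_DMA32, X_ZONE_NORMAL };
enum x_action { X_FAIL_NOPAGE, X_CARRY_ON };

#define X_GFP_NOFAIL 0x800u

/* Patch as posted: every lowmem allocation fails once reclaim stalls. */
static enum x_action x_posted_patch(enum x_zone high_zoneidx,
                                    unsigned int gfp_mask)
{
    (void)gfp_mask; /* the posted check does not look at the mask */
    assert(high_zoneidx <= X_ZONE_NORMAL);
    if (high_zoneidx < X_ZONE_NORMAL)
        return X_FAIL_NOPAGE;
    return X_CARRY_ON;
}

/* With the suggested guard: __GFP_NOFAIL callers never see a failure here. */
static enum x_action x_with_nofail_guard(enum x_zone high_zoneidx,
                                         unsigned int gfp_mask)
{
    assert(high_zoneidx <= X_ZONE_NORMAL);
    if (high_zoneidx < X_ZONE_NORMAL && !(gfp_mask & X_GFP_NOFAIL))
        return X_FAIL_NOPAGE;
    return X_CARRY_ON;
}
```

The two functions agree for every input except a lowmem request that also sets the no-fail bit: the posted check returns X_FAIL_NOPAGE there, while the guarded one lets the allocation keep going.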
From: Nick Piggin on 16 Feb 2010 03:00

On Mon, Feb 15, 2010 at 11:41:49PM -0800, David Rientjes wrote:
> On Tue, 16 Feb 2010, Nick Piggin wrote:
>
> > > As I already explained when you first brought this up, the possibility of
> > > not invoking the oom killer is not unique to GFP_DMA, it is also possible
> > > for GFP_NOFS. Since __GFP_NOFAIL is deprecated and there are no current
> > > users of GFP_DMA | __GFP_NOFAIL, that warning is completely unnecessary.
> > > We're not adding any additional __GFP_NOFAIL allocations.
> >
> > Completely agree with this request. Actually, I think even better you
> > should just add && !(gfp_mask & __GFP_NOFAIL). Deprecated doesn't mean
> > it is OK to break the API (callers *will* oops or corrupt memory if
> > __GFP_NOFAIL returns NULL).
> >
>
> ... unless it's used with GFP_ATOMIC, which we've always returned NULL
> for when even ALLOC_HARDER can't find pages, right?

Yes, it's never worked with GFP_ATOMIC.

> I'm wondering where this strong argument in favor of continuing to support
> __GFP_NOFAIL was when I insisted we call the oom killer for them even for
> allocations over PAGE_ALLOC_COSTLY_ORDER when __alloc_pages_nodemask() was
> refactored back in 2.6.31. The argument was that nobody is allocating
> that high of orders of __GFP_NOFAIL pages so we don't need to free memory
> for them and that's where the deprecation of the modifier happened in the
> first place. Ultimately, we did invoke the oom killer for those
> allocations because there's no chance of forward progress otherwise and,
> unlike __GFP_DMA, GFP_KERNEL | __GFP_NOFAIL actually is popular.

I don't know. IMO we should never just randomly weaken or break such a
flag in the page allocator API.

>
> I'll add this check to __alloc_pages_may_oom() for the !(gfp_mask &
> __GFP_NOFAIL) path since we're all content with endlessly looping.

Thanks. Yes, endlessly looping is far preferable to randomly oopsing
or corrupting memory.
From: KAMEZAWA Hiroyuki on 16 Feb 2010 19:00

On Tue, 16 Feb 2010 00:25:22 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> On Tue, 16 Feb 2010, Nick Piggin wrote:
>
> > > I'll add this check to __alloc_pages_may_oom() for the !(gfp_mask &
> > > __GFP_NOFAIL) path since we're all content with endlessly looping.
> >
> > Thanks. Yes endlessly looping is far preferable to randomly oopsing
> > or corrupting memory.
> >
>
> Here's the new patch for your consideration.
>

Then, can we take a kdump in this endlessly looping situation?

panic_on_oom=always + kdump can do that.

Thanks,
-Kame
From: KAMEZAWA Hiroyuki on 16 Feb 2010 19:10

On Tue, 16 Feb 2010 16:03:23 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:
>
> > > > > I'll add this check to __alloc_pages_may_oom() for the !(gfp_mask &
> > > > > __GFP_NOFAIL) path since we're all content with endlessly looping.
> > > >
> > > > Thanks. Yes endlessly looping is far preferable to randomly oopsing
> > > > or corrupting memory.
> > > >
> > >
> > > Here's the new patch for your consideration.
> > >
> >
> > Then, can we take a kdump in this endlessly looping situation?
> >
> > panic_on_oom=always + kdump can do that.
> >
>
> The endless loop is only helpful if something is going to free memory
> external to the current page allocation: either another task with
> __GFP_WAIT | __GFP_FS that invokes the oom killer, a task that frees
> memory, or a task that exits.
>
> The most notable endless loop in the page allocator is the one when a task
> has been oom killed, gets access to memory reserves, and then cannot find
> a page for a __GFP_NOFAIL allocation:
>
>     do {
>         page = get_page_from_freelist(gfp_mask, nodemask, order,
>             zonelist, high_zoneidx, ALLOC_NO_WATERMARKS,
>             preferred_zone, migratetype);
>
>         if (!page && gfp_mask & __GFP_NOFAIL)
>             congestion_wait(BLK_RW_ASYNC, HZ/50);
>     } while (!page && (gfp_mask & __GFP_NOFAIL));
>
> We don't expect any such allocations to happen during the exit path, but
> we could probably find some in the fs layer.
>
> I don't want to check sysctl_panic_on_oom in the page allocator because it
> would start panicking the machine unnecessarily for the integrity
> metadata GFP_NOIO | __GFP_NOFAIL allocation, for any
> order > PAGE_ALLOC_COSTLY_ORDER, or for users who can't lock the zonelist
> for oom kill that wouldn't have panicked before.
>

Then, why don't you check high_zoneidx in oom_kill.c?

Thanks,
-Kame
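Editor's note: the retry loop quoted above only terminates for a __GFP_NOFAIL caller when something outside the allocation frees memory. A minimal userspace model of that behaviour follows. struct x_pool, x_get_page(), and the bit value are hypothetical; the model replaces the free lists with a counter that "frees" a page on the Nth attempt, and it retries immediately where the kernel would sleep in congestion_wait().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value, for illustration only. */
#define X_GFP_NOFAIL 0x800u

/* Stand-in for the free lists: a page shows up on the Nth attempt. */
struct x_pool {
    int attempts_until_free; /* attempt number on which memory is freed */
    int attempts_made;       /* how many times the pool has been asked */
};

/* Models get_page_from_freelist(): NULL until "someone else" frees memory. */
static void *x_get_page(struct x_pool *pool)
{
    static char page[1];

    assert(pool);
    pool->attempts_made++;
    if (pool->attempts_made >= pool->attempts_until_free)
        return page;
    return NULL;
}

/* Same loop shape as the quoted do/while, minus the sleep between tries. */
static void *x_alloc_no_watermarks(struct x_pool *pool, unsigned int gfp_mask)
{
    void *page;

    do {
        page = x_get_page(pool);
    } while (!page && (gfp_mask & X_GFP_NOFAIL));

    return page;
}
```

With attempts_until_free set to 3, a no-fail request loops until the third attempt and returns a page, while a request without the bit gives up after one attempt and returns NULL. If nothing ever frees memory, the no-fail case is the endless loop discussed above.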