From: KOSAKI Motohiro on 26 Jan 2010 00:20

(cc to lots of related people)

> On Mon, 25 Jan 2010 02:48:08 +0100, KOSAKI Motohiro
> <kosaki.motohiro(a)jp.fujitsu.com> wrote:
>
> >> Hi,
> >>
> >> since kernel 2.6.32.2 (also tried 2.6.32.3) I get a lot of oom-killer
> >> kills when I do hard disk intensive tasks (mainly in VirtualBox, which
> >> is running Windows XP), and IMHO it kills processes even when I have a
> >> lot of free memory.
> >>
> >> Is this a known bug? I have a self-compiled kernel, so I can try
> >> patches.
> >
> > Can you please post your .config?

Hi all,

Strangely, all of the machines that reproduce this are x86_64 with Intel
i915, but I don't have any solid evidence. Can anyone please apply the
following debug patch and try to reproduce the issue? The patch writes
some debug messages into /var/log/messages.

Thanks.

---
 mm/memory.c |   45 +++++++++++++++++++++++++++++++++++++--------
 1 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 09e4b1b..5c9ebd8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2128,17 +2128,23 @@ reuse:
 gotten:
 	pte_unmap_unlock(page_table, ptl);
 
-	if (unlikely(anon_vma_prepare(vma)))
+	if (unlikely(anon_vma_prepare(vma))) {
+		printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 		goto oom;
+	}
 	if (is_zero_pfn(pte_pfn(orig_pte))) {
 		new_page = alloc_zeroed_user_highpage_movable(vma, address);
-		if (!new_page)
+		if (!new_page) {
+			printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 			goto oom;
+		}
 	} else {
 		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
-		if (!new_page)
+		if (!new_page) {
+			printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 			goto oom;
+		}
 		cow_user_page(new_page, old_page, address, vma);
 	}
 	__SetPageUptodate(new_page);
@@ -2153,8 +2159,10 @@ gotten:
 		unlock_page(old_page);
 	}
 
-	if (mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))
+	if (mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL)) {
+		printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 		goto oom_free_new;
+	}
 
 	/*
 	 * Re-check the pte - we dropped the lock
@@ -2272,6 +2280,10 @@ oom:
 
 unwritable_page:
 	page_cache_release(old_page);
+
+	if (ret & VM_FAULT_OOM)
+		printk(KERN_ERR "do_wp ->page_mkwrite OOM %pf %x\n", vma->vm_ops->page_mkwrite, ret);
+
 	return ret;
 }
 
@@ -2670,15 +2682,21 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	/* Allocate our own private page. */
 	pte_unmap(page_table);
 
-	if (unlikely(anon_vma_prepare(vma)))
+	if (unlikely(anon_vma_prepare(vma))) {
+		printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 		goto oom;
+	}
 	page = alloc_zeroed_user_highpage_movable(vma, address);
-	if (!page)
+	if (!page) {
+		printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 		goto oom;
+	}
 	__SetPageUptodate(page);
 
-	if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))
+	if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
+		printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 		goto oom_free_page;
+	}
 
 	entry = mk_pte(page, vma->vm_page_prot);
 	if (vma->vm_flags & VM_WRITE)
@@ -2742,8 +2760,12 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	vmf.page = NULL;
 
 	ret = vma->vm_ops->fault(vma, &vmf);
-	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))
+	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) {
+		if (ret & VM_FAULT_OOM)
+			printk(KERN_ERR "->fault OOM %pf %x %x\n", vma->vm_ops->fault, ret, flags);
+
 		return ret;
+	}
 
 	if (unlikely(PageHWPoison(vmf.page))) {
 		if (ret & VM_FAULT_LOCKED)
@@ -2768,16 +2790,19 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (!(vma->vm_flags & VM_SHARED)) {
 		anon = 1;
 		if (unlikely(anon_vma_prepare(vma))) {
+			printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 			ret = VM_FAULT_OOM;
 			goto out;
 		}
 		page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
 		if (!page) {
+			printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 			ret = VM_FAULT_OOM;
 			goto out;
 		}
 		if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
+			printk(KERN_ERR "OOM at %s:%d\n", __FILE__, __LINE__);
 			ret = VM_FAULT_OOM;
 			page_cache_release(page);
 			goto out;
@@ -2896,6 +2921,10 @@ out:
 
 unwritable_page:
 	page_cache_release(page);
+
+	if (ret & VM_FAULT_OOM)
+		printk(KERN_ERR "->page_mkwrite OOM %pf %x %x\n", vma->vm_ops->page_mkwrite, ret, flags);
+
 	return ret;
 }
-- 
1.6.5.2
From: A Rojas on 26 Jan 2010 03:00

KOSAKI Motohiro wrote:
>
> Hi all,
>
> Strangely, all of the machines that reproduce this are x86_64 with Intel
> i915, but I don't have any solid evidence.

Hi, I see the same issue on i686 (Intel i915 too).
From: Roman Jarosz on 26 Jan 2010 04:10

On Tue, 26 Jan 2010 06:19:23 +0100, KOSAKI Motohiro
<kosaki.motohiro(a)jp.fujitsu.com> wrote:
> (cc to lots of related people)
>
>> On Mon, 25 Jan 2010 02:48:08 +0100, KOSAKI Motohiro
>> <kosaki.motohiro(a)jp.fujitsu.com> wrote:
>>
>> >> Hi,
>> >>
>> >> since kernel 2.6.32.2 (also tried 2.6.32.3) I get a lot of oom-killer
>> >> kills when I do hard disk intensive tasks (mainly in VirtualBox, which
>> >> is running Windows XP), and IMHO it kills processes even when I have a
>> >> lot of free memory.
>> >>
>> >> Is this a known bug? I have a self-compiled kernel, so I can try
>> >> patches.
>> >
>> > Can you please post your .config?
>
> Hi all,
>
> Strangely, all of the machines that reproduce this are x86_64 with Intel
> i915, but I don't have any solid evidence. Can anyone please apply the
> following debug patch and try to reproduce the issue? The patch writes
> some debug messages into /var/log/messages.

Here it is:

Jan 26 09:34:32 kedge kernel: ->fault OOM shmem_fault 1 1
Jan 26 09:34:32 kedge kernel: X invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0
Jan 26 09:34:32 kedge kernel: Pid: 1927, comm: X Not tainted 2.6.33-rc5 #3
Jan 26 09:34:32 kedge kernel: Call Trace:
Jan 26 09:34:32 kedge kernel: [<ffffffff8107feb4>] T.350+0x54/0x140
Jan 26 09:34:32 kedge kernel: [<ffffffff81080016>] T.349+0x76/0x140
Jan 26 09:34:32 kedge kernel: [<ffffffff81080200>] __out_of_memory+0x120/0x190
Jan 26 09:34:32 kedge kernel: [<ffffffff81080388>] pagefault_out_of_memory+0x48/0x90
Jan 26 09:34:32 kedge kernel: [<ffffffff81021eb0>] mm_fault_error+0x40/0xc0
Jan 26 09:34:33 kedge kernel: [<ffffffff810221f8>] do_page_fault+0x2c8/0x2d0
Jan 26 09:34:35 kedge kernel: [<ffffffff815719df>] page_fault+0x1f/0x30
Jan 26 09:34:35 kedge kernel: Mem-Info:
Jan 26 09:34:35 kedge kernel: DMA per-cpu:
Jan 26 09:34:35 kedge kernel: CPU 0: hi: 0, btch: 1 usd: 0
Jan 26 09:34:35 kedge kernel: CPU 1: hi: 0, btch: 1 usd: 0
Jan 26 09:34:35 kedge kernel: DMA32 per-cpu:
Jan 26 09:34:35 kedge kernel: CPU 0: hi: 186, btch: 31 usd: 135
Jan 26 09:34:35 kedge kernel: CPU 1: hi: 186, btch: 31 usd: 119
Jan 26 09:34:35 kedge kernel: Normal per-cpu:
Jan 26 09:34:35 kedge kernel: CPU 0: hi: 186, btch: 31 usd: 124
Jan 26 09:34:35 kedge kernel: CPU 1: hi: 186, btch: 31 usd: 0
Jan 26 09:34:35 kedge kernel: active_anon:299484 inactive_anon:124767 isolated_anon:32
Jan 26 09:34:35 kedge kernel: active_file:318642 inactive_file:192171 isolated_file:0
Jan 26 09:34:35 kedge kernel: unevictable:2 dirty:113272 writeback:19224 unstable:0
Jan 26 09:34:35 kedge kernel: free:6859 slab_reclaimable:19340 slab_unreclaimable:10057
Jan 26 09:34:35 kedge kernel: mapped:27902 shmem:50780 pagetables:4901 bounce:0
Jan 26 09:34:35 kedge kernel: DMA free:15776kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:160kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15332kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 26 09:34:35 kedge kernel: lowmem_reserve[]: 0 2990 3937 3937
Jan 26 09:34:35 kedge kernel: DMA32 free:9792kB min:6084kB low:7604kB high:9124kB active_anon:936684kB inactive_anon:237096kB active_file:1032028kB inactive_file:636296kB unevictable:8kB isolated(anon):0kB isolated(file):0kB present:3062688kB mlocked:8kB dirty:381208kB writeback:65944kB mapped:68072kB shmem:70336kB slab_reclaimable:54692kB slab_unreclaimable:18084kB kernel_stack:128kB pagetables:2968kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:240 all_unreclaimable? no
Jan 26 09:34:35 kedge kernel: lowmem_reserve[]: 0 0 946 946
Jan 26 09:34:35 kedge kernel: Normal free:1868kB min:1924kB low:2404kB high:2884kB active_anon:261252kB inactive_anon:261972kB active_file:242540kB inactive_file:132228kB unevictable:0kB isolated(anon):128kB isolated(file):0kB present:969600kB mlocked:0kB dirty:71880kB writeback:10952kB mapped:43536kB shmem:132784kB slab_reclaimable:22668kB slab_unreclaimable:22136kB kernel_stack:1984kB pagetables:16636kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:100 all_unreclaimable? no
Jan 26 09:34:35 kedge kernel: lowmem_reserve[]: 0 0 0 0
Jan 26 09:34:35 kedge kernel: DMA: 0*4kB 2*8kB 1*16kB 2*32kB 1*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15776kB
Jan 26 09:34:35 kedge kernel: DMA32: 158*4kB 7*8kB 10*16kB 6*32kB 1*64kB 0*128kB 4*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 9808kB
Jan 26 09:34:35 kedge kernel: Normal: 467*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1868kB
Jan 26 09:34:35 kedge kernel: 561725 total pagecache pages
Jan 26 09:34:35 kedge kernel: 77 pages in swap cache
Jan 26 09:34:35 kedge kernel: Swap cache stats: add 6320, delete 6243, find 0/0
Jan 26 09:34:35 kedge kernel: Free swap  = 1982844kB
Jan 26 09:34:35 kedge kernel: Total swap = 2008116kB
Jan 26 09:34:35 kedge kernel: 1032192 pages RAM
Jan 26 09:34:35 kedge kernel: 49925 pages reserved
Jan 26 09:34:35 kedge kernel: 855983 pages shared
Jan 26 09:34:35 kedge kernel: 172671 pages non-shared
Jan 26 09:34:35 kedge kernel: Out of memory: kill process 1765 (hald) score 10056 or a child
Jan 26 09:34:35 kedge kernel: Killed process 1766 (hald-runner) vsz:17860kB, anon-rss:24kB, file-rss:732kB
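The gfp_mask=0x0, order=0 in the report above is the signature of the page-fault OOM path: the OOM killer is not being invoked by a failed allocation but by the fault code after a handler returned VM_FAULT_OOM (here from shmem_fault, as the debug patch's first line shows). A simplified sketch of that path, paraphrased from the 2.6.32-era mm/oom_kill.c rather than quoted verbatim:

	void pagefault_out_of_memory(void)
	{
		unsigned long freed = 0;

		/* Give OOM notifiers a chance to release memory first. */
		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
		if (freed > 0)
			return;	/* something gave memory back; retry the fault */

		read_lock(&tasklist_lock);
		/* The failing allocation is unknown here, hence gfp_mask=0x0. */
		__out_of_memory(0, 0);
		read_unlock(&tasklist_lock);

		/* Let the victim exit before the fault is retried. */
		if (!test_thread_flag(TIF_MEMDIE))
			schedule_timeout_uninterruptible(1);
	}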
From: KOSAKI Motohiro on 26 Jan 2010 08:10

>> This comment is a lie. __GFP_NORETRY causes shmem to return ENOMEM,
>> not GEM itself. GEM can't handle or recover from it. I suspect the
>> following commit is wrong.
>
> Indeed, the NORETRY flag is required for the inode mapping routines to
> return ENOMEM instead of triggering the OOM killer themselves. GEM has
> code to handle the ENOMEM as returned from shmem (please at least read the
> code before commenting, and comments are appreciated), by attempting to
> free up some of its own inactive buffers before retrying the allocation
> (with NORETRY removed, so the OOM killer will be invoked on the second
> instance). The reason for this convoluted approach is that GEM's inactive
> list shrinker requires the struct mutex and so cannot be run when GEM
> itself is attempting and failing to allocate memory. We could recover from
> more situations if we made some more invasive changes to our locking.
>
> This is without a doubt an area that needs improvement.

Please consider reverting that commit at once. Lots of people have
reported the same issue, and I really hope to stop the storm of bug
reports.
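The recovery scheme Chris describes has roughly the following shape. This is an illustrative sketch only: gem_get_page() and gem_evict_inactive_buffers() are made-up names standing in for the real i915 paths, and read_cache_page_gfp() is used here for brevity.

	#include <linux/err.h>
	#include <linux/gfp.h>
	#include <linux/pagemap.h>

	static void gem_evict_inactive_buffers(void);	/* hypothetical helper */

	/* Illustrative sketch -- not the actual i915 code. */
	static struct page *gem_get_page(struct address_space *mapping, pgoff_t index)
	{
		struct page *page;

		/*
		 * First attempt with __GFP_NORETRY, so shmem fails with
		 * ENOMEM instead of invoking the OOM killer itself.
		 */
		page = read_cache_page_gfp(mapping, index,
					   GFP_HIGHUSER | __GFP_NORETRY);
		if (!IS_ERR(page))
			return page;

		/*
		 * On ENOMEM, free some of GEM's own inactive buffers.  This
		 * needs struct_mutex, which is why it cannot run from inside
		 * the failing allocation itself.
		 */
		gem_evict_inactive_buffers();

		/*
		 * Retry without NORETRY: if this still fails, the OOM killer
		 * is invoked on the second attempt.
		 */
		return read_cache_page_gfp(mapping, index, GFP_HIGHUSER);
	}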
From: Chris Wilson on 26 Jan 2010 08:20
On Tue, 26 Jan 2010 22:03:06 +0900, KOSAKI Motohiro
<kosaki.motohiro(a)jp.fujitsu.com> wrote:
> Please consider reverting that commit at once. Lots of people have
> reported the same issue, and I really hope to stop the storm of bug
> reports.

Your CC did not reference the problem that you were discussing, nor did it
mention that it is even easier to trigger an OOM without the shrinker.
Memory exhaustion due to excess usage of surfaces from userspace is not a
new issue. So what is the problem you have encountered, and how does
running the OOM killer earlier fix the issue of triggering the OOM killer?

-- 
Chris Wilson, Intel Open Source Technology Centre