From: Norbert Preining on 30 Oct 2009 02:40

Dear all,

(please Cc)

With 2.6.32-rc5 I got this one:

[13832.210068] Xorg invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0
[13832.210073] Pid: 11220, comm: Xorg Not tainted 2.6.32-rc5 #2
[13832.210075] Call Trace:
[13832.210081] [<ffffffff8134120a>] ? _spin_unlock+0x23/0x2f
[13832.210085] [<ffffffff8107cf46>] ? oom_kill_process+0x78/0x236
[13832.210088] [<ffffffff8107d5ba>] ? __out_of_memory+0x12f/0x146
[13832.210091] [<ffffffff8107d6be>] ? pagefault_out_of_memory+0x54/0x82
[13832.210094] [<ffffffff81341177>] ? _spin_unlock_irqrestore+0x25/0x31
[13832.210098] [<ffffffff8102644d>] ? mm_fault_error+0x39/0xe6
[13832.210101] [<ffffffff810af3ea>] ? do_vfs_ioctl+0x443/0x47b
[13832.210103] [<ffffffff81026759>] ? do_page_fault+0x25f/0x27b
[13832.210106] [<ffffffff8134161f>] ? page_fault+0x1f/0x30
[13832.210108] Mem-Info:
[13832.210109] DMA per-cpu:
[13832.210111] CPU 0: hi: 0, btch: 1 usd: 0
[13832.210113] CPU 1: hi: 0, btch: 1 usd: 0
[13832.210114] DMA32 per-cpu:
[13832.210116] CPU 0: hi: 186, btch: 31 usd: 165
[13832.210117] CPU 1: hi: 186, btch: 31 usd: 177
[13832.210119] Normal per-cpu:
[13832.210120] CPU 0: hi: 186, btch: 31 usd: 143
[13832.210122] CPU 1: hi: 186, btch: 31 usd: 159
[13832.210128] active_anon:465239 inactive_anon:178856 isolated_anon:96
[13832.210129] active_file:120044 inactive_file:120889 isolated_file:34
[13832.210130] unevictable:32076 dirty:136955 writeback:1178 unstable:0 buffer:32965
[13832.210131] free:6932 slab_reclaimable:23740 slab_unreclaimable:11776
[13832.210132] mapped:41869 shmem:127673 pagetables:7320 bounce:0
[13832.210138] DMA free:15784kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:24kB inactive_file:132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15364kB mlocked:0kB dirty:60kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:16kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[13832.210142] lowmem_reserve[]: 0 2931 3941 3941
[13832.210150] DMA32 free:9928kB min:5960kB low:7448kB high:8940kB active_anon:1527548kB inactive_anon:382016kB active_file:345724kB inactive_file:348528kB unevictable:127864kB isolated(anon):256kB isolated(file):0kB present:3001852kB mlocked:127864kB dirty:389520kB writeback:3192kB mapped:119544kB shmem:301556kB slab_reclaimable:62476kB slab_unreclaimable:22472kB kernel_stack:320kB pagetables:6692kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:448 all_unreclaimable? no
[13832.210155] lowmem_reserve[]: 0 0 1010 1010
[13832.210161] Normal free:2016kB min:2052kB low:2564kB high:3076kB active_anon:333408kB inactive_anon:333408kB active_file:134428kB inactive_file:134896kB unevictable:440kB isolated(anon):128kB isolated(file):136kB present:1034240kB mlocked:440kB dirty:158240kB writeback:1520kB mapped:47932kB shmem:209136kB slab_reclaimable:32468kB slab_unreclaimable:24632kB kernel_stack:2072kB pagetables:22588kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:192 all_unreclaimable? no
[13832.210166] lowmem_reserve[]: 0 0 0 0
[13832.210169] DMA: 2*4kB 2*8kB 1*16kB 2*32kB 1*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15784kB
[13832.210177] DMA32: 624*4kB 1*8kB 11*16kB 6*32kB 1*64kB 8*128kB 5*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 9848kB
[13832.210184] Normal: 504*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2016kB
[13832.210191] 374966 total pagecache pages
[13832.210192] 6328 pages in swap cache
[13832.210194] Swap cache stats: add 147686, delete 141358, find 119392/120966
[13832.210195] Free swap  = 8661548kB
[13832.210197] Total swap = 8851804kB
[13832.225488] 1048576 pages RAM
[13832.225491] 73094 pages reserved
[13832.225492] 695291 pages shared
[13832.225493] 352255 pages non-shared
[13832.225496] Out of memory: kill process 11292 (gnome-session) score 500953 or a child
[13832.225498] Killed process 11569 (xscreensaver)

After that I managed to get my system running normally again by restarting
X, and everything has run fine since then.

Is that something I should be nervous about?

Thanks a lot and all the best

Norbert

-------------------------------------------------------------------------------
Dr. Norbert Preining                                        Associate Professor
JAIST Japan Advanced Institute of Science and Technology  preining(a)jaist.ac.jp
Vienna University of Technology                              preining(a)logic.at
Debian Developer (Debian TeX Task Force)                   preining(a)debian.org
gpg DSA: 0x09C5B094    fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
-------------------------------------------------------------------------------
ARTHUR: What is an Algolian Zylatburger anyway?
FORD:   They're a kind of meatburger made from the most unpleasant parts of a
        creature well known for its total lack of any pleasant parts.
ARTHUR: So you mean that the Universe does actually end not with a bang but
        with a Wimpy?
--- Cut dialogue from Fit the Fifth.
--- Douglas Adams, The Hitchhikers Guide to the Galaxy
From: KOSAKI Motohiro on 1 Nov 2009 23:30

Hi,

(Cc to linux-mm)

Wow, this is a very strange log.

> Dear all,
>
> (please Cc)
>
> With 2.6.32-rc5 I got this one:
> [13832.210068] Xorg invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0

order = 0

> [13832.210073] Pid: 11220, comm: Xorg Not tainted 2.6.32-rc5 #2
> [13832.210075] Call Trace:
> [13832.210081] [<ffffffff8134120a>] ? _spin_unlock+0x23/0x2f
> [13832.210085] [<ffffffff8107cf46>] ? oom_kill_process+0x78/0x236
> [13832.210088] [<ffffffff8107d5ba>] ? __out_of_memory+0x12f/0x146
> [13832.210091] [<ffffffff8107d6be>] ? pagefault_out_of_memory+0x54/0x82
> [13832.210094] [<ffffffff81341177>] ? _spin_unlock_irqrestore+0x25/0x31
> [13832.210098] [<ffffffff8102644d>] ? mm_fault_error+0x39/0xe6
> [13832.210101] [<ffffffff810af3ea>] ? do_vfs_ioctl+0x443/0x47b
> [13832.210103] [<ffffffff81026759>] ? do_page_fault+0x25f/0x27b
> [13832.210106] [<ffffffff8134161f>] ? page_fault+0x1f/0x30
> [13832.210108] Mem-Info:
> [13832.210109] DMA per-cpu:
> [13832.210111] CPU 0: hi: 0, btch: 1 usd: 0
> [13832.210113] CPU 1: hi: 0, btch: 1 usd: 0
> [13832.210114] DMA32 per-cpu:
> [13832.210116] CPU 0: hi: 186, btch: 31 usd: 165
> [13832.210117] CPU 1: hi: 186, btch: 31 usd: 177
> [13832.210119] Normal per-cpu:
> [13832.210120] CPU 0: hi: 186, btch: 31 usd: 143
> [13832.210122] CPU 1: hi: 186, btch: 31 usd: 159
> [13832.210128] active_anon:465239 inactive_anon:178856 isolated_anon:96
> [13832.210129] active_file:120044 inactive_file:120889 isolated_file:34

but the system has plenty of droppable cache.

Umm, is this reproducible? Typically such a strange log is caused by
corrupted RAM. Can you please check your memory?

> [13832.210130] unevictable:32076 dirty:136955 writeback:1178 unstable:0 buffer:32965
> [13832.210131] free:6932 slab_reclaimable:23740 slab_unreclaimable:11776
> [13832.210132] mapped:41869 shmem:127673 pagetables:7320 bounce:0
> [13832.210138] DMA free:15784kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:24kB inactive_file:132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15364kB mlocked:0kB dirty:60kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:16kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [13832.210142] lowmem_reserve[]: 0 2931 3941 3941
> [13832.210150] DMA32 free:9928kB min:5960kB low:7448kB high:8940kB active_anon:1527548kB inactive_anon:382016kB active_file:345724kB inactive_file:348528kB unevictable:127864kB isolated(anon):256kB isolated(file):0kB present:3001852kB mlocked:127864kB dirty:389520kB writeback:3192kB mapped:119544kB shmem:301556kB slab_reclaimable:62476kB slab_unreclaimable:22472kB kernel_stack:320kB pagetables:6692kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:448 all_unreclaimable? no
> [13832.210155] lowmem_reserve[]: 0 0 1010 1010
> [13832.210161] Normal free:2016kB min:2052kB low:2564kB high:3076kB active_anon:333408kB inactive_anon:333408kB active_file:134428kB inactive_file:134896kB unevictable:440kB isolated(anon):128kB isolated(file):136kB present:1034240kB mlocked:440kB dirty:158240kB writeback:1520kB mapped:47932kB shmem:209136kB slab_reclaimable:32468kB slab_unreclaimable:24632kB kernel_stack:2072kB pagetables:22588kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:192 all_unreclaimable? no
> [13832.210166] lowmem_reserve[]: 0 0 0 0
> [13832.210169] DMA: 2*4kB 2*8kB 1*16kB 2*32kB 1*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15784kB
> [13832.210177] DMA32: 624*4kB 1*8kB 11*16kB 6*32kB 1*64kB 8*128kB 5*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 9848kB
> [13832.210184] Normal: 504*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2016kB
> [13832.210191] 374966 total pagecache pages
> [13832.210192] 6328 pages in swap cache
> [13832.210194] Swap cache stats: add 147686, delete 141358, find 119392/120966
> [13832.210195] Free swap  = 8661548kB
> [13832.210197] Total swap = 8851804kB
> [13832.225488] 1048576 pages RAM
> [13832.225491] 73094 pages reserved
> [13832.225492] 695291 pages shared
> [13832.225493] 352255 pages non-shared
> [13832.225496] Out of memory: kill process 11292 (gnome-session) score 500953 or a child
> [13832.225498] Killed process 11569 (xscreensaver)
>
> After that I managed to get my system running normally again by restarting
> X, and everything has run fine since then.
>
> Is that something I should be nervous about?

This obviously indicates either a kernel bug or hardware corruption. I'm not
sure which ;)

> Thanks a lot and all the best
>
> Norbert
From: Minchan Kim on 2 Nov 2009 00:00

On Mon, 2 Nov 2009 13:24:06 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com> wrote:

> Hi,
>
> (Cc to linux-mm)
>
> Wow, this is a very strange log.
>
> > Dear all,
> >
> > (please Cc)
> >
> > With 2.6.32-rc5 I got this one:
> > [13832.210068] Xorg invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0
>
> order = 0

I think this problem results from 'gfp_mask = 0x0'. Is that possible?

If it isn't a H/W problem, then whoever passes a gfp_mask of 0x0 is the
culprit.

Could you add BUG_ON(gfp_mask == 0x0) at the head of
__alloc_pages_nodemask()?

---
/*
 * This is the 'heart' of the zoned buddy allocator.
 */
struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
			struct zonelist *zonelist, nodemask_t *nodemask)
{
	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
	struct zone *preferred_zone;
	struct page *page;
	int migratetype = allocflags_to_migratetype(gfp_mask);

+	BUG_ON(gfp_mask == 0x0);
	gfp_mask &= gfp_allowed_mask;

	lockdep_trace_alloc(gfp_mask);

	might_sleep_if(gfp_mask & __GFP_WAIT);

	if (should_fail_alloc_page(gfp_mask, order))
		return NULL;
---

--
Kind regards,
Minchan Kim
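A side note for readers of the archive: BUG_ON() panics the machine on the
first hit. A gentler variant of the same check (a sketch only, not the patch
posted above) is WARN_ON_ONCE(), which logs the offending backtrace once and
lets the system keep running, which is friendlier for a bug that may take
days to reproduce:

===
/*
 * Sketch only -- not the patch from this thread: warn instead of
 * panicking when a zero gfp_mask reaches the allocator.
 */
struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
			struct zonelist *zonelist, nodemask_t *nodemask)
{
	/* A zero gfp_mask is never a legitimate allocation request,
	 * so log it once with a backtrace and refuse the request. */
	if (WARN_ON_ONCE(gfp_mask == 0x0))
		return NULL;

	/* ... rest of the allocator unchanged ... */
}
===

As KAMEZAWA points out below, though, this particular OOM report need not
involve the page allocator at all.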
From: KAMEZAWA Hiroyuki on 2 Nov 2009 00:10

On Mon, 2 Nov 2009 13:56:40 +0900
Minchan Kim <minchan.kim(a)gmail.com> wrote:

> On Mon, 2 Nov 2009 13:24:06 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com> wrote:
>
> > Hi,
> >
> > (Cc to linux-mm)
> >
> > Wow, this is a very strange log.
> >
> > > Dear all,
> > >
> > > (please Cc)
> > >
> > > With 2.6.32-rc5 I got this one:
> > > [13832.210068] Xorg invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0
> >
> > order = 0
>
> I think this problem results from 'gfp_mask = 0x0'. Is that possible?
>
> If it isn't a H/W problem, then whoever passes a gfp_mask of 0x0 is the
> culprit.
>
> Could you add BUG_ON(gfp_mask == 0x0) at the head of
> __alloc_pages_nodemask()?

Maybe some code returns VM_FAULT_OOM by mistake and
pagefault_out_of_memory() is called. Digging into mm/memory.c is
necessary...

I wonder why... the current code is:
===
static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, pte_t *page_table, pmd_t *pmd,
		unsigned int flags, pte_t orig_pte)
{
	pgoff_t pgoff;

	flags |= FAULT_FLAG_NONLINEAR;

	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
		return 0;

	if (unlikely(!(vma->vm_flags & VM_NONLINEAR))) {
		/*
		 * Page table corrupted: show pte and kill process.
		 */
		print_bad_pte(vma, address, orig_pte, NULL);
		return VM_FAULT_OOM;
	}

	pgoff = pte_to_pgoff(orig_pte);
	return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
}
===
Then, OOM... but is this really OOM?

Thanks,
-Kame
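That hypothesis also explains the odd gfp_mask. When a fault handler returns
VM_FAULT_OOM, mm_fault_error() ends up in pagefault_out_of_memory(), which
has no allocation context and therefore invokes the OOM killer with a
literal zero gfp_mask and order, exactly the "gfp_mask=0x0, order=0" in
Norbert's log, and consistent with pagefault_out_of_memory() and
__out_of_memory() appearing in his call trace. Roughly (abridged from
memory of the 2.6.32-era mm/oom_kill.c; details may differ):

===
void pagefault_out_of_memory(void)
{
	/* ... OOM-notifier callbacks and panic_on_oom handling elided ... */

	read_lock(&tasklist_lock);
	/* No allocation is in progress here, so there is no real
	 * gfp_mask or order to report: both are passed as zero. */
	__out_of_memory(0, 0);	/* unknown gfp_mask and order */
	read_unlock(&tasklist_lock);

	/* ... brief sleep to let the killed task exit elided ... */
}
===

So a BUG_ON() in the allocator would never fire on this path; the zero mask
is injected by the OOM plumbing itself, not by a broken caller of
__alloc_pages_nodemask().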
From: Minchan Kim on 2 Nov 2009 03:40
On Mon, 2 Nov 2009 14:02:16 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:

> On Mon, 2 Nov 2009 13:56:40 +0900
> Minchan Kim <minchan.kim(a)gmail.com> wrote:
>
> > On Mon, 2 Nov 2009 13:24:06 +0900 (JST)
> > KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com> wrote:
> >
> > > Hi,
> > >
> > > (Cc to linux-mm)
> > >
> > > Wow, this is a very strange log.
> > >
> > > > Dear all,
> > > >
> > > > (please Cc)
> > > >
> > > > With 2.6.32-rc5 I got this one:
> > > > [13832.210068] Xorg invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0
> > >
> > > order = 0
> >
> > I think this problem results from 'gfp_mask = 0x0'. Is that possible?
> >
> > If it isn't a H/W problem, then whoever passes a gfp_mask of 0x0 is the
> > culprit.
> >
> > Could you add BUG_ON(gfp_mask == 0x0) at the head of
> > __alloc_pages_nodemask()?
>
> Maybe some code returns VM_FAULT_OOM by mistake and
> pagefault_out_of_memory() is called. Digging into mm/memory.c is
> necessary...
>
> I wonder why... the current code is:
> ===
> static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> 		unsigned long address, pte_t *page_table, pmd_t *pmd,
> 		unsigned int flags, pte_t orig_pte)
> {
> 	pgoff_t pgoff;
>
> 	flags |= FAULT_FLAG_NONLINEAR;
>
> 	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
> 		return 0;
>
> 	if (unlikely(!(vma->vm_flags & VM_NONLINEAR))) {
> 		/*
> 		 * Page table corrupted: show pte and kill process.
> 		 */
> 		print_bad_pte(vma, address, orig_pte, NULL);
> 		return VM_FAULT_OOM;
> 	}
>
> 	pgoff = pte_to_pgoff(orig_pte);
> 	return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
> }
> ===
> Then, OOM... but is this really OOM?

It seems that the goal is to kill the process via the OOM trick, as the
comment says. I found that it results from Hugh's commit
65500d234e74fc4e8f18e1a429bc24e51e75de4a. I think it's not a real OOM.

BTW, if this path were the culprit here, print_bad_pte() should have left
some log. :)

> Thanks,
> -Kame

--
Kind regards,
Minchan Kim
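Minchan's last point is easy to check. The excerpt below (abridged from
memory of the 2.6.32-era mm/memory.c, so details may differ) shows that
print_bad_pte() tags its output with a distinctive "Bad page map" prefix:
if do_nonlinear_fault() had taken the corrupted-PTE branch, dmesg would
contain a line like the one this printk emits. Norbert's log shows no such
line, which supports ruling this path out:

===
static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
			  pte_t pte, struct page *page)
{
	/* ... rate limiting and the page-table walk that yields
	 * 'pmd' elided ... */
	printk(KERN_ALERT
		"BUG: Bad page map in process %s  pte:%08llx pmd:%08llx\n",
		current->comm, (long long)pte_val(pte),
		(long long)pmd_val(*pmd));
	/* ... vma/mapping diagnostics and stack dump elided ... */
}
===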