From: Vedran Furač on 29 Oct 2009 07:20

David Rientjes wrote:
> Right, because in Vedran's latest oom log it shows that Xorg is preferred
> more than any other thread, other than the memory-hogging test program,
> with your patch than without. I pointed out a clear distinction in the
> killing order using both total_vm and rss in that log, and in my opinion
> killing Xorg as opposed to krunner would be undesirable.

But then you should rename the OOM killer to TRIPK: Totally Random Innocent Process Killer.

If you have an OOM situation and Xorg is the first victim, that means it's leaking memory badly and the system is probably already frozen/FUBAR. Killing krunner in that situation wouldn't do any good. From a user perspective, nothing changes: the system is still FUBAR and (s)he would probably reboot, cursing Linux in the process.
From: David Rientjes on 29 Oct 2009 15:50

On Thu, 29 Oct 2009, Vedran Furac wrote:

> [ 1493.064458] Out of memory: kill process 6304 (kdeinit4) score 1190231 or a child
> [ 1493.064467] Killed process 6409 (konqueror)
> [ 1493.261149] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1493.261166] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1493.276528] Out of memory: kill process 6304 (kdeinit4) score 1161265 or a child
> [ 1493.276538] Killed process 6411 (krusader)
> [ 1499.221160] akregator invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1499.221178] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1499.236431] Out of memory: kill process 6304 (kdeinit4) score 1067593 or a child
> [ 1499.236441] Killed process 6412 (irexec)
> [ 1499.370192] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1499.370209] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1499.385417] Out of memory: kill process 6304 (kdeinit4) score 1066861 or a child
> [ 1499.385427] Killed process 6420 (xchm)
> [ 1499.458304] kio_file invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1499.458333] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1499.458367] [<ffffffff81120900>] ? d_kill+0x5c/0x7c
> [ 1499.473573] Out of memory: kill process 6304 (kdeinit4) score 1043690 or a child
> [ 1499.473582] Killed process 6425 (kio_file)
> [ 1500.250746] korgac invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1500.250765] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.266186] Out of memory: kill process 6304 (kdeinit4) score 1020350 or a child
> [ 1500.266196] Killed process 6464 (icedove)
> [ 1500.349355] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1500.349371] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.364689] Out of memory: kill process 6304 (kdeinit4) score 1019864 or a child
> [ 1500.364699] Killed process 6477 (kio_http)
> [ 1500.452151] kded4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1500.452167] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.452196] [<ffffffff81120900>] ? d_kill+0x5c/0x7c
> [ 1500.467307] Out of memory: kill process 6304 (kdeinit4) score 993142 or a child
> [ 1500.467316] Killed process 6478 (kio_http)
> [ 1500.780222] akregator invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1500.780239] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1500.796280] Out of memory: kill process 6304 (kdeinit4) score 966331 or a child
> [ 1500.796290] Killed process 6484 (kio_http)
> [ 1501.065374] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1501.065390] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1501.080579] Out of memory: kill process 6304 (kdeinit4) score 939434 or a child
> [ 1501.080587] Killed process 6486 (kio_http)
> [ 1501.381188] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1501.381204] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1501.396338] Out of memory: kill process 6304 (kdeinit4) score 912691 or a child
> [ 1501.396346] Killed process 6487 (firefox-bin)
> [ 1502.661294] icedove-bin invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> [ 1502.661311] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
> [ 1502.676563] Out of memory: kill process 7580 (test) score 708945 or a child
> [ 1502.676575] Killed process 7580 (test)

OK, so this is the forkbomb problem caused by adding half of each child's total_vm into the badness score of the parent. We should address that specific part of the heuristic completely separately, not change what we consider to be the baseline.

The rationale is quite simple: we'll still experience the same problem with rss as we did with total_vm in the forkbomb scenario above on certain workloads (maybe not yours, but others). The oom killer always kills a child first if it has a different mm than the selected parent, so the amount of memory freed as a result is entirely dependent on the order of the child list. The killed child may free very little, yet be chosen because its siblings had large total_vm values. So instead of focusing on rss, we simply need a better heuristic for the forkbomb issue, for which I've already proposed a very trivial solution. Then, afterwards, we can debate how the scoring heuristic can be changed to select better tasks (and perhaps remove a lot of the clutter that's there currently!).

> > Can you explain why Xorg is preferred as a baseline to kill rather than
> > krunner in your example?
>
> Krunner is a small app for running other apps and doing similar things. It
> shouldn't use a lot of memory. OTOH, Xorg has to hold all the pixmaps and
> so on. That was the expected result: first Xorg, then firefox and
> thunderbird.

You're making all these claims and assertions based _solely_ on the theory that killing the application with the most resident RAM is always the optimal solution. That's just not true, especially if we're only allocating small amounts of order-0 memory. Much better is to allow the user to decide at what point, regardless of swap usage, their application is using much more memory than expected or required. They can do that right now pretty well with /proc/pid/oom_adj, without the outlandish claim that they should be expected to know the rss of their applications at the time of oom to effectively tune oom_adj.

What would you suggest? A script that sits in a loop checking each task's current rss from /proc/pid/stat, or its current oom priority through /proc/pid/oom_score, and adjusting oom_adj preemptively just in case the oom killer is invoked in the next second?

And that "small app" has 30MB of rss which could be freed, if killed, and used for subsequent page allocations.
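In sketch form, the child accumulation described above looks roughly like the following. This is only an illustration of the behavior under discussion, not a verbatim copy of mm/oom_kill.c; the function name is ours, and the real badness code also takes locking, CPU time, nice level and oom_adj into account.

	#include <linux/sched.h>
	#include <linux/list.h>
	#include <linux/mm_types.h>

	/*
	 * Sketch: the parent's score starts at its own total_vm and then
	 * grows by half of each child's total_vm.  A task with many forked
	 * children (kdeinit4 in the log above) therefore outranks the real
	 * memory hog, even though killing any single child frees almost
	 * nothing.
	 */
	static unsigned long badness_sketch(struct task_struct *p)
	{
		struct task_struct *child;
		unsigned long points = p->mm ? p->mm->total_vm : 0;

		list_for_each_entry(child, &p->children, sibling) {
			if (child->mm && child->mm != p->mm)
				points += child->mm->total_vm / 2 + 1;
		}
		return points;
	}

The child that actually gets killed is then whichever entry with a different mm happens to come first in p->children, which is the ordering described as effectively random in the following message.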
From: David Rientjes on 29 Oct 2009 16:00

On Thu, 29 Oct 2009, Vedran Furac wrote:

> But then you should rename the OOM killer to TRIPK: Totally Random
> Innocent Process Killer.

The randomness here is the order of the child list when the oom killer selects a task based on the badness score and then tries to kill a child with a different mm before the parent. The problem you identified in http://pastebin.com/f3f9674a0, however, is a forkbomb issue where the badness score should never have been so high for kdeinit4 compared to "test". That is a direct consequence of adding the scores of all disjoint child total_vm values into the badness score for the parent and then killing the children instead. That is the problem, not using total_vm as a baseline. Replacing total_vm with rss is not going to solve that issue, and reducing the user's ability to specify a rough oom priority from userspace is simply not an option.

> If you have an OOM situation and Xorg is the first victim, that means
> it's leaking memory badly and the system is probably already frozen/FUBAR.
> Killing krunner in that situation wouldn't do any good. From a user
> perspective, nothing changes: the system is still FUBAR and (s)he would
> probably reboot, cursing Linux in the process.

It depends on what you're running; we need to have the option of protecting very large tasks on production servers. Imagine that "test" here is actually a critical application that we need to protect (it's not solely mlocked anonymous memory) but still want to kill if it is leaking memory beyond your approximate 2.5GB. How do you do that when using rss as the baseline?
From: KAMEZAWA Hiroyuki on 29 Oct 2009 20:00

On Thu, 29 Oct 2009 12:53:42 -0700 (PDT) David Rientjes <rientjes(a)google.com> wrote:

> > If you have an OOM situation and Xorg is the first victim, that means
> > it's leaking memory badly and the system is probably already frozen/FUBAR.
> > Killing krunner in that situation wouldn't do any good. From a user
> > perspective, nothing changes: the system is still FUBAR and (s)he would
> > probably reboot, cursing Linux in the process.
>
> It depends on what you're running; we need to have the option of
> protecting very large tasks on production servers. Imagine that "test"
> here is actually a critical application that we need to protect (it's not
> solely mlocked anonymous memory) but still want to kill if it is leaking
> memory beyond your approximate 2.5GB. How do you do that when using rss
> as the baseline?

As I wrote repeatedly:

- The OOM killer itself is a bad thing, a bad situation.
- The kernel can't know whether a program is bad or not; it can only guess.
- So there is no "correct" OOM killer other than a fork-bomb killer.
- The user has a knob, oom_adj. This is very strong.

That leaves only a "reasonable" or "easy-to-understand" OOM kill. "The current biggest memory eater is killed" sounds reasonable and easy to understand. And if total_vm worked well, overcommit_guess would catch the situation; please improve overcommit_guess if you want to stay with total_vm.

Thanks,
-Kame
From: David Rientjes on 30 Oct 2009 05:20
On Fri, 30 Oct 2009, KAMEZAWA Hiroyuki wrote:

> As I wrote repeatedly:
>
> - The OOM killer itself is a bad thing, a bad situation.

Not necessarily; the memory controller and cpusets use it quite often to enforce their policies, and it is standard runtime behavior there. We'd like to imagine that our cpuset will never be too small to run all the attached jobs, but that happens, and we can easily recover from it by killing a task.

> - The kernel can't know whether a program is bad or not; it can only guess.

Totally irrelevant, given your fourth point about /proc/pid/oom_adj. We can tell the kernel what the oom killer behavior should be if the situation arises.

> - So there is no "correct" OOM killer other than a fork-bomb killer.

Well, of course there is; you're seeing this in a WAY too simplistic manner. If we are oom, we want to be able to influence how the oom killer behaves and responds to that situation. You are proposing that we change the baseline for how the oom killer selects tasks, which we use CONSTANTLY as part of our normal production environment. I'd appreciate it if you'd take it a little more seriously.

> - The user has a knob, oom_adj. This is very strong.

Agreed.

> That leaves only a "reasonable" or "easy-to-understand" OOM kill. "The
> current biggest memory eater is killed" sounds reasonable and easy to
> understand. And if total_vm worked well, overcommit_guess would catch the
> situation; please improve overcommit_guess if you want to stay with
> total_vm.

I don't necessarily want to stay on total_vm, but I also don't want to move to rss as a baseline, as you would probably agree. We disagree about a very fundamental principle: you are coming from the perspective of always wanting to kill the biggest resident memory eater, even for a single order-0 allocation that fails, while I'm coming from the perspective of wanting to ensure that our machines know how the oom killer will react when it is used.

Moving to rss reduces the user's ability to specify an expected oom priority, other than polarizing it by either disabling it completely with an oom_adj value of -17 or choosing the definite next victim with +15. That's my objection to it: the user cannot possibly be expected to predict what proportion of each application's memory will be resident at the time of oom.

I understand you want to totally rewrite the oom killer for whatever reason, but I think you need to spend a lot more time understanding the needs that the Linux community has for its behavior instead of insisting on your point of view.
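For concreteness, the knob both sides keep returning to is the per-task file /proc/<pid>/oom_adj, which accepts values from -17 to +15: -17 exempts the task from oom killing entirely, positive values inflate its badness score, and negative values shrink it. A minimal userspace sketch of setting it (the helper name is ours):

	#include <stdio.h>
	#include <sys/types.h>

	/* Write an oom_adj value (-17..15) for the given pid; -17 disables
	 * oom killing for that task, +15 makes it the preferred victim. */
	static int set_oom_adj(pid_t pid, int adj)
	{
		char path[64];
		FILE *f;

		snprintf(path, sizeof(path), "/proc/%d/oom_adj", (int)pid);
		f = fopen(path, "w");
		if (!f)
			return -1;
		fprintf(f, "%d\n", adj);
		return fclose(f);
	}

From a shell the equivalent is simply "echo -17 > /proc/<pid>/oom_adj", which is how a production job would typically be protected before any oom condition ever arises.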