From: KAMEZAWA Hiroyuki on 3 Nov 2009 20:00

On Tue, 3 Nov 2009 12:49:52 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> On Fri, 30 Oct 2009, KAMEZAWA Hiroyuki wrote:
>
> > > > - The kernel can't know whether the program is bad or not. It just guesses.
> > >
> > > Totally irrelevant, given your fourth point about /proc/pid/oom_adj. We
> > > can tell the kernel what we'd like the oom killer behavior to be if
> > > the situation arises.
> > >
> >
> > My point is that the server cannot distinguish a memory leak from intentional
> > memory usage. Nothing more than that.
> >
>
> That's a different point. Today, we can influence the badness score of
> any user thread to prioritize oom killing from userspace and that can be
> done regardless of whether there's a memory leaker, a fork bomber, etc.
> The priority based oom killing is important to production scenarios and
> cannot be replaced by a heuristic that works every time if it cannot be
> influenced by userspace.
>
I didn't remove oom_adj...

> A spike in memory consumption when a process is initially forked would be
> defined as a memory leaker in your quiet_time model.
>
I'll rewrite or drop quiet_time.

> > This summer, at lunch with a daily Linux user, I was told
> > "you enterprise guys don't consider desktop or laptop problems at all."
> > Yes, I use only servers. My customers use servers, too. My first priority
> > is always server users.
> > But this time, I replied to Vedran and am trying to fix the desktop problem.
> > Even if the current logic works well for servers, the "KDE/GNOME is killed"
> > problem seems to be serious. And this may be a problem for embedded people,
> > I guess.
> >
>
> You argued before that the problem wasn't specific to X (after I said you
> could protect it very trivially with /proc/pid/oom_adj set to
> OOM_DISABLE), but that's now your reasoning for rewriting the oom killer
> heuristics?
>
One of the reasons. My customers always suffer from the "OOM-RANDOM-KILLER".
The reason I mentioned "lunch" was to say that "I'm not working _only_ for
servers." OK?

> > I can say the same thing about total_vm size. total_vm size doesn't include
> > any good information for an oom situation. And tweaking based on that
> > not-useful parameter will make things worse.
> >
>
> Tweaking on the heuristic will probably make it more convoluted and
> overall worse, I agree. But it's a more stable baseline than rss from
> which we can set oom killing priorities from userspace.

- "rss < total_vm_size" always.
- The oom_adj calculation is quite strong.
- total_vm of processes which map hugetlb is very big
  ...but killing them is no help for the usual oom.

I recommend adding a "stable baseline" knob for user space, as I wrote.
My patch 6 adds a stable baseline bonus of 50% of vm size if run_time is
large enough.

If users can estimate how their process uses memory, that will be a good thing.
I'll add something other than oom_adj (I'm not saying I'll drop oom_adj).

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
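
To make the total_vm versus rss disagreement concrete, here is a minimal Python sketch of how an oom_adj-style adjustment interacts with the choice of baseline. It assumes the adjustment is a bit shift applied to the baseline score, which is how the oom killer of this era is generally described; the function names, the task names, and all the page counts are hypothetical and chosen only for illustration. It is a toy model, not the kernel's badness() function.

```python
OOM_DISABLE = -17      # oom_adj value that exempts a task from oom killing
OOM_ADJUST_MIN = -16
OOM_ADJUST_MAX = 15


def adjusted_badness(baseline_pages, oom_adj):
    """Scale a baseline score by oom_adj, modelled here as a bit shift."""
    if oom_adj == OOM_DISABLE:
        return 0  # never selected as a victim
    if not OOM_ADJUST_MIN <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj out of range")
    if oom_adj > 0:
        # A zero baseline is bumped to 1 so a positive adjustment still counts.
        return max(baseline_pages, 1) << oom_adj
    return baseline_pages >> -oom_adj


def pick_victim(tasks, baseline):
    """Return the name of the task with the highest adjusted score."""
    return max(
        tasks,
        key=lambda name: adjusted_badness(tasks[name][baseline], tasks[name]["oom_adj"]),
    )


# Hypothetical tasks; all sizes are in pages.
tasks = {
    "hugetlb_db": {"total_vm": 4_000_000, "rss": 50_000, "oom_adj": 0},
    "leaker": {"total_vm": 600_000, "rss": 550_000, "oom_adj": 0},
}

print(pick_victim(tasks, "total_vm"))  # baseline favoured by David
print(pick_victim(tasks, "rss"))       # baseline favoured by Kame
```

With these made-up numbers the total_vm baseline selects the hugetlb mapper, while the rss baseline selects the task that actually holds the memory. Because each oom_adj step doubles or halves the score, setting oom_adj to 3 on the hypothetical leaker is already enough to outrank the other task on the total_vm baseline, which is one way to read "the oom_adj calculation is quite strong".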
From: David Rientjes on 3 Nov 2009 21:00

On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:

> > That's a different point. Today, we can influence the badness score of
> > any user thread to prioritize oom killing from userspace and that can be
> > done regardless of whether there's a memory leaker, a fork bomber, etc.
> > The priority based oom killing is important to production scenarios and
> > cannot be replaced by a heuristic that works every time if it cannot be
> > influenced by userspace.
> >
> I didn't remove oom_adj...
>

Right, but we must ensure that we have the same ability to influence a
priority based oom killing scheme from userspace as we currently do with a
relatively static total_vm. total_vm may not be the optimal baseline, but
it does allow users to tune oom_adj specifically to identify tasks that
are using more memory than expected, and it is static enough not to depend
on rss, for example, which is really hard to predict at the time of oom.

That's actually my main goal in this discussion: to avoid losing any
ability of userspace to influence the priority of tasks being oom killed
(if you haven't noticed :).

> > Tweaking on the heuristic will probably make it more convoluted and
> > overall worse, I agree. But it's a more stable baseline than rss from
> > which we can set oom killing priorities from userspace.
>
> - "rss < total_vm_size" always.

But rss is much more dynamic than total_vm, that's my point.

> - The oom_adj calculation is quite strong.
> - total_vm of processes which map hugetlb is very big
>   ...but killing them is no help for the usual oom.
>
> I recommend adding a "stable baseline" knob for user space, as I wrote.
> My patch 6 adds a stable baseline bonus of 50% of vm size if run_time is
> large enough.
>

There's no clear relationship between VM size and runtime. The forkbomb
heuristic itself could easily return a badness of ULONG_MAX if one is
detected using runtime and number of children, as I earlier proposed, but
that doesn't seem helpful to factor into the scoring.
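
The forkbomb idea above, a detector that short-circuits the score instead of feeding into it, could look roughly like the following Python sketch. The thread does not say how runtime and number of children are combined, so the rule used here (many children that have each run only briefly) and both thresholds are assumptions made for illustration, as are the function and parameter names.

```python
ULONG_MAX = 2**64 - 1  # assumes a 64-bit unsigned long


def forkbomb_badness(child_runtimes_s, min_young_children=1000, young_s=1.0):
    """Return ULONG_MAX if the task looks like a fork bomb, else None.

    child_runtimes_s: how long each child has been running, in seconds.
    min_young_children and young_s are hypothetical thresholds.
    """
    young = sum(1 for runtime in child_runtimes_s if runtime < young_s)
    if young >= min_young_children:
        return ULONG_MAX  # overrides any ordinary score
    return None  # not a fork bomb; fall back to the normal heuristic


# A task that spawned 1500 children in the last fraction of a second.
print(forkbomb_badness([0.1] * 1500) == ULONG_MAX)
# A long-lived server with many old workers and a few fresh ones.
print(forkbomb_badness([0.1] * 10 + [3600.0] * 2000) is None)
```

Returning a sentinel rather than adding a bonus keeps the detector separate from the scoring, which matches the point that factoring it into the score does not seem helpful.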
From: KAMEZAWA Hiroyuki on 3 Nov 2009 21:30

On Tue, 3 Nov 2009 17:58:04 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:
>
> > > That's a different point. Today, we can influence the badness score of
> > > any user thread to prioritize oom killing from userspace and that can be
> > > done regardless of whether there's a memory leaker, a fork bomber, etc.
> > > The priority based oom killing is important to production scenarios and
> > > cannot be replaced by a heuristic that works every time if it cannot be
> > > influenced by userspace.
> > >
> > I didn't remove oom_adj...
> >
>
> Right, but we must ensure that we have the same ability to influence a
> priority based oom killing scheme from userspace as we currently do with a
> relatively static total_vm. total_vm may not be the optimal baseline, but
> it does allow users to tune oom_adj specifically to identify tasks that
> are using more memory than expected, and it is static enough not to depend
> on rss, for example, which is really hard to predict at the time of oom.
>
> That's actually my main goal in this discussion: to avoid losing any
> ability of userspace to influence the priority of tasks being oom killed
> (if you haven't noticed :).
>
> > > Tweaking on the heuristic will probably make it more convoluted and
> > > overall worse, I agree. But it's a more stable baseline than rss from
> > > which we can set oom killing priorities from userspace.
> >
> > - "rss < total_vm_size" always.
>
> But rss is much more dynamic than total_vm, that's my point.
>
My point and your point are different.

  1. All my concern is "baseline for heuristics"
  2. All your concern is "baseline for knob, as oom_adj"

OK? For selecting a victim by the kernel, a dynamic value is much more useful.
The current behavior of "random kill" and "kill multiple processes" is too bad.
Considering what the oom-killer is for, I think "1" is more important.

But I know what you want, so I offer a new knob which is not affected by RSS,
as I wrote in my previous mail.

Off-topic:
As memcg is getting better, using the OOM-Killer for resource control should
end, I think. Maybe Fake-NUMA+cpuset is working well for Google's systems,
but plz consider using memcg.

> > - The oom_adj calculation is quite strong.
> > - total_vm of processes which map hugetlb is very big
> >   ...but killing them is no help for the usual oom.
> >
> > I recommend adding a "stable baseline" knob for user space, as I wrote.
> > My patch 6 adds a stable baseline bonus of 50% of vm size if run_time is
> > large enough.
> >
>
> There's no clear relationship between VM size and runtime. The forkbomb
> heuristic itself could easily return a badness of ULONG_MAX if one is
> detected using runtime and number of children, as I earlier proposed, but
> that doesn't seem helpful to factor into the scoring.
>
Old processes are important, younger ones are not. But as I wrote, I'll drop
most of patch "6". So, plz forget about this part.

I'm more interested in a fork-bomb killer than in a crazy badness calculation
now.

Thanks,
-Kame
From: David Rientjes on 3 Nov 2009 22:20

On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:

> My point and your point are different.
>
>   1. All my concern is "baseline for heuristics"
>   2. All your concern is "baseline for knob, as oom_adj"
>
> OK? For selecting a victim by the kernel, a dynamic value is much more useful.
> The current behavior of "random kill" and "kill multiple processes" is too bad.
> Considering what the oom-killer is for, I think "1" is more important.
>
> But I know what you want, so I offer a new knob which is not affected by RSS,
> as I wrote in my previous mail.
>
> Off-topic:
> As memcg is getting better, using the OOM-Killer for resource control should
> end, I think. Maybe Fake-NUMA+cpuset is working well for Google's systems,
> but plz consider using memcg.
>

I understand what you're trying to do, and I agree with it for most
desktop systems. However, I think that admins should have a very strong
influence in what tasks the oom killer kills. It doesn't really matter if
it's via oom_adj or not, and it's debatable whether an adjustment on a
static heuristic score is in our best interest in the first place. But we
must have an alternative so that our control over oom killing isn't lost.

I'd also like to open another topic for discussion if you're proposing
such sweeping changes: at what point do we allow ~__GFP_NOFAIL allocations
to fail even if order < PAGE_ALLOC_COSTLY_ORDER and defer killing
anything? We both agreed that it's not always in the best interest to
kill a task so that an allocation can succeed, so we need to define some
criteria to simply fail the allocation instead.

> Old processes are important, younger ones are not. But as I wrote, I'll drop
> most of patch "6". So, plz forget about this part.
>
> I'm more interested in a fork-bomb killer than in a crazy badness calculation
> now.
>

Ok, great. Thanks.
From: KAMEZAWA Hiroyuki on 3 Nov 2009 22:30

On Tue, 3 Nov 2009 19:10:34 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:
>
> > My point and your point are different.
> >
> >   1. All my concern is "baseline for heuristics"
> >   2. All your concern is "baseline for knob, as oom_adj"
> >
> > OK? For selecting a victim by the kernel, a dynamic value is much more useful.
> > The current behavior of "random kill" and "kill multiple processes" is too bad.
> > Considering what the oom-killer is for, I think "1" is more important.
> >
> > But I know what you want, so I offer a new knob which is not affected by RSS,
> > as I wrote in my previous mail.
> >
> > Off-topic:
> > As memcg is getting better, using the OOM-Killer for resource control should
> > end, I think. Maybe Fake-NUMA+cpuset is working well for Google's systems,
> > but plz consider using memcg.
> >
>
> I understand what you're trying to do, and I agree with it for most
> desktop systems. However, I think that admins should have a very strong
> influence in what tasks the oom killer kills. It doesn't really matter if
> it's via oom_adj or not, and it's debatable whether an adjustment on a
> static heuristic score is in our best interest in the first place. But we
> must have an alternative so that our control over oom killing isn't lost.
>
I won't go too quickly, so let's discuss and rewrite the patches more, later.
I'll prepare a new version next week. This week, I'll post swap accounting
and improve the fork-bomb detector.

> I'd also like to open another topic for discussion if you're proposing
> such sweeping changes: at what point do we allow ~__GFP_NOFAIL allocations
> to fail even if order < PAGE_ALLOC_COSTLY_ORDER and defer killing
> anything? We both agreed that it's not always in the best interest to
> kill a task so that an allocation can succeed, so we need to define some
> criteria to simply fail the allocation instead.
>
Yes, I think the allocation itself (> order=0) should fail more often before
we finally invoke OOM. That tends to give a soft landing rather than an oom
kill.

Thanks,
-Kame
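
The open question above, when a failed allocation should simply fail rather than trigger an oom kill, can be phrased as a small decision function. The Python sketch below is only one possible reading of the exchange, not the page allocator's logic: the function name, the return labels, and the small_order_may_fail switch are hypothetical, and the exact boundary at PAGE_ALLOC_COSTLY_ORDER is an assumption.

```python
PAGE_ALLOC_COSTLY_ORDER = 3  # value of the kernel constant


def on_allocation_failure(order, nofail, small_order_may_fail=False):
    """Decide what to do once reclaim has not freed enough memory.

    order: allocation order (a request for 2**order contiguous pages).
    nofail: True if the caller passed __GFP_NOFAIL.
    small_order_may_fail: hypothetical switch modelling the suggestion that
        allocations above order 0 should fail before OOM is invoked.
    """
    if nofail:
        return "keep_trying"  # such a request must not fail; details not modelled
    if order > PAGE_ALLOC_COSTLY_ORDER:
        return "fail"         # killing a task is unlikely to help large orders
    if small_order_may_fail and order > 0:
        return "fail"         # soft landing: let the caller handle the failure
    return "oom_kill"


for order in (0, 2, 5):
    print(order,
          on_allocation_failure(order, nofail=False),
          on_allocation_failure(order, nofail=False, small_order_may_fail=True))
```

Under this reading only order-0 requests would still reach the oom killer once the switch is on, which is the soft landing described in the last message. What the criteria should be in practice is exactly what the thread leaves open.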