Prev: [PATCH] Staging: quatech_usb2: fix coding style issues
Next: proc: don't take ->siglock for /proc/pid/oom_adj
From: David Rientjes on 1 Apr 2010 15:00 On Thu, 1 Apr 2010, Oleg Nesterov wrote: > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -459,7 +459,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, > > * its children or threads, just set TIF_MEMDIE so it can die quickly > > */ > > if (p->flags & PF_EXITING) { > > - __oom_kill_task(p); > > + set_tsk_thread_flag(p, TIF_MEMDIE); > > So, probably this makes sense anyway but not strictly necessary, up to you. > It matches the already-existing comment that only says we need to set TIF_MEMDIE so it can quickly exit rather than call __oom_kill_task(), so it seems worthwhile. > > if (fatal_signal_pending(current)) { > > - __oom_kill_task(current); > > + set_tsk_thread_flag(current, TIF_MEMDIE); > > Yes, I think this fix is needed. > Ok, I'll add your acked-by and send this to Andrew with a follow-up that consolidates __oom_kill_task() into oom_kill_task(), thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: David Rientjes on 1 Apr 2010 15:20 On Thu, 1 Apr 2010, Oleg Nesterov wrote: > > > @@ -159,13 +172,9 @@ unsigned int oom_badness(struct task_str > > > if (p->flags & PF_OOM_ORIGIN) > > > return 1000; > > > > > > - task_lock(p); > > > - mm = p->mm; > > > - if (!mm) { > > > - task_unlock(p); > > > + p = find_lock_task_mm(p); > > > + if (!p) > > > return 0; > > > - } > > > - > > > /* > > > * The baseline for the badness score is the proportion of RAM that each > > > * task's rss and swap space use. > > > @@ -330,12 +339,6 @@ static struct task_struct *select_bad_pr > > > *ppoints = 1000; > > > } > > > > > > - /* > > > - * skip kernel threads and tasks which have already released > > > - * their mm. > > > - */ > > > - if (!p->mm) > > > - continue; > > > if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) > > > continue; > > > > You can't do this for the reason I cited in another email, oom_badness() > > returning 0 does not exclude a task from being chosen by > > selcet_bad_process(), it will use that task if nothing else has been found > > yet. We must explicitly filter it from consideration by checking for > > !p->mm. > > Yes, you are right. OK, oom_badness() can never return points < 0, > we can make it int and oom_badness() can return -1 if !mm. IOW, > > - unsigned int points; > + int points; > ... > > points = oom_badness(...); > if (points >= 0 && (points > *ppoints || !chosen)) > chosen = p; > oom_badness() and its predecessor badness() in mainline never return negative scores, so I don't see the value in doing this; just filter the task in select_bad_process() with !p->mm as it has always been done. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: David Rientjes on 2 Apr 2010 15:10 On Fri, 2 Apr 2010, Oleg Nesterov wrote: > > > Yes, you are right. OK, oom_badness() can never return points < 0, > > > we can make it int and oom_badness() can return -1 if !mm. IOW, > > > > > > - unsigned int points; > > > + int points; > > > ... > > > > > > points = oom_badness(...); > > > if (points >= 0 && (points > *ppoints || !chosen)) > > > chosen = p; > > > > > > > oom_badness() and its predecessor badness() in mainline never return > > negative scores, so I don't see the value in doing this; just filter the > > task in select_bad_process() with !p->mm as it has always been done. > > David, you continue to ignore my arguments ;) select_bad_process() > must not filter out the tasks with ->mm == NULL. > > Once again: > > void *memory_hog_thread(void *arg) > { > for (;;) > malloc(A_LOT); > } > > int main(void) > { > pthread_create(memory_hog_thread, ...); > syscall(__NR_exit, 0); > } > > Now, even if we fix PF_EXITING check, select_bad_process() will always > ignore this process. The group leader has ->mm == NULL. > > See? > > That is why I think we need something like find_lock_task_mm() in the > pseudo-patch I sent. > I'm not ignoring your arguments, I think you're ignoring what I'm responding to. I prefer to keep oom_badness() to be a positive range as it always has been (and /proc/pid/oom_score has always used an unsigned qualifier), so I disagree that we need to change oom_badness() to return anything other than 0 for such tasks. We need to filter them explicitly in select_bad_process() instead, so please do this there. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: David Rientjes on 2 Apr 2010 15:50 On Fri, 2 Apr 2010, Oleg Nesterov wrote: > > > David, you continue to ignore my arguments ;) select_bad_process() > > > must not filter out the tasks with ->mm == NULL. > > > > > I'm not ignoring your arguments, I think you're ignoring what I'm > > responding to. > > Ah, sorry, I misunderstood your replies. > > > I prefer to keep oom_badness() to be a positive range as > > it always has been (and /proc/pid/oom_score has always used an unsigned > > qualifier), > > Yes, I thought about /proc/pid/oom_score, but imho this is minor issue. > We can s/%lu/%ld/ though, or just report 0 if oom_badness() returns -1. > Or something. > Just have it return 0, meaning never kill, and then ensure "chosen" is never set for an oom_badness() of 0, even if we don't have another task to kill. That's how Documentation/filesystems/proc.txt describes it anyway. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: David Rientjes on 8 Apr 2010 17:10 On Thu, 1 Apr 2010, Oleg Nesterov wrote: > > > > > Say, oom_forkbomb_penalty() does list_for_each_entry(tsk->children). > > > > > Again, this is not right even if we forget about !child->mm check. > > > > > This list_for_each_entry() can only see the processes forked by the > > > > > main thread. > > > > > > > > > > > > > That's the intention. > > > > > > Why? shouldn't oom_badness() return the same result for any thread > > > in thread group? We should take all childs into account. > > > > > > > oom_forkbomb_penalty() only cares about first-descendant children that > > do not share the same memory, > > I see, but the code doesn't really do this. I mean, it doesn't really > see the first-descendant children, only those which were forked by the > main thread. > > Look. We have a main thread M and the sub-thread T. T forks a lot of > processes which use a lot of memory. These processes _are_ the first > descendant children of the M+T thread group, they should be accounted. > But M->children list is empty. > > oom_forkbomb_penalty() and oom_kill_process() should do > > t = tsk; > do { > list_for_each_entry(child, &t->children, sibling) { > ... take child into account ... > } > } while_each_thread(tsk, t); > > In this case, it seems more appropriate that we would penalize T and not M since it's not necessarily responsible for the behavior of the children it forks. T is the buggy/malicious program, not M. > See the patch below. Yes, this is minor, but it is always good to avoid > the unnecessary locks, and thread_group_cputime() is O(N). > > Not only for performance reasons. This allows to change the locking in > thread_group_cputime() if needed without fear to deadlock with task_lock(). > > Oleg. > > --- x/mm/oom_kill.c > +++ x/mm/oom_kill.c > @@ -97,13 +97,16 @@ static unsigned long oom_forkbomb_penalt > return 0; > list_for_each_entry(child, &tsk->children, sibling) { > struct task_cputime task_time; > - unsigned long runtime; > + unsigned long runtime, this_rss; > > task_lock(child); > if (!child->mm || child->mm == tsk->mm) { > task_unlock(child); > continue; > } > + this_rss = get_mm_rss(child->mm); > + task_unlock(child); > + > thread_group_cputime(child, &task_time); > runtime = cputime_to_jiffies(task_time.utime) + > cputime_to_jiffies(task_time.stime); > @@ -113,10 +116,9 @@ static unsigned long oom_forkbomb_penalt > * get to execute at all in such cases anyway. > */ > if (runtime < HZ) { > - child_rss += get_mm_rss(child->mm); > + child_rss += this_rss; > forkcount++; > } > - task_unlock(child); > } > > /* This patch looks good, will you send it to Andrew with a changelog and sign-off line? Also feel free to add: Acked-by: David Rientjes <rientjes(a)google.com> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
First
|
Prev
|
Pages: 1 2 Prev: [PATCH] Staging: quatech_usb2: fix coding style issues Next: proc: don't take ->siglock for /proc/pid/oom_adj |