From: David Rientjes on 30 Mar 2010 16:40

On Tue, 30 Mar 2010, Oleg Nesterov wrote:

> ->siglock is no longer needed to access task->signal, change
> oom_adjust_read() and oom_adjust_write() to read/write oom_adj
> lockless.
>
> Yes, this means that "echo 2 >oom_adj" and "echo 1 >oom_adj"
> can race and the second write can win, but I hope this is OK.
>

Ok, but could you base this on -mm at
http://userweb.kernel.org/~akpm/mmotm/ since an additional tunable has
been added (oom_score_adj), which does the same thing?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on 31 Mar 2010 05:30

On 03/30, David Rientjes wrote:
>
> On Tue, 30 Mar 2010, Oleg Nesterov wrote:
>
> > ->siglock is no longer needed to access task->signal, change
> > oom_adjust_read() and oom_adjust_write() to read/write oom_adj
> > lockless.
> >
> > Yes, this means that "echo 2 >oom_adj" and "echo 1 >oom_adj"
> > can race and the second write can win, but I hope this is OK.
> >
>
> Ok, but could you base this on -mm at
> http://userweb.kernel.org/~akpm/mmotm/ since an additional tunable has
> been added (oom_score_adj), which does the same thing?

Ah, OK, will do.

Thanks David.

Oleg.
From: Oleg Nesterov on 31 Mar 2010 15:10

On 03/30, David Rientjes wrote:
>
> On Tue, 30 Mar 2010, Oleg Nesterov wrote:
>
> > ->siglock is no longer needed to access task->signal, change
> > oom_adjust_read() and oom_adjust_write() to read/write oom_adj
> > lockless.
> >
> > Yes, this means that "echo 2 >oom_adj" and "echo 1 >oom_adj"
> > can race and the second write can win, but I hope this is OK.
>
> Ok, but could you base this on -mm at
> http://userweb.kernel.org/~akpm/mmotm/ since an additional tunable has
> been added (oom_score_adj), which does the same thing?

David, I just can't understand why
	oom-badness-heuristic-rewrite.patch
duplicates the related code in fs/proc/base.c and why it preserves
the deprecated signal->oom_adj.

OK. Please forget about lock_task_sighand/signal issues. Can't we kill
signal->oom_adj and create a single helper for both
/proc/pid/{oom_adj,oom_score_adj} ?

	static ssize_t oom_any_adj_write(struct file *file, const char __user *buf,
						size_t count, bool deprecated_mode)
	{
		struct task_struct *task;
		char buffer[PROC_NUMBUF];
		unsigned long flags;
		long oom_score_adj;
		int err;

		memset(buffer, 0, sizeof(buffer));
		if (count > sizeof(buffer) - 1)
			count = sizeof(buffer) - 1;
		if (copy_from_user(buffer, buf, count))
			return -EFAULT;

		err = strict_strtol(strstrip(buffer), 0, &oom_score_adj);
		if (err)
			return -EINVAL;

		if (depraceted_mode) {
			if (oom_score_adj == OOM_ADJUST_MAX)
				oom_score_adj = OOM_SCORE_ADJ_MAX;
			else
				oom_score_adj = (oom_score_adj * OOM_SCORE_ADJ_MAX) /
						-OOM_DISABLE;
		}

		if (oom_score_adj < OOM_SCORE_ADJ_MIN ||
				oom_score_adj > OOM_SCORE_ADJ_MAX)
			return -EINVAL;

		task = get_proc_task(file->f_path.dentry->d_inode);
		if (!task)
			return -ESRCH;
		if (!lock_task_sighand(task, &flags)) {
			put_task_struct(task);
			return -ESRCH;
		}
		if (oom_score_adj < task->signal->oom_score_adj &&
				!capable(CAP_SYS_RESOURCE)) {
			unlock_task_sighand(task, &flags);
			put_task_struct(task);
			return -EACCES;
		}

		task->signal->oom_score_adj = oom_score_adj;

		unlock_task_sighand(task, &flags);
		put_task_struct(task);
		return count;
	}

This is just the current oom_score_adj_read() + "if (depraceted_mode)"
which does oom_adj -> oom_score_adj conversion.

Now,

	static ssize_t oom_adjust_write(...)
	{
		printk_once(KERN_WARNING "... deprecated ...\n");
		return oom_any_adj_write(..., true);
	}

	static ssize_t oom_score_adj_write(...)
	{
		return oom_any_adj_write(..., false);
	}

The same for oom_xxx_read().

What is the point to keep signal->oom_adj ?

Oleg.
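The "if (depraceted_mode)" branch in the helper above is plain integer scaling, which may be easier to follow in isolation. The following is a minimal user-space sketch of that arithmetic only, not kernel code. The constant values are assumed to match the kernel's include/linux/oom.h (the thread does not quote them), and the function name oom_adj_to_score_adj() is hypothetical.

```c
/*
 * Assumed values, taken to match include/linux/oom.h; the thread itself
 * does not state them.
 */
#define OOM_DISABLE		(-17)
#define OOM_ADJUST_MAX		15
#define OOM_SCORE_ADJ_MAX	1000

/*
 * Hypothetical helper: scale a legacy oom_adj value onto the
 * oom_score_adj range, mirroring the branch in the sketch above.
 */
static long oom_adj_to_score_adj(long oom_adj)
{
	/* The legacy maximum is pinned to the new maximum explicitly... */
	if (oom_adj == OOM_ADJUST_MAX)
		return OOM_SCORE_ADJ_MAX;

	/* ...because plain scaling by 1000/17 would not reach it. */
	return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}
```

Under these assumptions OOM_DISABLE maps to exactly -1000, while intermediate values are truncated toward zero by C integer division, so the mapping is lossy and a value read back through the legacy file need not equal the value written.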
From: David Rientjes on 31 Mar 2010 17:20

On Wed, 31 Mar 2010, Oleg Nesterov wrote:

> David, I just can't understand why
> 	oom-badness-heuristic-rewrite.patch
> duplicates the related code in fs/proc/base.c and why it preserves
> the deprecated signal->oom_adj.
>

You could combine the two write functions together and then two read
functions together if you'd like.

> OK. Please forget about lock_task_sighand/signal issues. Can't we kill
> signal->oom_adj and create a single helper for both
> /proc/pid/{oom_adj,oom_score_adj} ?
>
> 	static ssize_t oom_any_adj_write(struct file *file, const char __user *buf,
> 						size_t count, bool deprecated_mode)
> 	{
> 		struct task_struct *task;
> 		char buffer[PROC_NUMBUF];
> 		unsigned long flags;
> 		long oom_score_adj;
> 		int err;
>
> 		memset(buffer, 0, sizeof(buffer));
> 		if (count > sizeof(buffer) - 1)
> 			count = sizeof(buffer) - 1;
> 		if (copy_from_user(buffer, buf, count))
> 			return -EFAULT;
>
> 		err = strict_strtol(strstrip(buffer), 0, &oom_score_adj);
> 		if (err)
> 			return -EINVAL;
>
> 		if (depraceted_mode) {
> 			if (oom_score_adj == OOM_ADJUST_MAX)
> 				oom_score_adj = OOM_SCORE_ADJ_MAX;

???

> 			else
> 				oom_score_adj = (oom_score_adj * OOM_SCORE_ADJ_MAX) /
> 						-OOM_DISABLE;
> 		}
>
> 		if (oom_score_adj < OOM_SCORE_ADJ_MIN ||
> 				oom_score_adj > OOM_SCORE_ADJ_MAX)

That doesn't work for depraceted_mode (sic), you'd need to test for
OOM_ADJUST_MIN and OOM_ADJUST_MAX in that case.

> 			return -EINVAL;
>
> 		task = get_proc_task(file->f_path.dentry->d_inode);
> 		if (!task)
> 			return -ESRCH;
> 		if (!lock_task_sighand(task, &flags)) {
> 			put_task_struct(task);
> 			return -ESRCH;
> 		}
> 		if (oom_score_adj < task->signal->oom_score_adj &&
> 				!capable(CAP_SYS_RESOURCE)) {
> 			unlock_task_sighand(task, &flags);
> 			put_task_struct(task);
> 			return -EACCES;
> 		}
>
> 		task->signal->oom_score_adj = oom_score_adj;
>
> 		unlock_task_sighand(task, &flags);
> 		put_task_struct(task);
> 		return count;
> 	}
>

There have been efforts to reuse as much of this code as possible for
other sysctl handlers as well, you might be better off looking for other
users of the common read and write code and then merging them first
(comm_write, proc_coredump_filter_write, etc).
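The range-check objection above can be made concrete with a small user-space sketch. It is only an illustration under stated assumptions: the constant values are assumed to match the kernel's include/linux/oom.h, and both function names are hypothetical. One variant validates the legacy input against OOM_ADJUST_MIN and OOM_ADJUST_MAX (with OOM_DISABLE allowed as a special value) before converting; the other checks only the converted result, as in the quoted helper.

```c
#include <errno.h>

/* Assumed values, taken to match include/linux/oom.h. */
#define OOM_DISABLE		(-17)
#define OOM_ADJUST_MIN		(-16)
#define OOM_ADJUST_MAX		15
#define OOM_SCORE_ADJ_MIN	(-1000)
#define OOM_SCORE_ADJ_MAX	1000

/* Scaling step shared by both variants below. */
static long scale_legacy(long oom_adj)
{
	if (oom_adj == OOM_ADJUST_MAX)
		return OOM_SCORE_ADJ_MAX;
	return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}

/*
 * Hypothetical variant 1: reject out-of-range legacy input first,
 * then convert. Returns 0 on success or -EINVAL.
 */
static int legacy_oom_adj_checked(long oom_adj, long *score_adj)
{
	if ((oom_adj < OOM_ADJUST_MIN || oom_adj > OOM_ADJUST_MAX) &&
	    oom_adj != OOM_DISABLE)
		return -EINVAL;

	*score_adj = scale_legacy(oom_adj);
	return 0;
}

/*
 * Hypothetical variant 2: convert first and check only the converted
 * value, which lets some invalid legacy inputs through.
 */
static int legacy_oom_adj_unchecked(long oom_adj, long *score_adj)
{
	long v = scale_legacy(oom_adj);

	if (v < OOM_SCORE_ADJ_MIN || v > OOM_SCORE_ADJ_MAX)
		return -EINVAL;

	*score_adj = v;
	return 0;
}
```

With these assumed constants, a legacy write of 16 or 17 is rejected by the first variant but accepted by the second, because both scale to a value within the oom_score_adj range.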
From: Oleg Nesterov on 31 Mar 2010 19:10

On 03/31, David Rientjes wrote:
>
> On Wed, 31 Mar 2010, Oleg Nesterov wrote:
>
> > David, I just can't understand why
> > 	oom-badness-heuristic-rewrite.patch
> > duplicates the related code in fs/proc/base.c and why it preserves
> > the deprecated signal->oom_adj.
>
> You could combine the two write functions together and then two read
> functions together if you'd like.

Yes,

> > 	static ssize_t oom_any_adj_write(struct file *file, const char __user *buf,
> > 						size_t count, bool deprecated_mode)
> > 	{
> >
> > 		if (depraceted_mode) {
> > 			if (oom_score_adj == OOM_ADJUST_MAX)
> > 				oom_score_adj = OOM_SCORE_ADJ_MAX;
>
> ???

What?

> > 			else
> > 				oom_score_adj = (oom_score_adj * OOM_SCORE_ADJ_MAX) /
> > 						-OOM_DISABLE;
> > 		}
> >
> > 		if (oom_score_adj < OOM_SCORE_ADJ_MIN ||
> > 				oom_score_adj > OOM_SCORE_ADJ_MAX)
>
> That doesn't work for depraceted_mode (sic), you'd need to test for
> OOM_ADJUST_MIN and OOM_ADJUST_MAX in that case.

Yes, probably "if (depraceted_mode)" should do more checks, I didn't try
to verify that MIN/MAX are correctly converted. I showed this code to explain
what I mean.

> There have been efforts to reuse as much of this code as possible for
> other sysctl handlers as well, you might be better off looking for

David, sorry ;) Right now I'd better try to stop the overloading of
->siglock. And, I'd like to shrink struct_signal if possible, but this
is minor.

Oleg.