From: Louis Rilling on 9 Jul 2010 08:20

On 08/07/10 21:39 -0700, Eric W. Biederman wrote:
>
> Currently it is possible to put proc_mnt before we have flushed the
> last process that will use the proc_mnt to flush its proc entries.
>
> This race is fixed by not flushing proc entries for dead pid
> namespaces, and calling pid_ns_release_proc unconditionally from
> zap_pid_ns_processes after the pid namespace has been declared dead.

One comment below.

>
> To ensure we don't unnecessarily leak any dcache entries with skipped
> flushes pid_ns_release_proc flushes the entire proc_mnt when it is
> called.
>
> Signed-off-by: Eric W. Biederman <ebiederm(a)xmission.com>
> ---
>  fs/proc/base.c         |    9 +++++----
>  fs/proc/root.c         |    3 +++
>  kernel/pid_namespace.c |    1 +
>  3 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index acb7ef8..e9d84e1 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2742,13 +2742,14 @@ void proc_flush_task(struct task_struct *task)
>
>  	for (i = 0; i <= pid->level; i++) {
>  		upid = &pid->numbers[i];
> +
> +		/* Don't bother flushing dead pid namespaces */
> +		if (test_bit(PIDNS_DEAD, &upid->ns->flags))
> +			continue;
> +

IMHO, nothing prevents zap_pid_ns_processes() from setting PIDNS_DEAD and
calling pid_ns_release_proc() right now. zap_pid_ns_processes() does not wait
for EXIT_DEAD (self-reaping) children to be released.

Thanks,

Louis

>  		proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
>  					tgid->numbers[i].nr);
>  	}
> -
> -	upid = &pid->numbers[pid->level];
> -	if (upid->nr == 1)
> -		pid_ns_release_proc(upid->ns);
>  }
>
>  static struct dentry *proc_pid_instantiate(struct inode *dir,
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index cfdf032..2298fdd 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -209,5 +209,8 @@ int pid_ns_prepare_proc(struct pid_namespace *ns)
>
>  void pid_ns_release_proc(struct pid_namespace *ns)
>  {
> +	/* Flush any cached proc dentries for this pid namespace */
> +	shrink_dcache_parent(ns->proc_mnt->mnt_root);
> +
>  	mntput(ns->proc_mnt);
>  }
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 92032d1..43dec5d 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -189,6 +189,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  		rc = sys_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);
>
> +	pid_ns_release_proc(pid_ns);
>  	acct_exit_ns(pid_ns);
>  	return;
>  }
> --
> 1.6.5.2.143.g8cc62

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
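
To make the window described above concrete, here is a small user-space
model, not kernel code. Every name in it (model_ns, model_mnt, model_zap,
model_get_mnt, model_put_mnt) is a hypothetical stand-in invented for this
sketch. It assumes a plain mutex and an integer reference count in place of
the real mount refcounting, and it replays the interleaving deterministically
in one thread rather than racing real threads. The first function shows why
an unlocked flag test does not pin the mount; the second shows one possible
shape of the missing protection, which is only an assumption about what a
fix could look like, not the fix proposed in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these names exist in the kernel. */
struct model_mnt {
	int refcount;	/* models the reference dropped by mntput() */
	bool released;	/* set once the last reference is gone */
};

struct model_ns {
	pthread_mutex_t lock;
	bool dead;		/* models the PIDNS_DEAD bit */
	struct model_mnt mnt;	/* models ns->proc_mnt */
};

/* Models zap_pid_ns_processes(): mark the namespace dead, drop its reference. */
static void model_zap(struct model_ns *ns)
{
	pthread_mutex_lock(&ns->lock);
	ns->dead = true;
	if (--ns->mnt.refcount == 0)
		ns->mnt.released = true;
	pthread_mutex_unlock(&ns->lock);
}

/* Flusher side: test the flag and take a reference in one critical section. */
static bool model_get_mnt(struct model_ns *ns)
{
	bool pinned = false;

	pthread_mutex_lock(&ns->lock);
	if (!ns->dead) {
		ns->mnt.refcount++;
		pinned = true;
	}
	pthread_mutex_unlock(&ns->lock);
	return pinned;
}

static void model_put_mnt(struct model_ns *ns)
{
	pthread_mutex_lock(&ns->lock);
	if (--ns->mnt.refcount == 0)
		ns->mnt.released = true;
	pthread_mutex_unlock(&ns->lock);
}

/* Returns true if the flusher would go on to touch a released mount. */
static bool racy_interleaving(void)
{
	struct model_ns ns = { .lock = PTHREAD_MUTEX_INITIALIZER,
			       .dead = false,
			       .mnt = { .refcount = 1, .released = false } };
	bool saw_alive = !ns.dead;	/* unlocked test, as in the quoted hunk */

	model_zap(&ns);			/* runs between the test and the flush */
	return saw_alive && ns.mnt.released;
}

/* Same interleaving, but the flusher pins the mount before using it. */
static bool pinned_interleaving(void)
{
	struct model_ns ns = { .lock = PTHREAD_MUTEX_INITIALIZER,
			       .dead = false,
			       .mnt = { .refcount = 1, .released = false } };
	bool pinned = model_get_mnt(&ns);
	bool unsafe;

	model_zap(&ns);
	unsafe = pinned && ns.mnt.released;	/* still pinned here */
	if (pinned)
		model_put_mnt(&ns);
	assert(ns.mnt.released);	/* last put releases the mount */
	return unsafe;
}
```

In this model racy_interleaving() reports that the flusher proceeds against
a released mount, while pinned_interleaving() does not, because the release
is deferred until the flusher drops its own reference.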
From: Louis Rilling on 9 Jul 2010 10:20

On 09/07/10 6:05 -0700, Eric W. Biederman wrote:
> Louis Rilling <Louis.Rilling(a)kerlabs.com> writes:
>
> > On 08/07/10 21:39 -0700, Eric W. Biederman wrote:
> >>
> >> Currently it is possible to put proc_mnt before we have flushed the
> >> last process that will use the proc_mnt to flush its proc entries.
> >>
> >> This race is fixed by not flushing proc entries for dead pid
> >> namespaces, and calling pid_ns_release_proc unconditionally from
> >> zap_pid_ns_processes after the pid namespace has been declared dead.
> >
> > One comment below.
> >
> >>
> >> To ensure we don't unnecessarily leak any dcache entries with skipped
> >> flushes pid_ns_release_proc flushes the entire proc_mnt when it is
> >> called.
> >>
> >> Signed-off-by: Eric W. Biederman <ebiederm(a)xmission.com>
> >> ---
> >>  fs/proc/base.c         |    9 +++++----
> >>  fs/proc/root.c         |    3 +++
> >>  kernel/pid_namespace.c |    1 +
> >>  3 files changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/proc/base.c b/fs/proc/base.c
> >> index acb7ef8..e9d84e1 100644
> >> --- a/fs/proc/base.c
> >> +++ b/fs/proc/base.c
> >> @@ -2742,13 +2742,14 @@ void proc_flush_task(struct task_struct *task)
> >>
> >>  	for (i = 0; i <= pid->level; i++) {
> >>  		upid = &pid->numbers[i];
> >> +
> >> +		/* Don't bother flushing dead pid namespaces */
> >> +		if (test_bit(PIDNS_DEAD, &upid->ns->flags))
> >> +			continue;
> >> +
> >
> > IMHO, nothing prevents zap_pid_ns_processes() from setting PIDNS_DEAD and
> > calling pid_ns_release_proc() right now. zap_pid_ns_processes() does not wait
> > for EXIT_DEAD (self-reaping) children to be released.
>
> Good point. We need something, probably a lock, to prevent proc_mnt from
> going away here. We might do a little better if we were starting with
> a specific dentry, those at least have some rcu properties but that isn't
> a big help.
>
> Hmm. Perhaps there is a way to completely restructure this flushing
> of dentries. It is just an optimization after all so we don't get too many
> stale dentries building up.
>
> It might just be worth it to simply kill proc_flush_mnt altogether. I know
> it is measurable when we don't do the flushing but perhaps there can
> be a work struct that periodically wakes up and smacks stale proc dentries.
>
> Right now I really don't think proc_flush_task is worth the hassle it
> causes.

Indeed, proc_flush_task() seems to be the only bad guy trying to access
pid_ns->proc_mnt after the death of the init process. But I don't know enough
about the performance impact of removing it.

Louis

>
> Grumble, Grumble more thinking to do.
>
> Eric
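
As a rough illustration of the periodic-sweep idea floated above, here is a
user-space sketch, again not kernel code. All names (model_cache,
model_dentry, model_sweep, model_alive_fn) are hypothetical, and a
fixed-size array stands in for the dcache. It assumes the sweeper can ask
whether a pid is still alive; in the kernel a pass like this would
presumably be driven by some deferred-work mechanism, which is not modelled
here. The point is only that stale entries get dropped by a later pass
instead of by the exiting task itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CACHE_SLOTS 8

/* Hypothetical stand-in for a cached /proc/<pid> dentry. */
struct model_dentry {
	int pid;
	bool in_use;
};

struct model_cache {
	struct model_dentry slot[MODEL_CACHE_SLOTS];
};

/* Callback answering "is this pid still a live task?" (an assumption). */
typedef bool (*model_alive_fn)(int pid);

/* One sweep pass: drop every cached entry whose task is gone.
 * Returns the number of entries dropped. */
static size_t model_sweep(struct model_cache *cache, model_alive_fn alive)
{
	size_t dropped = 0;

	for (size_t i = 0; i < MODEL_CACHE_SLOTS; i++) {
		struct model_dentry *d = &cache->slot[i];

		if (d->in_use && !alive(d->pid)) {
			d->in_use = false;
			dropped++;
		}
	}
	return dropped;
}

/* Example liveness check: only pid 1 is still running. */
static bool model_only_pid1_alive(int pid)
{
	return pid == 1;
}

/* Three cached entries, two of them stale; a second pass finds nothing. */
static size_t model_sweep_demo(void)
{
	struct model_cache cache = {
		.slot = { { 1, true }, { 42, true }, { 43, true } }
	};
	size_t first = model_sweep(&cache, model_only_pid1_alive);

	assert(model_sweep(&cache, model_only_pid1_alive) == 0);
	return first;
}
```

The trade-off Eric mentions still applies to a scheme like this: stale
entries linger until the next pass, so the cost moves from task exit to the
sweep interval.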