while_each_thread() under rcu_read_lock() is broken? [Kernel]

Prev: [PATCH] x86: Fix vsyscall on gcc 4.5 with -Os
Next: [PATCH 10/12] scsi: megaraid_sas - Add input parameter for max_sectors

From: Paul E. McKenney on 24 Jun 2010 23:40

On Thu, Jun 24, 2010 at 02:14:46PM -0700, Roland McGrath wrote:
> > First, what "bad things" can happen to a reader scanning a thread
> > group?
> >
> > 1. The thread-group leader might do exec(), destroying the old
> > list and forming a new one. In this case, we want any readers
> > to stop scanning.
>
> This doesn't do anything different (for these concerns) from just all the
> other threads happening to exit right before the exec. There is no
> "destroying the old" and "forming the new", it's just that all the other
> threads are convinced to die now. There is no problem here.

Understood -- I wasn't saying that each category posed a unique problem,
but rather making sure that I really had enumerated all the possibilities.
The reason for my "destroying the old" and "forming the new" is the
possibility of someone doing proc_task_readdir() when the group leader
does exec(), which causes all to die, and then the new process does
pthread_create(), forming a new thread group. Because proc_task_readdir()
neither holds a lock nor stays in an RCU read-side critical section
for the full /proc scan, the old group might really be destroyed from
the reader's point of view.

That said, I freely admit that I am not very familiar with this code.

> > 2. Some other thread might do exec(), destroying the old list and
> > forming a new one. In this case, we also want any readers to
> > stop scanning.
>
> Again, the list is not really destroyed, just everybody dies. What is
> different here is that ->group_leader changes. This is the only time
> that ever happens. Moreover, it's the only time that a task that was
> previously pointed to by any ->group_leader can be reaped before the
> rest of the group has already been reaped first (and thus the
> thread_group made a singleton).

Yep! Same proc_task_readdir() situation as before. The group leader
cannot go away because proc_task_readdir() takes a reference.

> > 3. The thread-group leader might do pthread_exit(), removing itself
> > from the thread group -- and might do so while the hapless reader
> > is referencing that thread.
>
> This is called the delay_group_leader() case. It doesn't happen in a
> way that has the problems you are concerned with. The group_leader
> remains in EXIT_ZOMBIE state and can't be reaped until all the other
> threads have been reaped. There is no time at which any thread in the
> group is in any hashes or accessible by any means after the (final)
> group_leader is reaped.

OK, good to know -- that does make things simpler.

> > 4. Some other thread might do pthread_exit(), removing itself
> > from the thread group, and again might do so while the hapless
> > reader is referencing that thread. In this case, we want
> > the hapless reader to continue scanning the remainder of the
> > thread group.
>
> This is the most normal case (and #1 is effectively just this repeated
> by every thread in parallel).

Agreed. One possible difference is that in #1, no one is going to
complain about the reader quitting, while in this case someone might
well be annoyed if the list of threads is incomplete.

> > 5. The thread-group leader might do exit(), destroying the old
> > list without forming a new one. In this case, we want any
> > readers to stop scanning.
>
> All this means is everybody is convinced to die, and the group_leader
> dies too. It is not discernibly different from #6.

Seems reasonable.

> > 6. Some other thread might do exit(), destroying the old list
> > without forming a new one. In this case, we also want any
> > readers to stop scanning.
>
> This just means everybody is convinced to die and is not materially
> different from each individual thread all happening to die at the same
> time.

Fair enough. Again, my goal was to ensure that I had covered all the
cases as opposed to ensuring that I had described them minimally.

> You've described all these cases as "we want any readers to stop
> scanning". That is far too strong, and sounds like some kind of
> guaranteed synchronization, which does not and need not exist. Any
> reader that needs a dead thread to be off the list holds siglock
> and/or tasklist_lock. For the casual readers that only use
> rcu_read_lock, we only "want any readers' loops eventually to
> terminate and never to dereference stale pointers". That's why
> normal RCU listiness is generally fine.

OK, so maybe "it is OK for readers to stop scanning" is a better way
of putting it?

> The only problem we have is in #2. This is only a problem because
> readers' loops may be using the old ->group_leader pointer as the
> anchor for their circular-list round-robin loop. Once the former
> leader is removed from the list, that loop termination condition can
> never be met.

Does Oleg's checking for the group leader still being alive look correct
to you?

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Paul E. McKenney on 24 Jun 2010 23:50

On Thu, Jun 24, 2010 at 05:08:10PM -0700, Eric W. Biederman wrote:
> Oleg Nesterov <oleg(a)redhat.com> writes:
>
> > On 06/24, Chris Friesen wrote:
> >>
> >> On 06/24/2010 12:07 PM, Paul E. McKenney wrote:
> >>
> >> > 3. The thread-group leader might do pthread_exit(), removing itself
> >> > from the thread group -- and might do so while the hapless reader
> >> > is referencing that thread.
> >> >
> >> > But isn't this prohibited? Or is it really legal to do a
> >> > pthread_create() to create a new thread and then have the
> >> > parent thread call pthread_exit()? Not something I would
> >> > consider trying in my own code! Well, I might, just to
> >> > be perverse, but... ;-)
> >>
> >> I believe SUS allows the main thread to explicitly call pthread_exit(),
> >> leaving the other threads to run. If the main() routine just returns
> >> then it implicitly calls exit().
> >
> > Correct.
> >
> > But, to clarify, if the main thread does pthread_exit() (sys_exit,
> > actually), it won't be removed from the group. It will be zombie
> > until all other threads exit.
>
> That we don't cleanup that zombie leaders is unfortunate really, it
> means we have the entire de_thread special case. But short fixing
> libpthread to not make bad assumptions there is little we can do about
> it really.

Keeping the zombie leaders does make at least one of the lockless
scan cases quite a bit simpler. I think, anyway.

> I'm only half following this conversation.
>
> If what we are looking for is a stable list_head that won't disappear
> on us we should be able to put one in sighand_struct or signal_struct
> (I forget which is which at the moment) and have a list_head that
> lives for the life of the longest living thread, and that won't get
> messed up by things like de_thread, and then next_thread could simply
> return NULL when we hit the end of the list.

Oleg did suggest this possibility, but there were complications that
I do not claim to fully understand.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Oleg Nesterov on 25 Jun 2010 06:00

On 06/24, Paul E. McKenney wrote:
>
> On Thu, Jun 24, 2010 at 11:57:02PM +0200, Oleg Nesterov wrote:
> > On 06/24, Paul E. McKenney wrote:
> > >
> > > 4. Some other thread might do pthread_exit(), removing itself
> > > from the thread group, and again might do so while the hapless
> > > reader is referencing that thread. In this case, we want
> > > the hapless reader to continue scanning the remainder of the
> > > thread group.
> >
> > Yes.
> >
> > But, if that thread was used as a starting point g, then
> >
> > before the patch: loop forever
> > after the patch: break
>
> So it is OK to skip some of the other threads in this case, even
> though they were present throughout the whole procedure?

I think, yes. We can miss them in any case, they can go away before
while_each_thread(g, t) starts the scan.

If g == group_leader (old or new), then we should notice this thread
at least.

Otherwise we can miss them all, with or without next_thread_careful().

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Oleg Nesterov on 25 Jun 2010 06:20

On 06/24, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg(a)redhat.com> writes:
>
> If what we are looking for is a stable list_head that won't disappear
> on us we should be able to put one in sighand_struct or signal_struct
> (I forget which is which at the moment) and have a list_head that
> lives for the life of the longest living thread, and that won't get
> messed up by things like de_thread, and then next_thread could simply
> return NULL when we hit the end of the list.

This was already discussed.

Yes, we can add list_head into signal_struct. But this breaks, say,
thread_group_empty() and the similar code. This is fixeable.

But. We need to convert the code which does while_each_thread(g, t) to
use list_for_each(t, head). And, if g != group_leader this can't work.

Not to mention the increase of sizeof(signal_struct).

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Paul E. McKenney on 28 Jun 2010 19:50

On Fri, Jun 25, 2010 at 11:55:48AM +0200, Oleg Nesterov wrote:
> On 06/24, Paul E. McKenney wrote:
> >
> > On Thu, Jun 24, 2010 at 11:57:02PM +0200, Oleg Nesterov wrote:
> > > On 06/24, Paul E. McKenney wrote:
> > > >
> > > > 4. Some other thread might do pthread_exit(), removing itself
> > > > from the thread group, and again might do so while the hapless
> > > > reader is referencing that thread. In this case, we want
> > > > the hapless reader to continue scanning the remainder of the
> > > > thread group.
> > >
> > > Yes.
> > >
> > > But, if that thread was used as a starting point g, then
> > >
> > > before the patch: loop forever
> > > after the patch: break
> >
> > So it is OK to skip some of the other threads in this case, even
> > though they were present throughout the whole procedure?
>
> I think, yes. We can miss them in any case, they can go away before
> while_each_thread(g, t) starts the scan.
>
> If g == group_leader (old or new), then we should notice this thread
> at least.
>
> Otherwise we can miss them all, with or without next_thread_careful().

Just to be sure that we are actually talking about the same scenario...

Suppose that a task group is lead by 2908 and has member 2909, 2910,
2911, and 2912. Suppose that 2910 does pthread_exit() just as some
other task is "ls"ing the relevant /proc entry. Is it really OK for
"ls" to show 2909 but not 2911 and 2912, even though 2911 and 2912
were alive and kicking the entire time?

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4 5 6
Prev: [PATCH] x86: Fix vsyscall on gcc 4.5 with -Os
Next: [PATCH 10/12] scsi: megaraid_sas - Add input parameter for max_sectors