Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) [Kernel]

Prev: futex: Take mmap_sem for get_user_pages in fault_in_user_writeable
Next: [PATCH v9 0/8] Loongson: YeeLoong: add platform drivers

From: Alan Stern on 8 Dec 2009 15:50

On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:

> > This is a little more awkward because it requires the parent to iterate
> > through its children.
>
> I can live with that.
>
> > But it does solve the off-tree dependency problem for suspends.
>
> That's a plus, but I still think we're trying to create a barrier-alike
> mechanism using lock.
>
> There's one more possibility to consider, though. What if we use a completion
> instead of the flag + wait queue? It surely is a standard synchronization
> mechanism and it seems it might work here.

You're right. I should have thought of that. Linus's original
approach couldn't use a completion because during suspend it needed to
make one task (the parent) wait for a bunch of others (the children).
But if you iterate through the children by hand, that objection no
longer applies.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Rafael J. Wysocki on 8 Dec 2009 16:00

On Tuesday 08 December 2009, Alan Stern wrote:
> On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:
>
> > > This is a little more awkward because it requires the parent to iterate
> > > through its children.
> >
> > I can live with that.
> >
> > > But it does solve the off-tree dependency problem for suspends.
> >
> > That's a plus, but I still think we're trying to create a barrier-alike
> > mechanism using lock.
> >
> > There's one more possibility to consider, though. What if we use a completion
> > instead of the flag + wait queue? It surely is a standard synchronization
> > mechanism and it seems it might work here.
>
> You're right. I should have thought of that. Linus's original
> approach couldn't use a completion because during suspend it needed to
> make one task (the parent) wait for a bunch of others (the children).
> But if you iterate through the children by hand, that objection no
> longer applies.

BTW, is there a good reason why completion_done() doesn't use spin_lock_irqsave
and spin_unlock_irqrestore? complete() and complete_all() use them, so why not
here?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Linus Torvalds on 8 Dec 2009 16:10

On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:
>
> Anyway, if we use an rwsem, it won't be checkable from interrupt context just
> as well.

You can't do a lock() from an interrupt, but the unlocks should be
irq-safe.

> Suppose we use rwsem and during suspend each child uses a down_read() on a
> parent and then the parent uses down_write() on itself. What if, whatever the
> reason, the parent is a bit early and does the down_write() before one of the
> children has a chance to do the down_read()? Aren't we toast?

We're toast, but we're toast for a totally unrealted reason: it means that
you tried to resume a child before a parent, which would be a major bug to
begin with.

Look, I even wrote out the comments, so let me repeat the code one more
time.

- suspend time calling:
// This won't block, because we suspend nodes before parents
down_read(node->parent->lock);
// Do the part that may block asynchronously
async_schedule(do_usb_node_suspend, node);

- resume time calling:
// This won't block, because we resume parents before children,
// and the children will take the read lock.
down_write(leaf->lock);
// Do the blocking part asynchronously
async_schedule(usb_node_resume, leaf);

See? So when we take the parent lock for suspend, we are guaranteed to do
so _before_ the parent node itself suspends. And conversely, when we take
the parent lock (asynchronously) for resume, we're guaranteed to do that
_after_ the parent node has done its own down_write.

And that all depends on just one trivial thing; that the suspend and
resume is called in the right order (children first vs parent first
respectively). And that is such a _major_ correctness issue that if that
isn't correct, your suspend isn't going to work _anyway_.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Linus Torvalds on 8 Dec 2009 16:10

On Tue, 8 Dec 2009, Alan Stern wrote:
>
> That's not the way it should be done. Linus had children taking their
> parents' locks during suspend, which is simple but leads to
> difficulties.

No it doesn't. Name them.

> Instead, the PM core should do a down_write() on each device before
> starting the device's async suspend routine, and an up_write() when the
> routine finishes.

No you should NOT do that. If you do that, you serialize the suspend
incorrectly and much too early. IOW, think a topology like this:

a -> b -> c
\
> d -> e

where you'd want to suspend 'c' and 'e' asynchronously. If we do a
'down-write()' on b, then we'll delay until 'c' has suspended, an if we
have ordered the nodes in the obvious depth-first order, we'll walk the PM
device list in the order:

c b e d a

and now we'll serialize on 'b', waiting for 'c' to suspend. Which we do
_not_ want to do, because the whole point was to suspend 'c' and 'e'
together.

> Parents should, at the start of their async routine,
> do down_read() on each of their children plus whatever other devices
> they need to wait for. The core can do the waiting for children part
> and the driver's suspend routine can handle any other waiting.

Why?

That just complicates things. Compare to my simple locking scheme I've
quoted several times.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Alan Stern on 8 Dec 2009 16:40

On Tue, 8 Dec 2009, Linus Torvalds wrote:

> > No, that's not needed. Unlike reads, writes can't move in front of
> > data or control dependencies. Or so I've been lead to believe...
>
> Sure they can. Control dependencies are trivial - it's called "branch
> prediction", and everybody does it, and data dependencies don't exist on
> many CPU architectures (even to the point of reading through a pointer
> that you loaded).

Wait a second. Are you saying that with code like this:

if (x == 1)
y = 5;

the CPU may write to y before it has finished reading the value of x?
And this write is visible to other CPUs, so that if x was initially 0
and a second CPU sets x to 1, the second CPU may see y == 5 before it
executes the write to x (whatever that may mean)?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10
Prev: futex: Take mmap_sem for get_user_pages in fault_in_user_writeable
Next: [PATCH v9 0/8] Loongson: YeeLoong: add platform drivers