Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) [Kernel]

From: Mark Brown on 9 Dec 2009 13:10

On Wed, Dec 09, 2009 at 12:10:03PM -0500, Alan Stern wrote:
> On Wed, 9 Dec 2009, Mark Brown wrote:

> > Worst case is about a second for both resume and suspend which means two
> > seconds total but it's very hardware dependent.

> A second seems awfully long. What happens if audio isn't being played
> when the suspend occurs? Can't you shorten things with no artifacts in
> that case?

For the affected hardware the problem is basically the same with or
without audio being played. As I said in my reply to Linus, these are
delays caused by ramping reference voltages. The delays are long enough
that the reference voltages have to be maintained all the time so that
they don't delay the start of audio streams. Since we don't turn them
off when not in use, having or not having an audio stream at suspend
time doesn't affect the reference voltage ramps. There is a win from
other stuff having been shut off already, but it's already being
exploited.

On suspend the problem is the same as for resume - we need to ramp the
voltages quietly, this time down to zero. We want to make sure they're
actually at zero to ensure that the ramp at resume time starts from a
known hardware state.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Mark Brown on 9 Dec 2009 13:30

On Wed, Dec 09, 2009 at 09:57:22AM -0800, Linus Torvalds wrote:
> On Wed, 9 Dec 2009, Mark Brown wrote:

> > The problem comes when you've got audio outputs referenced to something
> > other than ground which used to happen because no negative supplies were
> > available in these systems. To bring these up from cold you need to
> > bring the outputs up to the reference level but if you do that by just
> > turning on the power you get an audible (often loud) noise in the output
> > from the square(ish) waveform that results which users don't find
> > acceptable.

> Ouch. A second still sounds way too long - but whatever.

Yes, I think there's pretty much universal agreement on that :)

Hardware that needs a few hundred milliseconds is much more common at the
minute (and like I say current generation hardware is basically
unaffected), but it's the number I keep in mind when considering how bad
things might be.

> However, it sounds like the nice way to do that isn't by doing it
> synchronously in the suspend/resume code itself, but simply ramping it
> down (and up) from a timer. It would be asynchronous, but not because the
> suspend itself is in any way asynchronous.

We don't actually need a timer for most of this - generally the ramp is
done by charging or discharging a capacitor through a resistor so you
just set it going then wait, possibly in several stages with a little
bit twiddling in the middle to speed things up which could be done off a
timer.
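
As a rough illustration of the "set it going then wait" approach, here is a minimal sketch in C that estimates how long such a ramp takes to settle by stepping the first-order RC equation dV/dt = (V_target - V) / (R * C). The helper name rc_ramp_time_ms and all component values are assumptions made up for illustration; they are not taken from any real codec or driver.

```c
/*
 * Hypothetical helper: estimate the time (in milliseconds) for a node
 * charged or discharged through a resistor to get within `tolerance_v`
 * volts of `v_target`, using forward-Euler steps of the first-order
 * RC equation dV/dt = (v_target - V) / (R * C).
 *
 * Returns -1.0 if the inputs are unusable or the ramp does not settle
 * within `max_ms`.
 */
double rc_ramp_time_ms(double r_ohms, double c_farads,
		       double v_start, double v_target,
		       double tolerance_v, double max_ms)
{
	const double dt_s = 1e-4;		/* 0.1 ms time step */
	double tau_s = r_ohms * c_farads;	/* RC time constant */
	double v = v_start;
	double t_ms = 0.0;

	/* The Euler step is only meaningful when dt is well below tau. */
	if (tau_s <= dt_s || tolerance_v <= 0.0)
		return -1.0;

	while (t_ms <= max_ms) {
		double err = v_target - v;

		if (err < 0.0)
			err = -err;
		if (err <= tolerance_v)
			return t_ms;

		/* Move a fraction dt/tau of the remaining distance. */
		v += (v_target - v) * (dt_s / tau_s);
		t_ms += dt_s * 1000.0;
	}
	return -1.0;
}
```

With an assumed 100 kOhm resistor and 1 uF capacitor (a 100 ms time constant), settling to within 1% of the target takes a bit under half a second in either direction, which is in the same range as the "few hundred milliseconds" mentioned above.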

> Done right, it might even result in a nice volume fade of the sound (ie if
> the hw allows for it, stop the actual sound engine late on suspend, and
> start it early on resume, so that sound works _while_ the whole reference
> volume rampdown/up is going on)

The big issue with running off a partially ramped supply is that it can
upset the analogue components - for example, if an amplifier is trying
to handle a signal with an amplitude outside the supply range then it'll
clip. But sometimes that approach does work and it does get used.

For resume we're pretty much taking care of it already by moving the
resume out of the main device resume and using ALSA-specific stuff to
keep audio streams stopped until we're done. For suspend, though, we
don't know the system is going down until the suspend starts, and we do
want to make sure we get the analogue into a known poweroff state so
that we can control powerup properly.

From: Alan Stern on 9 Dec 2009 14:10

On Tue, 8 Dec 2009, Linus Torvalds wrote:

> > Wait a second. Are you saying that with code like this:
> >
> >     if (x == 1)
> >         y = 5;
> >
> > the CPU may write to y before it has finished reading the value of x?

> > And this write is visible to other CPUs, so that if x was initially 0
> > and a second CPU sets x to 1, the second CPU may see y == 5 before it
> > executes the write to x (whatever that may mean)?
>
> Well, yes and no. CPU1 above won't release the '5' until it has confirmed
> the '1' (even if it does so by reading it late). But assuming the other
> CPU also does speculation, then yes, the situation you describe could
> happen. If the other CPU does
>
> z = y;
> x = 1;
>
> then it's certainly possible that 'z' contains 5 at the end (even if both
> x and y started out zero). Because now the read of 'y' on that other CPU
> might be delayed, and the write of 'x' goes ahead, CPU1 sees the 1, and
> commits its write of 5, so when CPU2 gets the cacheline, z will now
> contain 5.

That could be attributed to reordering on CPU2, so let's take CPU2's
peculiarities out of the picture (initially everything is set to 0):

    CPU1                CPU2
    ----                ----
    if (x == 1)         z = y;
        y = 5;          mb();
                        x = 1;

This gets at the heart of the question: Can a write move up past a
control dependency? Similar questions apply to the two types of data
dependency:

    CPU1                CPU2
    ----                ----
    y = x + 4;          z = y;
                        mb();
                        x = 1;

(Initially p points to x, not y):

    CPU1                CPU2
    ----                ----
    *p = 5;             z = y;
                        mb();
                        p = &y;

Can z end up equal to 5 in any of these examples?

Alan Stern
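
As a baseline for the first example above, consider an idealized machine on which neither CPU reorders its own accesses (each CPU's operations take effect in program order, i.e. sequential consistency). That case can be checked exhaustively. The sketch below enumerates every interleaving of the two programs and reports whether z can end up as 5. The function names and the encoding of the steps are assumptions for illustration, and the model says nothing about what a real, weakly ordered architecture permits.

```c
#include <stdbool.h>

/* Shared memory and CPU-local state for one run of the first example. */
struct machine {
	int x, y;	/* shared variables, initially 0 */
	int r;		/* CPU1's private copy of x */
	int z;		/* CPU2's result */
};

/* CPU1, step 0: read x.  Step 1: write y = 5 if the read returned 1. */
static void cpu1_step(struct machine *m, int step)
{
	if (step == 0)
		m->r = m->x;
	else if (m->r == 1)
		m->y = 5;
}

/* CPU2, step 0: z = y.  Step 1: x = 1.  mb() is a no-op in this model. */
static void cpu2_step(struct machine *m, int step)
{
	if (step == 0)
		m->z = m->y;
	else
		m->x = 1;
}

/*
 * Run every interleaving of the two 2-step programs that keeps each
 * CPU's steps in program order.  Bit i of `mask` set means slot i is
 * given to CPU1.  Returns true if any interleaving ends with z == 5.
 */
bool z_can_be_5_under_sc(void)
{
	for (unsigned int mask = 0; mask < 16; mask++) {
		struct machine m = { 0, 0, 0, 0 };
		int done1 = 0, done2 = 0;
		int bits = 0;

		for (int i = 0; i < 4; i++)
			bits += (mask >> i) & 1;
		if (bits != 2)		/* each CPU gets exactly 2 slots */
			continue;

		for (int i = 0; i < 4; i++) {
			if ((mask >> i) & 1)
				cpu1_step(&m, done1++);
			else
				cpu2_step(&m, done2++);
		}
		if (m.z == 5)
			return true;
	}
	return false;
}
```

In this model the function returns false: CPU2 reads y before it writes x, and CPU1 only writes y after it has seen that write. So the question really is whether a given architecture allows CPU1's write to become visible ahead of its read.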


From: Linus Torvalds on 9 Dec 2009 17:00

On Wed, 9 Dec 2009, Alan Stern wrote:
>
> That could be attributed to reordering on CPU2, so let's take CPU2's
> peculiarities out of the picture (initially everything is set to 0):
>
>     CPU1                CPU2
>     ----                ----
>     if (x == 1)         z = y;
>         y = 5;          mb();
>                         x = 1;
>
> This gets at the heart of the question: Can a write move up past a
> control dependency?
> [ .. ]
> Can z end up equal to 5 in any of these examples?

In any _practical_ microarchitecture I know of, the above will never
result in 'z' being 5, even though CPU1 doesn't really have a memory
barrier. But if I read the alpha memory ordering guarantees right, then at
least in theory you really can end up with z=5.

Let me write that as five events (with the things in brackets being what
the alpha memory ordering manual calls them):

- A is "read of x returns 1" on CPU1 [ P1:R(x,1) ]
- B is "write of value 5 to y" on CPU1 [ P1:W(y,5) ]
- C is "read of y returns 5" on CPU2 [ P2:R(y,5) ]
- D is "write of value 1 to x" on CPU2 [ P2:W(x,1) ]
- 'MB' is the mb() on CPU2 [ P2:MB ]

(The write of 'z' is irrelevant, we can think of it as a register, the end
result is the same).

And yes, if I read the alpha memory ordering rules correctly, you really
can end up with z=5, although I don't think you will ever find an alpha
_implementation_ that does it.

Why?

The alpha memory ordering literally defines ordering in two ways:

- "location access order". But that is _only_ defined per actual
location, so while 'x' can have a location access order specified by
seeing certain values, there is no "location access order" for two
different memory locations (x and y).

The alpha architecture manual uses "A << B" to say "event A" is before
"event B" when there is a defined ordering.

So in the example above, there is a location access ordering between

P2:W(x,1) << P1:R(x, 1)

and

P1:W(y,5) << P2:R(y,5)

ie you have D << A and B << C.

Good so far, but that doesn't define anything else: there's only
ordering between the pairs (D,A) and (B,C), nothing between them.

- "Processor issue order" for two instructions is _only_ defined by either
(a) memory barriers or (b) accesses to the _same_ locations. The alpha
architecture manual uses "A < B" to say that "event A" is before "event
B" in processor issue order.

So there is a "Processor issue order" on CPU2 due to the memory
barrier: P2:R(y,5) < P2:MB < P2:W(x,1), or put another way C < MB < D:
C < D.

Now, the question is, can we actually get the behaviour of reading 5 on
CPU2 (ie P2:R(y,5)), and that is only possible if we can find an ordering
that satisfies all the constraints. We have

D << A
B << C
C < D

and it seems to me that it is a possible situation: "B C D A"
really does satisfy all the constraints afaik.

So yes, according to the actual alpha architecture memory ordering rules,
you can see '5' from that first read of 'y'. DESPITE having a mb() on
CPU2.

In order to not see 5, you need to also specify "A < B", and the _only_
way to do that processor issue order specification is with a memory
barrier (or if the locations are the same, which they aren't).
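
The constraint argument above can be checked mechanically. The following is a minimal sketch, assuming (as the argument does) that both "<<" and "<" can be treated as "comes earlier in one overall order of the four events"; that is a simplification of the actual Alpha rules, and the type and function names are made up for illustration. It counts the orderings of A, B, C and D that satisfy D << A, B << C and C < D, with and without the extra "A < B".

```c
#include <stdbool.h>
#include <stddef.h>

enum event { EV_A, EV_B, EV_C, EV_D, NR_EVENTS };

/* "first" must come before "second" in the overall order. */
struct before {
	enum event first, second;
};

/* Constraints from the text: D << A, B << C, C < D. */
static const struct before base_constraints[] = {
	{ EV_D, EV_A },
	{ EV_B, EV_C },
	{ EV_C, EV_D },
};

/* pos[e] is the position (0..3) of event e in a candidate order. */
static bool order_ok(const int pos[NR_EVENTS],
		     const struct before *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pos[c[i].first] >= pos[c[i].second])
			return false;
	return true;
}

/*
 * Count the orderings of A, B, C, D that satisfy the base constraints,
 * optionally adding "A < B" (what a barrier on CPU1 would provide).
 */
int count_valid_orders(bool a_before_b)
{
	const struct before extra = { EV_A, EV_B };
	const size_t nbase =
		sizeof(base_constraints) / sizeof(base_constraints[0]);
	int count = 0;

	/* Two bits of `code` per event give each event a position 0..3. */
	for (int code = 0; code < 256; code++) {
		int pos[NR_EVENTS];
		unsigned int used = 0;

		for (int e = 0; e < NR_EVENTS; e++) {
			pos[e] = (code >> (2 * e)) & 3;
			used |= 1u << pos[e];
		}
		if (used != 0xf)	/* positions must form a permutation */
			continue;
		if (!order_ok(pos, base_constraints, nbase))
			continue;
		if (a_before_b && !order_ok(pos, &extra, 1))
			continue;
		count++;
	}
	return count;
}
```

Without the extra constraint exactly one ordering survives, namely "B C D A". Adding "A < B" closes the cycle A, B, C, D, A, so no ordering is left and the read of 5 becomes impossible.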

"Causality" simply is nowhere in the officially defined alpha memory
ordering. The fact that we test 'x == 1' and conditionally do the write
simply doesn't enter the picture. I suspect you'd have a really hard time
not having causality in practice, but there _are_ things that can break
causality (value prediction etc), so it's not like you'd have to actually
violate physics of reality to do it.

IOW, you could at least in theory implement a CPU that does every
instruction speculatively in parallel, and then validates the end result
afterwards according to the architecture rules. And that CPU would require
the memory barrier on alpha.

(On x86, 'causality' is defined to be part of the memory ordering rules,
so on x86, you _do_ have a 'A < B' relationship. But not on alpha).

Linus

From: Rafael J. Wysocki on 9 Dec 2009 17:00

On Wednesday 09 December 2009, Alan Stern wrote:
> On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:
>
> > > Well, one difficulty. It arises only because we are contemplating
> > > having the PM core fire up the async tasks, rather than having the
> > > drivers' suspend routines launch them (the way your original proposal
> > > did -- the difficulty does not arise there).
> > >
> > > Suppose A and B are unrelated devices and we need to impose the
> > > off-tree constraint that A suspends after B. With children taking
> > > their parent's lock, the way to prevent A from suspending too soon is
> > > by having B's suspend routine acquire A's lock.
> > >
> > > But B's suspend routine runs entirely in an async task, because that
> > > task is started by the PM core and it does the method call. Hence by
> > > the time B's suspend routine is called, A may already have begun
> > > suspending -- it's too late to take A's lock. To make the locking
> > > work, B would have to acquire A's lock _before_ B's async task starts.
> > > Since the PM core is unaware of the off-tree dependency, there's no
> > > simple way to make it work.
> >
> > Do not set async_suspend for B and instead start your own async thread
> > from its suspend callback. The parent-children synchronization is done by the
> > core anyway (at least I'd do it that way), so the only thing you need to worry
> > about is the extra dependency.
>
> I don't like that because it introduces "artificial" dependencies: It
> makes B depend on all the preceding synchronous suspends, even totally
> unrelated ones. But yes, it would work.

Well, unfortunately, it wouldn't, because (at least in the context of my last
patch) the core would release the rwsems as soon as your suspend had
returned. So you'd have to make your suspend wait for the async thread and
that would make it pointless. So scratch that, it wasn't a good idea at all.

This leaves us with basically two options, where the first one is to use
rwsems in a way that you've proposed (with iterating over children), and the
second one is to use completions. In my opinion rwsems don't give us any
advantage in this case, so I'd very much prefer to use completions.

Rafael
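
To illustrate the completion-based option, here is a minimal userspace sketch using pthreads. It is not the PM core code: struct completion, init_completion(), complete_all() and wait_for_completion() below are stand-ins written for this example that only borrow the names of the kernel primitives, and struct device with its must_follow field is a hypothetical model of an off-tree "A suspends after B" constraint.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical device model: one completion per device. */
struct device {
	char name;
	struct completion suspended;
	struct device *must_follow;	/* extra dependency, may be NULL */
	char *log;			/* shared record of suspend order */
	int *log_len;
	pthread_mutex_t *log_lock;
};

static void *async_suspend(void *arg)
{
	struct device *dev = arg;

	/* Block until the device we depend on has finished suspending. */
	if (dev->must_follow)
		wait_for_completion(&dev->must_follow->suspended);

	/* Device-specific suspend work would go here. */
	pthread_mutex_lock(dev->log_lock);
	dev->log[(*dev->log_len)++] = dev->name;
	pthread_mutex_unlock(dev->log_lock);

	complete_all(&dev->suspended);
	return NULL;
}

/* Suspend A and B in separate threads; A must suspend after B. */
int suspend_demo(char order[2])
{
	pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
	int log_len = 0;
	struct device b = { .name = 'B', .log = order,
			    .log_len = &log_len, .log_lock = &log_lock };
	struct device a = { .name = 'A', .must_follow = &b, .log = order,
			    .log_len = &log_len, .log_lock = &log_lock };
	pthread_t ta, tb;
	int ret = 0;

	init_completion(&a.suspended);
	init_completion(&b.suspended);

	/* Start A first on purpose: it still has to wait for B. */
	if (pthread_create(&ta, NULL, async_suspend, &a))
		return -1;
	if (pthread_create(&tb, NULL, async_suspend, &b)) {
		complete_all(&b.suspended);	/* release A before bailing out */
		ret = -1;
	} else {
		pthread_join(tb, NULL);
	}
	pthread_join(ta, NULL);
	return ret;
}
```

Because A's thread waits on B's completion before doing any work, the recorded order is B then A regardless of which thread is started first, and no lock has to be taken on A's behalf before the async tasks are launched.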
