Prev: futex: Take mmap_sem for get_user_pages in fault_in_user_writeable
Next: [PATCH v9 0/8] Loongson: YeeLoong: add platform drivers
From: Linus Torvalds on 8 Dec 2009 17:20 On Tue, 8 Dec 2009, Alan Stern wrote: > > > > Sure they can. Control dependencies are trivial - it's called "branch > > prediction", and everybody does it, and data dependencies don't exist on > > many CPU architectures (even to the point of reading through a pointer > > that you loaded). > > Wait a second. Are you saying that with code like this: > > if (x == 1) > y = 5; > > the CPU may write to y before it has finished reading the value of x? Well, in a way. The branch may have been predicted, and the CPU can _internally_ have done the 'y=5' thing into a write buffer before it even did the read. Some time later it will have to _verify_ the prediction and then perhaps kill the write before it makes it to a data structure that is visible to others, but internally from the CPU standpoint, yes, the write could have happened before the read. Now, whether that write is "before" or "after" the read is debatable. But one way of looking at it is certainly that the write took place earlier, and the read might have just caused it to be undone. And there are real effects of this - looking at the bus, you might have a bus transaction to get the cacheline that contains 'y' for exclusive access happen _before_ the bus transaction that reads in the value of 'x' (but you'd never see the writeout of that '5' before). > And this write is visible to other CPUs, so that if x was initially 0 > and a second CPU sets x to 1, the second CPU may see y == 5 before it > executes the write to x (whatever that may mean)? Well, yes and no. CPU1 above won't release the '5' until it has confirmed the '1' (even if it does so by reading it late). but assuming the other CPU also does speculation, then yes, the situation you describe could happen. If the other CPU does z = y; x = 1; then it's certainly possible that 'z' contains 5 at the end (even if both x and y started out zero). Because now the read of 'y' on that other CPU might be delayed, and the write of 'x' goes ahead, CPU1 sees the 1, and commits its write of 5, sp when CPU2 gets the cacheline, z will now contain 5. Is it likely? No. CPU microarchitectures aim to do reads early, and writes late. Reads are on the critical path, writes can be buffered. But you can basically get into "impossible" situations where a write that was _later_ in the instruction stream than a read (on CPU2, the 'store 1 to x' would be after the load of 'y' from memory) could show up in the other order on another CPU. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 8 Dec 2009 17:30 On Tue, 8 Dec 2009, Alan Stern wrote: > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > these routines were not intended to be called with interrupts disabled, > but that requirement doesn't seem to be documented. And it isn't a > natural requirement anyway. 'complete()' is supposed to be callable from interrupts, but the waiting ones aren't. But 'complete()' is all you should need to call from interrupts, so that's fine. So I think completions should work, if done right. That whole "make the parent wait for all the children to complete" is fine in that sense. And I'll happily take such an approach if my rwlock thing doesn't work. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 8 Dec 2009 17:40 On Tue, 8 Dec 2009, Alan Stern wrote: > > Suppose A and B are unrelated devices and we need to impose the > off-tree constraint that A suspends after B. Ah. Ok, I can imagine the off-tree constraints, but part of my "keep it simple" was to simply not do them. If there are constraints that aren't in the topology of the tree, then I simply don't think that async is worth it in the first place. > You misunderstand. The suspend algorithm will look like this: > > dpm_suspend() > { > list_for_each_entry_reverse(dpm_list, dev) { > down_write(dev->lock); > async_schedule(device_suspend, dev); > } > } > > device_suspend(dev) > { > device_for_each_child(dev, child) { > down_read(child->lock); > up_read(child->lock); > } > dev->suspend(dev); /* May do off-tree down+up pairs */ > up_write(dev->lock); > } Ok, so the above I think work (and see my previous email: I think completions would be workable there too). It's just that I think the "looping over children" is ugly, when I think that by doing it the other way around you can make the code simpler and only depend on the PM device list and a simple parent pointer access. I also think that you are wrong that the above somehow protects against non-topological dependencies. If the device you want to keep delay yourself suspending for is after you in the list, the down_read() on that may succeed simply because it hasn't even done its down_write() yet and you got scheduled early. But I guess you could do that by walking the list twice (first to lock them all, then to actually call the suspend function). That whole two-phase thing, except the first phase _only_ locks, and doesn't do any callbacks. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on 8 Dec 2009 17:40 On Tuesday 08 December 2009, Alan Stern wrote: > On Tue, 8 Dec 2009, Linus Torvalds wrote: > > > On Tue, 8 Dec 2009, Alan Stern wrote: > > > > > > That's not the way it should be done. Linus had children taking their > > > parents' locks during suspend, which is simple but leads to > > > difficulties. > > > > No it doesn't. Name them. > > Well, one difficulty. It arises only because we are contemplating > having the PM core fire up the async tasks, rather than having the > drivers' suspend routines launch them (the way your original proposal > did -- the difficulty does not arise there). > > Suppose A and B are unrelated devices and we need to impose the > off-tree constraint that A suspends after B. With children taking > their parent's lock, the way to prevent A from suspending too soon is > by having B's suspend routine acquire A's lock. > > But B's suspend routine runs entirely in an async task, because that > task is started by the PM core and it does the method call. Hence by > the time B's suspend routine is called, A may already have begun > suspending -- it's too late to take A's lock. To make the locking > work, B would have to acquire A's lock _before_ B's async task starts. > Since the PM core is unaware of the off-tree dependency, there's no > simple way to make it work. Do not set async_suspend for B and instead start your own async thread from its suspend callback. The parent-children synchronization is done by the core anyway (at least I'd do it that way), so the only thing you need to worry about is the extra dependency. > > That just complicates things. Compare to my simple locking scheme I've > > quoted several times. > > It is a little more complicated in that it involves explicitly > iterating over children. But it is simpler in that it can use > completions instead of rwsems and it avoids the off-tree dependency > problem described above. I would be slightly more comfortable using completions, but the rwsem-based approach is fine with me as well. Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on 8 Dec 2009 21:20
On Tue, 8 Dec 2009, Linus Torvalds wrote: > On Tue, 8 Dec 2009, Alan Stern wrote: > > > > And likewise in try_wait_for_completion(). It looks like a bug. Maybe > > these routines were not intended to be called with interrupts disabled, > > but that requirement doesn't seem to be documented. And it isn't a > > natural requirement anyway. > > 'complete()' is supposed to be callable from interrupts, but the waiting > ones aren't. But 'complete()' is all you should need to call from > interrupts, so that's fine. And try_wait_for_completion()? The fact that it doesn't block makes it interrupt-safe. What's the point of having an interrupt-safe routine that you can't call from within interrupt handlers? Even if nobody uses it that way now, there's no guarantee somebody won't attempt it in the future. > So I think completions should work, if done right. That whole "make the > parent wait for all the children to complete" is fine in that sense. And > I'll happily take such an approach if my rwlock thing doesn't work. In principle the two approaches could be combined: Add an rwsem for use by children and a completion for off-tree[*] use. But that would certainly be overkill. Looping over children doesn't take a tremendous amount of time compared to a full system suspend. Alan Stern [*] "Off-tree" isn't really an appropriate term; these devices aren't "off" the tree. "Non-tree" would be better. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |