From: James Bottomley on 13 Aug 2010 13:20

On Fri, 2010-08-13 at 19:19 +0300, Felipe Contreras wrote:
> On Fri, Aug 13, 2010 at 6:57 PM, Dominik Brodowski
> <linux(a)dominikbrodowski.net> wrote:
> >> >> Not Ubuntu, not Fedora, not MeeGo, not anyone with a typical
> >> >> user-space seems to be having this problem. I can argue to you that
> >> >> this problem can be solved in easier ways, but instead I will argue
> >> >> that perhaps we should wait for somebody besides Android to complain
> >> >> about it before providing a "solution". Because after all, what good
> >> >> is a "solution" provided by the kernel, if the user-space is not going
> >> >> to use it, ever.
> >> >
> >> > At this point in the discussion, I am quite prepared to believe that you
> >> > will avoid using suspend blockers, and that you will further do everything
> >> > in your power to prevent anyone else from using suspend blockers. ;-)
> >>
> >> I'm not tying anybody's hands.
> >>
> >> How are people using real-time linux if it's not on mainline? Well,
> >> duuh, you apply the patches. If say Fedora was interested on it, they
> >> could apply the patches, and see for themselves. People do that all
> >> the time, with the mm tree, with Con Koliva's patches, etc. Once
> >> people are happy with the results, things get merged. Why should this
> >> be any different?
> >
> > Because millions of users are happy -- with Android, including suspend
> > blockers.
>
> I explicitly said somebody besides Android, specifically, somebody
> with a typical linux ecosystem. You are not addressing the argument at
> hand, that nobody else wants to tackle the issue this way, thus only
> making the discussion more difficult.

Can we stop arguing about the pointless?

The facts are that suspend blockers identifies a race within our suspend
to ram system that permeates from top to bottom (that's from server to
mobile). The problem is that resume events are racy with respect to
suspend and vice versa. This manifests itself most annoyingly on my
laptop in the "double suspend" case: where I suspend with a pending
suspend event, my laptop will resume and then immediately re-suspend
(leading me to kick myself and remind myself to check it stayed up
before pushing unsuspend and walking away). The other annoying case is
that if I accidentally close the lid before presenting, I have to wait
until the system is fully down before pressing resume.

In a Data Centre controlling power, if you sent a suspend then a wake on
lan, there's a window where the machine will still be down (because the
wol got ignored).

There are easy fixes to all the above ... I should wait to verify
suspend and resume in my laptop and I have to accept the wait time
between the two. In the data centre, you just repeat your power control
commands a few times with about 5s between them and so on.

The simple hacky work arounds mean that a user space invasive solution
like suspend blockers is a bit of a non starter as a solution to the
general case. However, it has shown that we do have a problem and
furthermore it's a problem encountered by more than android.

The technical problem with suspend blockers is that they're a solution
to a general problem that only works for a specific case. What we're
searching for is a general solution that can also be used in the android
specific case. So far, we have three possibilities:

     1. Stubs with deprecation - this has been rejected by android, so
        looks like a non starter.
     2. update pm_qos so that the suspend blocks become qos constraints.
        This may or may not be coupled with a user space suspend
        manager, but in the latter case it's essentially full suspend
        blockers (with the additional opportunistic suspend kernel code)
        but with information systems outside of android can use.
     3. Rafael's patch that makes it possible to avoid the races between
        wakeup and suspend. This requires a user space suspend manager

(There's a whole other load of implementation details like stats and the
like, but the above is the concept view).

Unless anyone has something substantive to add to either the problem
space or the solution space, the android discussion piece of this thread
has degenerated to pure noise.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
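
[Illustration, not part of the original message.] The race James describes, and the general idea behind the third possibility (a user space suspend manager that can tell whether a wakeup event slipped in before the suspend took effect), can be modelled in a few lines of Python. This is only a toy sketch under stated assumptions: `WakeupCounter`, `checked_suspend` and the `before_suspend` hook are hypothetical names invented for the example, and nothing here is taken from Rafael's patch or from any kernel interface.

```python
class WakeupCounter:
    """Toy stand-in for a kernel-side count of wakeup events (hypothetical)."""

    def __init__(self):
        self.count = 0

    def report_event(self):
        # A driver would call this when a wakeup event (lid open, WOL, ...) arrives.
        self.count += 1

    def read(self):
        return self.count

    def try_suspend(self, snapshot):
        # Refuse to suspend if any event arrived after the snapshot was taken.
        return snapshot == self.count


def checked_suspend(counter, before_suspend=None):
    """User space suspend manager step: snapshot, then suspend only if unchanged."""
    snapshot = counter.read()
    if before_suspend is not None:
        before_suspend()  # models the window in which a wakeup event may arrive
    return counter.try_suspend(snapshot)


counter = WakeupCounter()
print(checked_suspend(counter))                                       # no event in the window
print(checked_suspend(counter, before_suspend=counter.report_event))  # event in the window
```

The first call goes ahead with the suspend; the second is aborted because an event arrived between the snapshot and the suspend attempt, which is exactly the window in which the "double suspend" and ignored wake-on-LAN cases occur.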
From: Ted Ts'o on 13 Aug 2010 15:10

On Fri, Aug 13, 2010 at 01:11:29PM -0400, James Bottomley wrote:
>
> The facts are that suspend blockers identifies a race within our suspend
> to ram system that permeates from top to bottom (that's from server to
> mobile). The problem is that resume events are racy with respect to
> suspend and vice versa. This manifests itself most annoyingly on my
> laptop in the "double suspend" case: where I suspend with a pending
> suspend event, my laptop will resume and then immediately re-suspend
> (leading me to kick myself and remind myself to check it stayed up
> before pushing unsuspend and walking away). The other annoying case is
> that if I accidentally close the lid before presenting, I have to wait
> until the system is fully down before pressing resume.

This is all true, but it's also only one aspect of the problem. I
agree with you that this is the part of the problem which affects
Linux at all scales, from Cloud servers in a data center that want to
suspend themselves when there's no work to do (and then fail to
respond to the WOL packet) to mobile platforms that are suspending
much more frequently.

However, it doesn't follow that this is the _only_ problem that the
Android folks might be interested in solving. Opportunistic suspend
is a different part of the problem space, which is generally believed
by the Android developers as being far more efficient than a
user-space suspend manager. Rafael has stated his complete
unwillingness to deal with this part of the problem. OK, so that
probably means that for Android, it will have to be an out-of-tree
kernel patch.

The question, then, is whether a solution which addresses the only
part of the problem which Rafael is interested in dealing with at this
point, is sufficient such that (a) the kernel-level opportunistic
suspend can be done as an out-of-tree patch, while simultaneously (b)
allowing device drivers for Android devices can utilize Rafael's
interfaces to solve the race design bug currently found in our suspend
subsystem, while (c) requiring minimal changes to the Android
userspace, and (d) providing all of the statistics and debugging
functionality required by the Android userspace.

If we can engineer a solution which meets (a), (b), (c), and (d)
above, then everyone will be happy.

- Ted
From: Brian Swetland on 13 Aug 2010 15:30

On Fri, Aug 13, 2010 at 12:08 PM, Ted Ts'o <tytso(a)mit.edu> wrote:
> On Fri, Aug 13, 2010 at 01:11:29PM -0400, James Bottomley wrote:
>>
>> The facts are that suspend blockers identifies a race within our suspend
>> to ram system that permeates from top to bottom (that's from server to
>> mobile). The problem is that resume events are racy with respect to
>> suspend and vice versa. This manifests itself most annoyingly on my
>> laptop in the "double suspend" case: where I suspend with a pending
>> suspend event, my laptop will resume and then immediately re-suspend
>> (leading me to kick myself and remind myself to check it stayed up
>> before pushing unsuspend and walking away). The other annoying case is
>> that if I accidentally close the lid before presenting, I have to wait
>> until the system is fully down before pressing resume.
>
> This is all true, but it's also only one aspect of the problem. I
> agree with you that this is the part of the problem which affects
> Linux at all scales, from Cloud servers in a data center that want to
> suspend themselves when there's no work to do (and then fail to
> respond to the WOL packet) to mobile platforms that are suspending
> much more frequently.
>
> However, it doesn't follow that this is the _only_ problem that the
> Android folks might be interested in solving. Opportunistic suspend
> is a different part of the problem space, which is generally believed
> by the Android developers as being far more efficient than a
> user-space suspend manager. Rafael has stated his complete
> unwillingness to deal with this part of the problem. OK, so that
> probably means that for Android, it will have to be an out-of-tree
> kernel patch.
>
> The question, then, is whether a solution which addresses the only
> part of the problem which Rafael is interested in dealing with at this
> point, is sufficient such that (a) the kernel-level opportunistic
> suspend can be done as an out-of-tree patch, while simultaneously (b)
> allowing device drivers for Android devices can utilize Rafael's
> interfaces to solve the race design bug currently found in our suspend
> subsystem, while (c) requiring minimal changes to the Android
> userspace, and (d) providing all of the statistics and debugging
> functionality required by the Android userspace.
>
> If we can engineer a solution which meets (a), (b), (c), and (d)
> above, then everyone will be happy.

Arve's suspend blockers patch stack actually separates the core
functionality (ability for drivers to delay suspend while doing work
suspend would interfere with), from the ability to hold suspend
blockers from userspace (a separate, smaller patch building on the
core functionality).

Brian
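
[Illustration, not part of the original message.] The layering Brian describes, a core that lets drivers delay suspend plus a smaller userspace layer built on top of it, might be pictured roughly as follows. This is a conceptual Python model only; `SuspendBlockerCore`, `UserspaceBlockers` and all method names are hypothetical and do not reflect the structure or API of Arve's actual patches.

```python
from contextlib import contextmanager


class SuspendBlockerCore:
    """Core layer (hypothetical): drivers delay suspend while they have work in flight."""

    def __init__(self):
        self._active = {}  # blocker name -> number of outstanding holds

    def block(self, name):
        self._active[name] = self._active.get(name, 0) + 1

    def unblock(self, name):
        remaining = self._active.get(name, 0) - 1
        if remaining > 0:
            self._active[name] = remaining
        else:
            self._active.pop(name, None)

    def can_suspend(self):
        # Suspend may proceed only when nobody holds a blocker.
        return not self._active

    @contextmanager
    def blocking(self, name):
        # Convenience wrapper: hold the blocker for the duration of a with-block.
        self.block(name)
        try:
            yield
        finally:
            self.unblock(name)


class UserspaceBlockers:
    """Separate, smaller layer (hypothetical): userspace holds blockers via the core."""

    def __init__(self, core):
        self._core = core

    def acquire(self, client):
        self._core.block("user:" + client)

    def release(self, client):
        self._core.unblock("user:" + client)
```

In this model a driver would wrap its critical work in `with core.blocking("some_driver"):`, and the userspace layer can be dropped entirely without touching the core, which is the separation the message points to.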
From: James Bottomley on 13 Aug 2010 20:50

On Fri, 2010-08-13 at 15:08 -0400, Ted Ts'o wrote:
> On Fri, Aug 13, 2010 at 01:11:29PM -0400, James Bottomley wrote:
> >
> > The facts are that suspend blockers identifies a race within our suspend
> > to ram system that permeates from top to bottom (that's from server to
> > mobile). The problem is that resume events are racy with respect to
> > suspend and vice versa. This manifests itself most annoyingly on my
> > laptop in the "double suspend" case: where I suspend with a pending
> > suspend event, my laptop will resume and then immediately re-suspend
> > (leading me to kick myself and remind myself to check it stayed up
> > before pushing unsuspend and walking away). The other annoying case is
> > that if I accidentally close the lid before presenting, I have to wait
> > until the system is fully down before pressing resume.
>
> This is all true, but it's also only one aspect of the problem. I
> agree with you that this is the part of the problem which affects
> Linux at all scales, from Cloud servers in a data center that want to
> suspend themselves when there's no work to do (and then fail to
> respond to the WOL packet) to mobile platforms that are suspending
> much more frequently.
>
> However, it doesn't follow that this is the _only_ problem that the
> Android folks might be interested in solving. Opportunistic suspend
> is a different part of the problem space, which is generally believed
> by the Android developers as being far more efficient than a
> user-space suspend manager. Rafael has stated his complete
> unwillingness to deal with this part of the problem. OK, so that
> probably means that for Android, it will have to be an out-of-tree
> kernel patch.

OK, so I tried desperately to avoid the question of whether
opportunistic suspend is a good way of managing power. However, it
seems to me that it is in use by several systems (android, olpc, etc).
I'll defer the question of whether it's better in user space or kernel
space to Rafael's investigations ... but I will point out that the
kernel space patch, once the suspend blockers issue is taken care of
looks like a single patch to one file, so should be locally containable
and should allow upstream to be useful as the driver base again.

> The question, then, is whether a solution which addresses the only
> part of the problem which Rafael is interested in dealing with at this
> point, is sufficient such that (a) the kernel-level opportunistic
> suspend can be done as an out-of-tree patch, while simultaneously (b)
> allowing device drivers for Android devices can utilize Rafael's
> interfaces to solve the race design bug currently found in our suspend
> subsystem, while (c) requiring minimal changes to the Android
> userspace, and (d) providing all of the statistics and debugging
> functionality required by the Android userspace.
>
> If we can engineer a solution which meets (a), (b), (c), and (d)
> above, then everyone will be happy.

That's my goal.

James