From: Ingo Molnar on 3 Jun 2010 20:50

* Ingo Molnar <mingo(a)elte.hu> wrote:

> - Create a 'deep idle' mode that suspends. This, if all constraints
>   are met, is triggered by the scheduler automatically: just like the other
>   idle modes are triggered currently. This approach fixes the wakeup
>   races because an incoming wakeup event will set need_resched() and
>   abort the suspend.
>
>   ( This mode can even use the existing suspend code to bring stuff down,
>     therefore it also solves the pending timer problem and works even on
>     PC style x86. )

Note that this does not necessarily have to be implemented as 'execute suspend from the idle task' code: scheduling from the idle task, while it can certainly be made to work, is a somewhat recursive concept that we might want to avoid for robustness reasons.

Instead, the 'deepest idle' (suspend) method could consist of a wakeup of a kernel thread (or of any of the existing kernel threads such as the migration thread) - which kernel thread then does a race-free suspend: it offlines all but one CPU [on platforms that need that] and then initiates the suspend - but aborts the attempt if there's any sign of wakeup activity.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
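The suspend thread described above (offline all but one CPU, then suspend, but abort on any wakeup activity) can be illustrated with a small userspace toy model. This is only a sketch: `try_suspend` and the `wakeup_pending` callback are hypothetical names invented for the illustration, not kernel interfaces, and simple polling like this does not by itself give the race-free behaviour a real implementation would need.

```python
from typing import Callable, List


def try_suspend(online_cpus: List[int], wakeup_pending: Callable[[], bool]) -> str:
    """Toy model: offline all but one CPU, then suspend, aborting on wakeups.

    `online_cpus` is modified in place. `wakeup_pending` is a hypothetical
    stand-in for "is there any sign of wakeup activity?".
    """
    offlined: List[int] = []

    # Offline CPUs one at a time, checking for wakeup activity before each step.
    while len(online_cpus) > 1:
        if wakeup_pending():
            # Abort: bring the offlined CPUs back in their original order.
            online_cpus.extend(reversed(offlined))
            return "aborted"
        offlined.append(online_cpus.pop())

    # Last check before the (modelled) suspend itself.
    if wakeup_pending():
        online_cpus.extend(reversed(offlined))
        return "aborted"
    return "suspended"


if __name__ == "__main__":
    cpus = [0, 1, 2, 3]
    print(try_suspend(cpus, lambda: False), cpus)
```

The point of the sketch is only the shape of the control flow: every step toward suspend is preceded by a check, and an abort undoes the CPU offlining already performed.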
From: Linus Torvalds on 3 Jun 2010 22:30

On Fri, 4 Jun 2010, Ingo Molnar wrote:
>
> What you say is absolutely true, hence this would be driven via sched_tick() +
> TIF notifiers - i.e. only ever treat user-mode tasks as 'idle-able'. This can
> be done with no overhead to the regular fastpaths.
>
> The TIF notifier would be the one scheduling to idle - and would thus do it
> only to user-mode tasks.

The thing is, unless there is some _really_ deep other reason to do something like this, I still think it's total overdesign to push any knowledge/choices like this into the scheduler. I'd rather keep things way more independent, less tied to each other and to deep kernel subsystems.

IOW, my personal opinion is that something like a suspend (blocker or not) decision simply shouldn't be important enough to be tied into the scheduler. Especially not if it could just be its own layer.

That said, as far as I know, the Android people have mostly been looking at the suspend angle from a single-core standpoint. And I'm not at all convinced that they should hijack the existing "/sys/power/state" thing which is what I think they do now.

And those two things go together. The /sys/power/state thing is a global suspend - which I don't think is appropriate for an opportunistic thing in the first place, especially for multi-core.

A well-designed opportunistic suspend should be a two-phase thing: an opportunistic CPU hotunplug (shutting down cores one by one as the system is idle), and not a "global" event in the first place. And only when you've reached single-core state should you then say "do I suspend the system too".

So I've tried to look a bit at the patches, and my admittedly rough comments so far are

 - I really do prefer the "off to the side" approach that the current
   google opportunistic suspend patches have. As mentioned, I don't think
   this should be deep in the scheduler. Not at all.

 - I do think there are possibly races and CPU idle issues there, but I
   think they are mainly for the multi-core thing. And I think that's a
   totally separate issue. Or it _should_ be.

 - once you're single-core (whether because you never had more cores to
   begin with, or because the "opportunistic CPU offlining" has taken down
   the other cores), I think the suspend-blocker is fine as a concept, and
   certainly shouldn't need any deep scheduler hooks.

so I'd like to see the opportunistic suspend thing think about CPU offlining, and I'd like to see it disconnect from the existing /sys/power/state. And I'd really not like to involve deep internal kernel hooks into it.

But I'll also admit that maybe I'm not seeing some problems. I've frankly tried to avoid the whole discussion until Andrew pulled me in yesterday.

		Linus
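The two-phase idea in the message above can be written down as a tiny decision function. This is a hedged illustration only: `step`, its parameters, and the "one core per idle tick" pacing are assumptions made for the sketch, not anything proposed in the thread.

```python
from typing import Tuple


def step(online: int, idle: bool, blockers: int) -> Tuple[int, bool]:
    """One decision tick of a toy two-phase opportunistic suspend policy.

    Returns (new_online_cpu_count, suspend_now).
    Phase 1: while the system is idle, offline one core per tick.
    Phase 2: only in single-core state, suspend if no blocker is held.
    """
    if not idle:
        return online, False          # busy: change nothing
    if online > 1:
        return online - 1, False      # phase 1: opportunistic CPU offlining
    return online, blockers == 0      # phase 2: "do I suspend the system too?"


if __name__ == "__main__":
    cpus = 4
    for tick in range(4):
        cpus, suspend = step(cpus, idle=True, blockers=0)
        print(tick, cpus, suspend)
```

In this model the suspend question is never even asked while more than one core is online, which is the separation of concerns the message argues for.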
From: Linus Torvalds on 3 Jun 2010 22:40

On Thu, 3 Jun 2010, Linus Torvalds wrote:
>
> so I'd like to see the opportunistic suspend thing think about CPU
> offlining

Side note: one reason for me being somewhat interested in the CPU offlining is that I think the Android kind of opportunistic suspend is _not_ likely something I'd like to see on a desktop. But an "opportunistic CPU offliner"? That might _well_ be useful even outside of any other suspend activity.

If the system is idle (or almost idle) for long times, I would heartily recommend actively shutting down unused cores. Some CPU's are hopefully smart enough to not even need that kind of software management, but I suspect even the really smart ones might be able to take advantage of the kernel saying: "I'm shutting you down, you don't have to worry about latency AT ALL, because I'm keeping another CPU active to do any real work".

I'd also be interested to see if it could even improve single-thread performance if we end up doing the whole SMP->UP "lock" prefix rewriting when the system is idle enough that we'd be better off running just a single core. I dunno - just throwing that out there.

Anyway, the only reason I think this is related is literally because I think that if we know there is only a single CPU active, I think the actual "real" opportunistic suspend is easier. Suddenly you don't have to worry about what happens on other run-queues etc, and whether another CPU is just about to create a suspend block etc.

So I think they tie together, although it's mostly tangential. And as mentioned, I think an opportunistic CPU suspend part is more relevant outside of Android, and thus perhaps more widely interesting.

		Linus
From: Arjan van de Ven on 3 Jun 2010 23:50

On Thu, 3 Jun 2010 19:26:50 -0700 (PDT)
Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>
> If the system is idle (or almost idle) for long times, I would
> heartily recommend actively shutting down unused cores. Some CPU's
> are hopefully smart enough to not even need that kind of software
> management, but I suspect even the really smart ones might be able to
> take advantage of the kernel saying: "I'm shutting you down, you
> don't have to worry about latency AT ALL, because I'm keeping another
> CPU active to do any real work".

sadly the reality is that "offline" is actually the same as "deepest C state". At best. As far as I can see, this is at least true for all Intel and AMD cpus.

And because there's then no power saving (but a performance cost), it's actually a negative for battery life/total energy. (lots of experiments inside Intel seem to confirm that, it's not just theory)

--
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
From: Arve Hjønnevåg on 3 Jun 2010 23:50

On Thu, Jun 3, 2010 at 7:16 PM, Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>
>
> On Fri, 4 Jun 2010, Ingo Molnar wrote:
>>
>> What you say is absolutely true, hence this would be driven via sched_tick() +
>> TIF notifiers - i.e. only ever treat user-mode tasks as 'idle-able'. This can
>> be done with no overhead to the regular fastpaths.
>>
>> The TIF notifier would be the one scheduling to idle - and would thus do it
>> only to user-mode tasks.
>
> The thing is, unless there is some _really_ deep other reason to do
> something like this, I still think it's total overdesign to push any
> knowledge/choices like this into the scheduler. I'd rather keep things way
> more independent, less tied to each other and to deep kernel subsystems.
>
> IOW, my personal opinion is that something like a suspend (blocker or not)
> decision simply shouldn't be important enough to be tied into the
> scheduler. Especially not if it could just be its own layer.
>
> That said, as far as I know, the Android people have mostly been looking
> at the suspend angle from a single-core standpoint. And I'm not at all
> convinced that they should hijack the existing "/sys/power/state" thing
> which is what I think they do now.
>

While it is true that we have not used this code on a multi core system yet, I'm not sure why multiple cores would affect it. We annotate that work needs to be done before it is safe to suspend, but we don't care which core does the work (or if multiple cores do pieces of it).

> And those two things go together. The /sys/power/state thing is a global
> suspend - which I don't think is appropriate for an opportunistic thing in
> the first place, especially for multi-core.
>
> A well-designed opportunistic suspend should be a two-phase thing: an
> opportunistic CPU hotunplug (shutting down cores one by one as the system
> is idle), and not a "global" event in the first place. And only when
> you've reached single-core state should you then say "do I suspend the
> system too".
>

This seems to fit better into the cpuidle and/or frequency scaling framework.

> So I've tried to look a bit at the patches, and my admittedly rough
> comments so far are
>
>  - I really do prefer the "off to the side" approach that the current
>    google opportunistic suspend patches have. As mentioned, I don't think
>    this should be deep in the scheduler. Not at all.
>
>  - I do think there are possibly races and CPU idle issues there, but I
>    think they are mainly for the multi-core thing. And I think that's a
>    totally separate issue. Or it _should_ be.
>

I'm not aware of any races with multi-core systems unless there are existing problems in suspend. We check if any suspend blockers are active after disable_nonboot_cpus() has returned.

>  - once you're single-core (whether because you never had more cores to
>    begin with, or because the "opportunistic CPU offlining" has taken down
>    the other cores), I think the suspend-blocker is fine as a concept, and
>    certainly shouldn't need any deep scheduler hooks.
>
> so I'd like to see the opportunistic suspend thing think about CPU
> offlining,

I see this as a separate problem. We ignore a single busy CPU for opportunistic suspend, so why should the number of online CPUs matter?

> and I'd like to see it disconnect from the existing
> /sys/power/state.

The entry point is not important to us. The current interface is what Rafael wanted instead of the /sys/power/request-state interface which is what we changed it to last year.

> And I'd really not like to involve deep internal kernel
> hooks into it.
>
> But I'll also admit that maybe I'm not seeing some problems. I've frankly
> tried to avoid the whole discussion until Andrew pulled me in yesterday.
>
>                        Linus
>

--
Arve Hjønnevåg
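The ordering described in the reply above (take the non-boot CPUs down first, then check whether any suspend blocker is held) can be modelled in a few lines. This is a rough userspace sketch under stated assumptions: the `SuspendBlockers` class, `attempt_suspend`, and the blocker name in the usage example are hypothetical, and the strings in the log merely label modelled steps rather than calling any real kernel function.

```python
import threading
from typing import List, Set, Tuple


class SuspendBlockers:
    """Hypothetical registry of named blockers; any thread may hold one."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Set[str] = set()

    def block(self, name: str) -> None:
        with self._lock:
            self._active.add(name)

    def unblock(self, name: str) -> None:
        with self._lock:
            self._active.discard(name)

    def any_active(self) -> bool:
        with self._lock:
            return bool(self._active)


def attempt_suspend(blockers: SuspendBlockers) -> Tuple[bool, List[str]]:
    """Model one suspend attempt and return (suspended, steps taken)."""
    steps = ["disable_nonboot_cpus"]          # modelled step, not a real call
    # The check happens only after the non-boot CPUs are down, so it does
    # not matter which core took or released a blocker earlier.
    if blockers.any_active():
        steps.append("enable_nonboot_cpus")   # abort and undo
        return False, steps
    steps.append("enter_suspend")
    return True, steps


if __name__ == "__main__":
    b = SuspendBlockers()
    b.block("example_blocker")
    print(attempt_suspend(b))
    b.unblock("example_blocker")
    print(attempt_suspend(b))
```

Because the registry is shared and checked at a single point, the model does not care which core (thread) did the work that a blocker protects, which mirrors the argument made in the reply.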