From: Vaidyanathan Srinivasan on 8 Feb 2010 05:10

Hi Peter,

sched_mc_powersavings is broken on pre-Nehalem x86 platforms due to
contradictory SD flags at the MC level and the CPU level.

SD_PREFER_SIBLING set at the MC level is expected to do the following:

a) Disable consolidating tasks into a single group in the parent sched
   domain (generally a single CPU package)
b) Spread tasks equally across groups at the parent sched domain

SD_POWERSAVINGS_BALANCE set at a sched domain, on the other hand, enables
the logic that consolidates tasks within the minimum number of groups at
that sched domain.

Basically, SD_POWERSAVINGS_BALANCE at one sched domain and
SD_PREFER_SIBLING at its child domain contradict each other, and the
combination disables the SD_POWERSAVINGS_BALANCE logic in

	if (local_group && (sds->this_nr_running >= sgs->group_capacity ||
	    !sds->this_nr_running))
		sds->power_savings_balance = 0;

since sgs.group_capacity is set to '1' by SD_PREFER_SIBLING in the child
sched domain.

The attached patch fixes the expected behavior for
sched_mc_powersavings > 0, while objective (b) is still an open issue.
The following condition in find_busiest_group()

	sds.max_load <= sds.busiest_load_per_task

treats unequally loaded groups as balanced as long as they are below
capacity.

Test results:

The following patch was tested on a dual-socket, quad-core, non-threaded
Xeon, running 4 while(1) loops in a shell:

echo 1 > /sys/devices/system/cpu/sched_mc_power_savings

Without patch: 1 task running in one quad-core package and 3 in the other.
This is effectively the baseline behavior with sched_mc=0.

With patch: all 4 tasks running in one quad-core package. This is the
expected behavior for sched_mc_powersavings > 0.

--Vaidy

Fix for sched_mc_powersavings for pre-Nehalem platforms.

Child sched domain should clear SD_PREFER_SIBLING if parent will have
SD_POWERSAVINGS_BALANCE because they are contradicting.

Sets the flags correctly based on sched_mc_power_savings.

Signed-off-by: Vaidyanathan Srinivasan <svaidy(a)linux.vnet.ibm.com>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6550415..ef6b7cd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -866,7 +866,10 @@ static inline int sd_balance_for_mc_power(void)
 	if (sched_smt_power_savings)
 		return SD_POWERSAVINGS_BALANCE;
 
-	return SD_PREFER_SIBLING;
+	if (!sched_mc_power_savings)
+		return SD_PREFER_SIBLING;
+
+	return 0;
 }
 
 static inline int sd_balance_for_package_power(void)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
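
To make the flag interaction described in the first mail easier to follow, here is a small userspace sketch of it. It is not kernel code and only models what the mail and the patch state. The flag bit values, the function names (mc_flags_before, mc_flags_after, effective_capacity, power_savings_balance_kept, check_quad_core_case) and the idea of passing the tunables as parameters are all assumptions made for illustration. The capacity clamp follows the mail's statement that SD_PREFER_SIBLING in the child domain sets the group capacity to 1.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real flag constants differ. */
#define SD_POWERSAVINGS_BALANCE 0x1
#define SD_PREFER_SIBLING       0x2

/* MC-level flag choice before the patch. The tunables are passed in as
 * parameters here (an assumption) instead of being kernel globals. */
int mc_flags_before(int smt_power_savings, int mc_power_savings)
{
	(void)mc_power_savings;	/* not consulted before the patch */
	if (smt_power_savings)
		return SD_POWERSAVINGS_BALANCE;
	return SD_PREFER_SIBLING;
}

/* MC-level flag choice after the patch: SD_PREFER_SIBLING is only
 * returned when sched_mc_power_savings is 0. */
int mc_flags_after(int smt_power_savings, int mc_power_savings)
{
	if (smt_power_savings)
		return SD_POWERSAVINGS_BALANCE;
	if (!mc_power_savings)
		return SD_PREFER_SIBLING;
	return 0;
}

/* Simplified model of the mail's claim: SD_PREFER_SIBLING in the child
 * domain reduces the group capacity seen by the parent to 1. */
unsigned long effective_capacity(unsigned long capacity, int child_flags)
{
	if ((child_flags & SD_PREFER_SIBLING) && capacity > 1)
		return 1;
	return capacity;
}

/* The local-group check quoted in the mail. Returns 0 when the check
 * would clear power_savings_balance, 1 when it leaves it alone. */
int power_savings_balance_kept(unsigned long this_nr_running,
			       unsigned long group_capacity)
{
	if (this_nr_running >= group_capacity || !this_nr_running)
		return 0;
	return 1;
}

/* Scenario from the mail: quad-core package (capacity 4), one task in
 * the local group, sched_smt_power_savings = 0, sched_mc_power_savings = 1. */
void check_quad_core_case(void)
{
	int before = mc_flags_before(0, 1);
	int after = mc_flags_after(0, 1);

	/* Before the patch the clamp to 1 trips the check (1 >= 1). */
	assert(!power_savings_balance_kept(1, effective_capacity(4, before)));
	/* After the patch the capacity stays 4 and the check does not fire. */
	assert(power_savings_balance_kept(1, effective_capacity(4, after)));
}
```

Calling check_quad_core_case() from a main() exercises the case reported in the test results. The sketch says nothing about objective (b), which the mail leaves open.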

From: Peter Zijlstra on 8 Feb 2010 06:40

On Mon, 2010-02-08 at 15:35 +0530, Vaidyanathan Srinivasan wrote:

> Fix for sched_mc_powersavings for pre-Nehalem platforms.
>
> Child sched domain should clear SD_PREFER_SIBLING if parent will have
> SD_POWERSAVINGS_BALANCE because they are contradicting.
>
> Sets the flags correctly based on sched_mc_power_savings.
>
> Signed-off-by: Vaidyanathan Srinivasan <svaidy(a)linux.vnet.ibm.com>
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 6550415..ef6b7cd 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -866,7 +866,10 @@ static inline int sd_balance_for_mc_power(void)
>  	if (sched_smt_power_savings)
>  		return SD_POWERSAVINGS_BALANCE;
>
> -	return SD_PREFER_SIBLING;
> +	if (!sched_mc_power_savings)
> +		return SD_PREFER_SIBLING;
> +
> +	return 0;
>  }
>
>  static inline int sd_balance_for_package_power(void)

Looks good, thanks!

What's the status of getting rid of sched_{mc,smt}_power_savings?
From: Vaidyanathan Srinivasan on 8 Feb 2010 07:50

* Peter Zijlstra <peterz(a)infradead.org> [2010-02-08 12:35:48]:

> On Mon, 2010-02-08 at 15:35 +0530, Vaidyanathan Srinivasan wrote:
>
> > Fix for sched_mc_powersavings for pre-Nehalem platforms.
> >
> > Child sched domain should clear SD_PREFER_SIBLING if parent will have
> > SD_POWERSAVINGS_BALANCE because they are contradicting.
> >
> > Sets the flags correctly based on sched_mc_power_savings.
> >
> > Signed-off-by: Vaidyanathan Srinivasan <svaidy(a)linux.vnet.ibm.com>
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 6550415..ef6b7cd 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -866,7 +866,10 @@ static inline int sd_balance_for_mc_power(void)
> >  	if (sched_smt_power_savings)
> >  		return SD_POWERSAVINGS_BALANCE;
> >
> > -	return SD_PREFER_SIBLING;
> > +	if (!sched_mc_power_savings)
> > +		return SD_PREFER_SIBLING;
> > +
> > +	return 0;
> >  }
> >
> >  static inline int sd_balance_for_package_power(void)
>
> Looks good, thanks!
>
> What's the status of getting rid of sched_{mc,smt}_power_savings?

Hi Peter,

With the current rearrangement of the code, the unified
sched_power_savings seems more doable. However, I have a few more fixes
for sched_smt_powersavings on Nehalem before I revisit the unified
tunable.

--Vaidy