Prev: [PATCH] time/fs - file's time race with vgettimeofday
Next: [PATCH v2] KVM: IOAPIC: only access APIC registers one dword at a time
From: Peter Zijlstra on 8 Jul 2010 10:20 Sorry for the delay, I got side-tracked :/ On Fri, 2010-07-02 at 09:56 -0700, Venkatesh Pallipadi wrote: > My fault. I completely missed that code path, assuming this_cpu in > load_balance means this_cpu :( Right, but local_group is still valid because that's cpumask_test_cpu(this_cpu, sched_group_cpus(group)); So in that regard your initial patch was still correct. However, since there can be multiple CPUs in the local group, we want to only have one do the actual balancing, which is what the !*balance logic is about, so by keeping the call site where it is gains that. The below proposal avoids calling update_groups_power() too often for: - !local_groups, - multiple cpus in the same group. Does this look correct? Related to this, Mikey was looking at avoiding the for_each_cpu_and() loops by converting update_sg_lb_stats into something similar to update_group_power() and have 1 cpu in the group update the statistics and propagate upwards by summing groups instead of individual cpus. --- kernel/sched_fair.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c index 9910e1b..62f34a0 100644 --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd, return; } - update_group_power(sd, this_cpu); + if (local_group) + update_group_power(sd, this_cpu); /* Adjust by relative CPU power of the group */ sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 8 Jul 2010 13:50 On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote: > > @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu, > init_sd_power_savings_stats(sd, sds, idle); > load_idx = get_sd_load_idx(sd, idle); > > + if (this_cpu == smp_processor_id()) > + update_group_power(sd, this_cpu); > + > do { > int local_group; Which will break for nohz_idle_balance.. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Suresh Siddha on 8 Jul 2010 13:50 On Thu, 2010-07-08 at 07:12 -0700, Peter Zijlstra wrote: > Sorry for the delay, I got side-tracked :/ > > On Fri, 2010-07-02 at 09:56 -0700, Venkatesh Pallipadi wrote: > > My fault. I completely missed that code path, assuming this_cpu in > > load_balance means this_cpu :( > > Right, but local_group is still valid because that's > cpumask_test_cpu(this_cpu, sched_group_cpus(group)); > > So in that regard your initial patch was still correct. However, since > there can be multiple CPUs in the local group, we want to only have one > do the actual balancing, which is what the !*balance logic is about, so > by keeping the call site where it is gains that. > > The below proposal avoids calling update_groups_power() too often for: > - !local_groups, > - multiple cpus in the same group. > > Does this look correct? > > Related to this, Mikey was looking at avoiding the for_each_cpu_and() > loops by converting update_sg_lb_stats into something similar to > update_group_power() and have 1 cpu in the group update the statistics > and propagate upwards by summing groups instead of individual cpus. > > --- > kernel/sched_fair.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c > index 9910e1b..62f34a0 100644 > --- a/kernel/sched_fair.c > +++ b/kernel/sched_fair.c > @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd, > return; > } > > - update_group_power(sd, this_cpu); > + if (local_group) > + update_group_power(sd, this_cpu); if IDLE == CPU_NEWLY_IDLE, then all the cpu's in the local group will do this. Also update_group_power() can be done on only on the local cpu, i.e., when this_cpu == smp_processor_id() right? So it should be something like? diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c index a878b53..2d8a74e 100644 --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -2406,8 +2406,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd, return; } - update_group_power(sd, this_cpu); - /* Adjust by relative CPU power of the group */ sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power; @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu, init_sd_power_savings_stats(sd, sds, idle); load_idx = get_sd_load_idx(sd, idle); + if (this_cpu == smp_processor_id()) + update_group_power(sd, this_cpu); + do { int local_group; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Suresh Siddha on 8 Jul 2010 14:00 On Thu, 2010-07-08 at 10:49 -0700, Peter Zijlstra wrote: > On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote: > > > > @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu, > > init_sd_power_savings_stats(sd, sds, idle); > > load_idx = get_sd_load_idx(sd, idle); > > > > + if (this_cpu == smp_processor_id()) > > + update_group_power(sd, this_cpu); > > + > > do { > > int local_group; > > Which will break for nohz_idle_balance.. Then the logic is broken somewhere because update_group_power() reads APERF/MPERF MSR's which doesn't make sense when this_cpu != smp_processor_id(). What am I missing? thanks, suresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 8 Jul 2010 14:00
On Thu, 2010-07-08 at 10:50 -0700, Suresh Siddha wrote: > On Thu, 2010-07-08 at 10:49 -0700, Peter Zijlstra wrote: > > On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote: > > > > > > @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu, > > > init_sd_power_savings_stats(sd, sds, idle); > > > load_idx = get_sd_load_idx(sd, idle); > > > > > > + if (this_cpu == smp_processor_id()) > > > + update_group_power(sd, this_cpu); > > > + > > > do { > > > int local_group; > > > > Which will break for nohz_idle_balance.. > > Then the logic is broken somewhere because update_group_power() reads > APERF/MPERF MSR's which doesn't make sense when this_cpu != > smp_processor_id(). What am I missing? The APERF/MPERF code is utterly broken.. (and currently disabled by default) but yeah, that's one aspect of its brokenness I only realized after your email. The problem is that it measures current throughput, not current capacity. So for an idle thread/core it would return 0, instead of the potential. I've been meaning to revisit this.. maybe I should simply rip that out until I get it working. I was thinking of measuring temporal maxima to approximate capacity instead of throughput. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |