Prev: trivial: clear the comment on parameters for ext2_xattr_set
Next: [PATCH] inotify: allow users to request not to recieve events on unlinked children
From: H. Peter Anvin on 13 Jul 2010 16:10 [Adding Borislav and Hans to the Cc: list] Any objections? Otherwise I'm going to queue this up in the urgent queue. -hpa On 07/13/2010 11:59 AM, Michal Schmidt wrote: > On my system with AMD Phenom II X6 I am seeing pauses at boot (usually during > udev startup) which require a key press to continue. It only happens if C1E is > enabled in the BIOS. > > It's caused by the APIC timer's inability to wake up the CPU from C1E (AMD > erratum #400). Linux has a workaround for it, but it's not being applied > correctly in this case. Though c1e_idle() detects C1E just fine, by the time > acpi_idle ('processor.ko' module) takes over, it is forgotten. > > After AMD C1E is detected, it is not sufficient to flag it in boot_cpu_data, > because the flag will get cleared in identify_cpu() when more CPUs are brought > up later. The fix is to mark the flag as forced. > > The additional call to set_cpu_cap() is just to make sure the flag is set even > on the CPUs that are already up and /proc/cpuinfo shows 'amdc1e' on all. > > Also fix indentation in the function. > > Signed-off-by: Michal Schmidt <mschmidt(a)redhat.com> > --- > > arch/x86/kernel/process.c | 9 +++++---- > 1 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > index e7e3521..f3520a8 100644 > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -589,7 +589,7 @@ static void c1e_idle(void) > if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) > mark_tsc_unstable("TSC halt in AMD C1E"); > printk(KERN_INFO "System has AMD C1E enabled\n"); > - set_cpu_cap(&boot_cpu_data, X86_FEATURE_AMDC1E); > + setup_force_cpu_cap(X86_FEATURE_AMDC1E); > } > } > > @@ -605,6 +605,7 @@ static void c1e_idle(void) > &cpu); > printk(KERN_INFO "Switch to broadcast mode on CPU%d\n", > cpu); > + set_cpu_cap(¤t_cpu_data, X86_FEATURE_AMDC1E); > } > clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu); > > @@ -614,9 +615,9 @@ static void c1e_idle(void) > * The switch back from broadcast mode needs to be > * called with interrupts disabled. > */ > - local_irq_disable(); > - clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu); > - local_irq_enable(); > + local_irq_disable(); > + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu); > + local_irq_enable(); > } else > default_idle(); > } > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Borislav Petkov on 13 Jul 2010 16:40 From: "H. Peter Anvin" <hpa(a)zytor.com> Date: Tue, Jul 13, 2010 at 01:05:06PM -0700 > [Adding Borislav and Hans to the Cc: list] > > Any objections? Otherwise I'm going to queue this up in the urgent queue. Yeah, I was staring at it already. And it looks ok at a first glance. However, since you want to speed it up into urgent, I'd like to give it a run at our X6 boxes tomorrow before ACKing it.. > On 07/13/2010 11:59 AM, Michal Schmidt wrote: > > On my system with AMD Phenom II X6 I am seeing pauses at boot (usually during > > udev startup) which require a key press to continue. It only happens if C1E is > > enabled in the BIOS. .... and this is strange, I didn't experience anything like that on our X6 boxes here couple of months ago. But maybe something got changed in later kernels to trigger that behavior. Michal, what chipset is that and do you have the latest BIOS on it? -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Michal Schmidt on 13 Jul 2010 17:10 ----- "Borislav Petkov" <borislav.petkov(a)amd.com> wrote: > ... and this is strange, I didn't experience anything like that on > our X6 boxes here couple of months ago. But maybe something got changed > in later kernels to trigger that behavior. Michal, what chipset is that > and do you have the latest BIOS on it? The chipset is 890FX. Motherboard Asus M4A89TD PRO/USB3. BIOS is the latest version: 0901 (05/17/2010 according to dmidecode) I found another person reporting the same symptoms on GIGABYTE GA-MA770T-UD3P AthlonXII 620 (4 cores) kernel 2.6.33.5 (Mandriva) (http://www.abclinuxu.cz/poradna/hardware/show/308799, in Czech) Michal -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: john stultz on 13 Jul 2010 21:40 On Tue, Jul 13, 2010 at 2:01 PM, Michal Schmidt <mschmidt(a)redhat.com> wrote: > ----- "Borislav Petkov" <borislav.petkov(a)amd.com> wrote: >> ... and this is strange, I didn't experience anything like that on >> our X6 boxes here couple of months ago. But maybe something got changed >> in later kernels to trigger that behavior. Michal, what chipset is that >> and do you have the latest BIOS on it? > > The chipset is 890FX. Motherboard Asus M4A89TD PRO/USB3. > BIOS is the latest version: 0901 (05/17/2010 according to dmidecode) > > I found another person reporting the same symptoms on > GIGABYTE GA-MA770T-UD3P AthlonXII 620 (4 cores) kernel 2.6.33.5 (Mandriva) > (http://www.abclinuxu.cz/poradna/hardware/show/308799, in Czech) This also sounds like: https://bugzilla.kernel.org/show_bug.cgi?id=15289 thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Borislav Petkov on 14 Jul 2010 12:10
From: Michal Schmidt <mschmidt(a)redhat.com> Date: Tue, Jul 13, 2010 at 08:59:58PM +0200 Hi, > On my system with AMD Phenom II X6 I am seeing pauses at boot (usually during > udev startup) which require a key press to continue. It only happens if C1E is > enabled in the BIOS. > > It's caused by the APIC timer's inability to wake up the CPU from C1E (AMD > erratum #400). Linux has a workaround for it, but it's not being applied > correctly in this case. Though c1e_idle() detects C1E just fine, by the time > acpi_idle ('processor.ko' module) takes over, it is forgotten. > > After AMD C1E is detected, it is not sufficient to flag it in boot_cpu_data, > because the flag will get cleared in identify_cpu() when more CPUs are brought > up later. The fix is to mark the flag as forced. I don't think that the workaround is wrong, assuming I'm not missing something. I'm seeing the following sequence on my machine here in which the cores are brought up and checked for c1e: The BSP does start_kernel() |->check_bugs() |->identify_boot_cpu() # here we do select_idle_routine(), i.e. pm_idle = c1e_idle .... |->rest_init() |->kernel_thread(kernel_init,... ) and kernel_init() does smp_init() where we init the rest of the cores. Now, each core does start_secondary() |->smp_callin() |->smp_store_cpu_info # here we copy boot_cpu_data for the starting AP |->identify_secondary_cpu |->identify_cpu |->select_idle_routine() # here we exit early since pm_idle is set already now all the cores except the BSP do cpu_idle but since bits 27,28 in the int pending MSR (see below) are not set yet, they spin a bit in cpu_idle doing default_idle. You can see this with my debugging patch below. Now here comes the key moment - the BSP enters cpu_idle _after_ all APs have been initialized and does set X86_FEATURE_AMDC1E. We haven't set the c1e_detected variable earlier since the hardware sets bit 28 in MSR_K8_INT_PENDING_MSG, C1eOnCmpHalt only after all cores have entered halt. After this bit is set, we set c1e_detected and switch to broadcast mode on each core. Now the question is, why does your system doesn't do that in that order? And I don't think your patch is the right fix - it doesn't change anything in the above sequence on my system except enabling the "amdc1e" feature string in /proc/cpuinfo which we don't need actually since dmesg already contains that info. And the more puzzling question is, how does your patch fix your system...? So, IMHO, what is more likely is that it has something to do with https://bugzilla.kernel.org/show_bug.cgi?id=15289, as John pointed out earlier (thanks John, Michal's situation looks quite similar). So, please apply the debug patch below and send me your whole dmesg to see what happens. Also, I'd like to see whether the SMI bit (27) in that same MSR is set so please do when the machine is up for i in $(seq 0 5); do lsmsr -c $i Int -V 3; done after installing the x86info tool. Thanks. C1E dbg patch: --- arch/x86/include/asm/acpi.h | 11 +++++++++-- arch/x86/kernel/process.c | 12 ++++++++++++ 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h index aa2c39d..39b8348 100644 --- a/arch/x86/include/asm/acpi.h +++ b/arch/x86/include/asm/acpi.h @@ -123,6 +123,9 @@ extern void acpi_reserve_wakeup_memory(void); */ static inline unsigned int acpi_processor_cstate_check(unsigned int max_cstate) { + + pr_err("%s: enter\n", __func__); + /* * Early models (<=5) of AMD Opterons are not supposed to go into * C2 state. @@ -134,10 +137,14 @@ static inline unsigned int acpi_processor_cstate_check(unsigned int max_cstate) boot_cpu_data.x86_model <= 0x05 && boot_cpu_data.x86_mask < 0x0A) return 1; - else if (boot_cpu_has(X86_FEATURE_AMDC1E)) + else if (boot_cpu_has(X86_FEATURE_AMDC1E)) { + pr_err("%s: C1E\n", __func__); return 1; - else + } + else { + pr_err("%s: max_cstate: %d\n", __func__, max_cstate); return max_cstate; + } } static inline bool arch_has_acpi_pdc(void) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index cfe109d..116c8bd 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -573,6 +573,18 @@ static void c1e_idle(void) if (need_resched()) return; + if (!boot_cpu_has(X86_FEATURE_AMDC1E)) { + u32 lo, hi; + + rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi); + + pr_err("%s: bits 0x%08x\n", + __func__, lo & K8_INTP_C1E_ACTIVE_MASK); + + pr_err("%s: cpu: %d, c1e_detected: %d\n", + __func__, raw_smp_processor_id(), c1e_detected); + } + if (!c1e_detected) { u32 lo, hi; -- 1.7.0 -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |