From: Borislav Petkov on 18 Jul 2010 14:20

From: Heinz Diehl <htd(a)fancy-poultry.org>
Date: Sun, Jul 18, 2010 at 12:22:39PM -0400

> On 15.07.2010, Michal Schmidt wrote:
>
> > This suggests that another way to fix my problem would be this (tested):
> [....]
>
> Did apply this patch to stock 2.6.35-rc5, it does _not_ fix my C1E problem.
> It takes a lot of key pressing in the boot process to get the system up,
> just as it has been before.

Well, no wonder it wouldn't work - you seem to have that funny Gigabyte
BIOS which botches lapic and ioapic id enumeration and is obviously
enabling C1E in the SMI handler (excerpt from your dmesg):

ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 33, address 0xfec00000, GSI 0-23

See how the IOAPIC's and the third LAPIC's ids are the same? That's
wrong. There are a couple of people with the same problem:
https://bugzilla.kernel.org/show_bug.cgi?id=15289. And as comment #53
says, we're trying to talk to Gigabyte to fix this and the C1E problem.

For now, you can disable C1E in the BIOS, use "idle=mwait" and put
yourself on the CC list of that bug.

Sorry, I wish I could give you better news.
-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
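
The ID clash pointed out in the message above (IOAPIC id 0x02 versus the third LAPIC's id 0x02) can be checked mechanically. What follows is only a userspace sketch, not kernel code: `struct lapic_entry`, `find_apic_id_clash()` and `dmesg_lapics` are hypothetical names made up for illustration, and the table is typed in by hand from the dmesg excerpt. The sketch only detects equal ids; that equal ids are wrong on this platform is the claim made in the message, not something the code establishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one "ACPI: LAPIC (...)" dmesg line. */
struct lapic_entry {
	uint8_t acpi_id;
	uint8_t lapic_id;
	bool enabled;
};

/*
 * Return the index of the first enabled LAPIC whose id equals ioapic_id,
 * or -1 if no enabled LAPIC shares that id.
 */
int find_apic_id_clash(const struct lapic_entry *lapics, size_t n,
		       uint8_t ioapic_id)
{
	assert(lapics != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (lapics[i].enabled && lapics[i].lapic_id == ioapic_id)
			return (int)i;
	}
	return -1;
}

/* Hand-copied from the dmesg excerpt: ids 0-3 enabled, 4-7 disabled. */
const struct lapic_entry dmesg_lapics[8] = {
	{ 0x00, 0x00, true },  { 0x01, 0x01, true },
	{ 0x02, 0x02, true },  { 0x03, 0x03, true },
	{ 0x04, 0x04, false }, { 0x05, 0x05, false },
	{ 0x06, 0x06, false }, { 0x07, 0x07, false },
};
```

Calling `find_apic_id_clash(dmesg_lapics, 8, 0x02)` with the IOAPIC id from the excerpt returns index 2, i.e. the third LAPIC entry.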
From: Borislav Petkov on 19 Jul 2010 15:40

From: Heinz Diehl <htd(a)fancy-poultry.org>
Date: Sun, Jul 18, 2010 at 12:22:39PM -0400

> On 15.07.2010, Michal Schmidt wrote:
>
> > This suggests that another way to fix my problem would be this (tested):
> [....]
>
> Did apply this patch to stock 2.6.35-rc5, it does _not_ fix my C1E problem.
> It takes a lot of key pressing in the boot process to get the system up,
> just as it has been before.

Ok, come to think of it, there are a couple of things you could also
test:

First of all, there's this HPET readback on ATI chipsets which wasn't
in your testing kernel and it would be a good thing to test it - I'm a
little sceptical but it fixes the same symptoms for another user so it
wouldn't hurt. You'll have to apply the following patches:

1. The latest fix from Michal (adding it here for completeness):

--
From 8edc23442afd629e71b17789fdf2a4b657c29e37 Mon Sep 17 00:00:00 2001
From: Michal Schmidt <mschmidt(a)redhat.com>
Date: Wed, 14 Jul 2010 17:31:02 -0400
Subject: [PATCH] x86: fix keeping track of AMD C1E

On Wed, 14 Jul 2010 23:22:01 +0200
Michal Schmidt wrote:

> identify_cpu: before ANDing, c1e_detected: 0, boot_cpu_has(C1E): 0
> identify_cpu: after ANDing, c1e_detected: 0, boot_cpu_has(C1E): 0
> c1e_idle: cpu: 1, bits 0x10000000, c1e_detected: 0,
> boot_cpu_has(C1E): 0 lockdep: fixing up alternatives.
> #2
> System has AMD C1E enabled
> Switch to broadcast mode on CPU1
> identify_cpu: before ANDing, c1e_detected: 1, boot_cpu_has(C1E): 1
> identify_cpu: after ANDing, c1e_detected: 1, boot_cpu_has(C1E): 0
> Switch to broadcast mode on CPU2
> lockdep: fixing up alternatives.
> #3
> identify_cpu: before ANDing, c1e_detected: 1, boot_cpu_has(C1E): 0
> identify_cpu: after ANDing, c1e_detected: 1, boot_cpu_has(C1E): 0
> Switch to broadcast mode on CPU3
> lockdep: fixing up alternatives.
> #4
> identify_cpu: before ANDing, c1e_detected: 1, boot_cpu_has(C1E): 0
> identify_cpu: after ANDing, c1e_detected: 1, boot_cpu_has(C1E): 0
> Switch to broadcast mode on CPU4
> lockdep: fixing up alternatives.
> #5

Ok.

> identify_cpu: before ANDing, c1e_detected: 1, boot_cpu_has(C1E): 0
> identify_cpu: after ANDing, c1e_detected: 1, boot_cpu_has(C1E): 0
> Brought up 6 CPUs
> Switch to broadcast mode on CPU5
> Total of 6 processors activated (38528.67 BogoMIPS).
> Switch to broadcast mode on CPU0

This suggests that another way to fix my problem would be this (tested):
---
 arch/x86/include/asm/acpi.h       |    8 ++++++--
 arch/x86/include/asm/cpufeature.h |    2 +-
 arch/x86/include/asm/processor.h  |    1 +
 arch/x86/kernel/process.c         |   12 ++++++++++--
 drivers/acpi/processor_idle.c     |    2 +-
 5 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index aa2c39d..7583f19 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -134,10 +134,14 @@ static inline unsigned int acpi_processor_cstate_check(unsigned int max_cstate)
 	    boot_cpu_data.x86_model <= 0x05 &&
 	    boot_cpu_data.x86_mask < 0x0A)
 		return 1;
-	else if (boot_cpu_has(X86_FEATURE_AMDC1E))
+	else if (c1e_detected) {
+		pr_err("%s: C1E\n", __func__);
 		return 1;
-	else
+	}
+	else {
+		pr_err("%s: max_cstate: %d\n", __func__, max_cstate);
 		return max_cstate;
+	}
 }
 
 static inline bool arch_has_acpi_pdc(void)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 4681459..353154e 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -89,7 +89,7 @@
 #define X86_FEATURE_LFENCE_RDTSC (3*32+18) /* "" Lfence synchronizes RDTSC */
 #define X86_FEATURE_11AP (3*32+19) /* "" Bad local APIC aka 11AP */
 #define X86_FEATURE_NOPL (3*32+20) /* The NOPL (0F 1F) instructions */
-#define X86_FEATURE_AMDC1E (3*32+21) /* AMD C1E detected */
+ /* 21 missing, was AMD_C1E workaround */
 #define X86_FEATURE_XTOPOLOGY (3*32+22) /* cpu topology enum extensions */
 #define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
 #define X86_FEATURE_NONSTOP_TSC (3*32+24) /* TSC does not stop in C states */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7e5c6a6..336851e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -762,6 +762,7 @@ extern void init_c1e_mask(void);
 extern unsigned long boot_option_idle_override;
 extern unsigned long idle_halt;
 extern unsigned long idle_nomwait;
+extern int c1e_detected;
 
 /*
  * on systems with caches, caches must be flashed as the absolute
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e7e3521..0c2d4df 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -561,8 +561,10 @@ no_c1e_idle:
 	return 0;
 }
 
+int c1e_detected;
+EXPORT_SYMBOL(c1e_detected);
+
 static cpumask_var_t c1e_mask;
-static int c1e_detected;
 
 void c1e_remove_cpu(int cpu)
 {
@@ -584,12 +586,18 @@ static void c1e_idle(void)
 		u32 lo, hi;
 
 		rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
+
+		pr_err("%s: bits 0x%08x\n",
+			__func__, lo & K8_INTP_C1E_ACTIVE_MASK);
+
+		pr_err("%s: cpu: %d, c1e_detected: %d\n",
+			__func__, raw_smp_processor_id(), c1e_detected);
+
 		if (lo & K8_INTP_C1E_ACTIVE_MASK) {
 			c1e_detected = 1;
 			if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
 				mark_tsc_unstable("TSC halt in AMD C1E");
 			printk(KERN_INFO "System has AMD C1E enabled\n");
-			set_cpu_cap(&boot_cpu_data, X86_FEATURE_AMDC1E);
 		}
 	}
 
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index b1b3856..7cd95eb 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -159,7 +159,7 @@ static void lapic_timer_check_state(int state, struct acpi_processor *pr,
 	if (cpu_has(&cpu_data(pr->id), X86_FEATURE_ARAT))
 		return;
 
-	if (boot_cpu_has(X86_FEATURE_AMDC1E))
+	if (c1e_detected)
 		type = ACPI_STATE_C1;
 
 	/*
--

and 2. the patch at

http://git.kernel.org/tip/08be97962bf338161325d4901642f956ce8c1adb

Please boot this on your machine and send me the whole dmesg, as usual.

Now, if it still shows hiccups, we'd like to rule out that there's some
funny HPET IRQ routing issue so please rerun the same test with the same
2 patches on top but also with "nolapic_timer hpet=verbose" on the kernel
command line. As above, catch the whole dmesg and send it to me.

Thanks.

-- 
Regards/Gruss,
Boris.
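
The debug log quoted in the patch description above shows boot_cpu_has(C1E) going from 1 to 0 across the "ANDing" step in identify_cpu while c1e_detected stays at 1. A small userspace model of that effect is sketched below. It is an illustration under stated assumptions, not the kernel's code: `struct sim_state` and the `sim_*` functions are hypothetical, a single 32-bit word stands in for the capability array, and bit 21 is taken from the `(3*32+21)` definition in the cpufeature.h hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 21 of word 3, as in the removed X86_FEATURE_AMDC1E definition. */
#define SIM_AMDC1E_BIT 21u

/* Hypothetical stand-in for the relevant kernel state. */
struct sim_state {
	uint32_t boot_caps;	/* models one word of boot_cpu_data's capabilities */
	bool c1e_detected;	/* models the c1e_detected variable */
};

/* Models the pre-patch idle path: latch the variable AND set the synthetic flag. */
void sim_detect_c1e(struct sim_state *s)
{
	assert(s != NULL);
	s->c1e_detected = true;
	s->boot_caps |= 1u << SIM_AMDC1E_BIT;
}

/*
 * Models the "ANDing" from the log: the boot CPU's capability word is
 * ANDed with the capabilities of the CPU currently being identified.
 */
void sim_identify_secondary(struct sim_state *s, uint32_t cpu_caps)
{
	assert(s != NULL);
	s->boot_caps &= cpu_caps;
}

bool sim_boot_cpu_has_c1e(const struct sim_state *s)
{
	assert(s != NULL);
	return (s->boot_caps >> SIM_AMDC1E_BIT) & 1u;
}
```

In this model, a flag set by `sim_detect_c1e()` is lost as soon as `sim_identify_secondary()` runs for a CPU whose own capability word lacks the synthetic bit, while `c1e_detected` is untouched. That matches the log lines and is the motivation for testing the variable instead of the feature flag.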
From: Heinz Diehl on 20 Jul 2010 11:20

On 19.07.2010, Borislav Petkov wrote:

[Patches]
> Please boot this on your machine and send me the whole dmesg, as usual.

Applying these 2 patches to 2.6.35-rc5 didn't help, the machine is stalling
as usual and several key presses are required to get it to boot.

> Now, if it still shows hiccups, we'd like to rule out that there's some
> funny HPET IRQ routing issue so please rerun the same test with the same
> 2 patches on top but also with "nolapic_timer hpet=verbose" on the kernel
> command line.

Did it, and now it boots fine without any delay/stalling. Did also try
stock/vanilla 2.6.35-rc5 with "nolapic_timer hpet=verbose", and it does
boot fine, too.

> As above, catch the whole dmesg and send it to me.

Attached are three files,

dmesg-01 = output of 2.6.35-rc5 + the two patches
dmesg-02 = output of 2.6.35-rc5 + the two patches + "nolapic_timer hpet=verbose"
dmesg-03 = vanilla 2.6.35-rc5 + "nolapic_timer hpet=verbose"

Thanks, Heinz.
From: Borislav Petkov on 22 Jul 2010 11:10

From: Borislav Petkov <borislav.petkov(a)amd.com>
Date: Sat, Jul 17, 2010 at 06:21:08AM -0400

> Btw, I think we should wait with whatever fix we come up until the
> merge window so that we have more time to fix any fallout then (which I
> don't expect but who knows) instead of rushing this now. We can always
> backport it then too.

Ok, I think we should go ahead and queue this up for .36 for now, let it
see some linux-next time and such. The other issue with the Gigabyte
boards is still ongoing and, as it looks so far, unrelated.

Michal, scream if you have objections to the patch:

--
From: Michal Schmidt <mschmidt(a)redhat.com>
Date: Wed, 14 Jul 2010 17:31:02 -0400
Subject: [PATCH] x86: fix keeping track of AMD C1E

Accommodate the original C1E-aware idle routine to the different points
during boot when the BIOS enables C1E. While at it, remove the synthetic
CPUID flag in favor of a single global setting which denotes C1E status
on the system.

Signed-off-by: Michal Schmidt <mschmidt(a)redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov(a)amd.com>
---
 arch/x86/include/asm/acpi.h       |    2 +-
 arch/x86/include/asm/cpufeature.h |    2 +-
 arch/x86/include/asm/processor.h  |    1 +
 arch/x86/kernel/process.c         |    6 ++++--
 drivers/acpi/processor_idle.c     |    2 +-
 5 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index aa2c39d..92091de 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -134,7 +134,7 @@ static inline unsigned int acpi_processor_cstate_check(unsigned int max_cstate)
 	    boot_cpu_data.x86_model <= 0x05 &&
 	    boot_cpu_data.x86_mask < 0x0A)
 		return 1;
-	else if (boot_cpu_has(X86_FEATURE_AMDC1E))
+	else if (c1e_detected)
 		return 1;
 	else
 		return max_cstate;
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 4681459..353154e 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -89,7 +89,7 @@
 #define X86_FEATURE_LFENCE_RDTSC (3*32+18) /* "" Lfence synchronizes RDTSC */
 #define X86_FEATURE_11AP (3*32+19) /* "" Bad local APIC aka 11AP */
 #define X86_FEATURE_NOPL (3*32+20) /* The NOPL (0F 1F) instructions */
-#define X86_FEATURE_AMDC1E (3*32+21) /* AMD C1E detected */
+ /* 21 missing, was AMD_C1E workaround */
 #define X86_FEATURE_XTOPOLOGY (3*32+22) /* cpu topology enum extensions */
 #define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
 #define X86_FEATURE_NONSTOP_TSC (3*32+24) /* TSC does not stop in C states */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7e5c6a6..336851e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -762,6 +762,7 @@ extern void init_c1e_mask(void);
 extern unsigned long boot_option_idle_override;
 extern unsigned long idle_halt;
 extern unsigned long idle_nomwait;
+extern int c1e_detected;
 
 /*
  * on systems with caches, caches must be flashed as the absolute
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e7e3521..1b44a5c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -561,8 +561,10 @@ no_c1e_idle:
 	return 0;
 }
 
+int c1e_detected;
+EXPORT_SYMBOL(c1e_detected);
+
 static cpumask_var_t c1e_mask;
-static int c1e_detected;
 
 void c1e_remove_cpu(int cpu)
 {
@@ -584,12 +586,12 @@ static void c1e_idle(void)
 		u32 lo, hi;
 
 		rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
+
 		if (lo & K8_INTP_C1E_ACTIVE_MASK) {
 			c1e_detected = 1;
 			if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
 				mark_tsc_unstable("TSC halt in AMD C1E");
 			printk(KERN_INFO "System has AMD C1E enabled\n");
-			set_cpu_cap(&boot_cpu_data, X86_FEATURE_AMDC1E);
 		}
 	}
 
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index b1b3856..7cd95eb 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -159,7 +159,7 @@ static void lapic_timer_check_state(int state, struct acpi_processor *pr,
 	if (cpu_has(&cpu_data(pr->id), X86_FEATURE_ARAT))
 		return;
 
-	if (boot_cpu_has(X86_FEATURE_AMDC1E))
+	if (c1e_detected)
 		type = ACPI_STATE_C1;
 
 	/*
-- 
1.7.1

-- 
Regards/Gruss,
Boris.
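
The detection step in the patch above boils down to masking the low word of an MSR and latching a flag once the masked value is non-zero. The sketch below isolates that logic in userspace C so it can be exercised without hardware. The numeric values of `MSR_K8_INT_PENDING_MSG` and `K8_INTP_C1E_ACTIVE_MASK` are assumptions recalled from the kernel's MSR header of that era and should be checked against the actual tree; `c1e_active()` and `c1e_latch()` are hypothetical helper names, and the MSR read itself (`rdmsr`) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED values; verify against arch/x86/include/asm/msr-index.h. */
#define MSR_K8_INT_PENDING_MSG	0xc0010055u
#define K8_INTP_C1E_ACTIVE_MASK	0x18000000u

/* True if either C1E-related bit is set in the MSR's low 32 bits. */
bool c1e_active(uint32_t lo)
{
	return (lo & K8_INTP_C1E_ACTIVE_MASK) != 0;
}

/*
 * One-way latch, mirroring how the shown hunk only ever sets
 * c1e_detected to 1: once C1E has been seen, later reads that show
 * the bits clear do not reset the flag.
 */
void c1e_latch(int *detected, uint32_t lo)
{
	assert(detected != NULL);
	if (c1e_active(lo))
		*detected = 1;
}
```

With the value from the debug log earlier in the thread (`bits 0x10000000`), `c1e_active()` returns true under the assumed mask. Because the check runs from the idle routine rather than once at init, it can catch a BIOS that enables C1E at some later point during boot, which is what the commit message describes.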
From: Borislav Petkov on 27 Jul 2010 13:00

From: Michal Schmidt <mschmidt(a)redhat.com>
Date: Tue, Jul 27, 2010 at 12:53:35PM -0400

Sorry about the From: mismatch but if you do "mutt -H <patch>", the From:
of the mail gets set to the patch author and not the sender. I'll paste
the formatted patch instead next time.

> Accommodate the original C1E-aware idle routine to the different times
> during boot when the BIOS enables C1E. While at it, remove the synthetic
> CPUID flag in favor of a single global setting which denotes C1E status
> on the system.
>
> Signed-off-by: Michal Schmidt <mschmidt(a)redhat.com>
> Signed-off-by: Borislav Petkov <borislav.petkov(a)amd.com>
> ---
>
> Peter, can we please queue this for .36 merge window? Thanks.
>
>  arch/x86/include/asm/acpi.h       |    2 +-
>  arch/x86/include/asm/cpufeature.h |    2 +-
>  arch/x86/include/asm/processor.h  |    1 +
>  arch/x86/kernel/process.c         |    6 ++++--
>  drivers/acpi/processor_idle.c     |    2 +-
>  5 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
> index aa2c39d..92091de 100644
> --- a/arch/x86/include/asm/acpi.h
> +++ b/arch/x86/include/asm/acpi.h
> @@ -134,7 +134,7 @@ static inline unsigned int acpi_processor_cstate_check(unsigned int max_cstate)
>  	    boot_cpu_data.x86_model <= 0x05 &&
>  	    boot_cpu_data.x86_mask < 0x0A)
>  		return 1;
> -	else if (boot_cpu_has(X86_FEATURE_AMDC1E))
> +	else if (c1e_detected)
>  		return 1;
>  	else
>  		return max_cstate;
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 4681459..353154e 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -89,7 +89,7 @@
>  #define X86_FEATURE_LFENCE_RDTSC (3*32+18) /* "" Lfence synchronizes RDTSC */
>  #define X86_FEATURE_11AP (3*32+19) /* "" Bad local APIC aka 11AP */
>  #define X86_FEATURE_NOPL (3*32+20) /* The NOPL (0F 1F) instructions */
> -#define X86_FEATURE_AMDC1E (3*32+21) /* AMD C1E detected */
> + /* 21 missing, was AMD_C1E workaround */
>  #define X86_FEATURE_XTOPOLOGY (3*32+22) /* cpu topology enum extensions */
>  #define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
>  #define X86_FEATURE_NONSTOP_TSC (3*32+24) /* TSC does not stop in C states */
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 7e5c6a6..336851e 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -762,6 +762,7 @@ extern void init_c1e_mask(void);
>  extern unsigned long boot_option_idle_override;
>  extern unsigned long idle_halt;
>  extern unsigned long idle_nomwait;
> +extern int c1e_detected;
>
>  /*
>   * on systems with caches, caches must be flashed as the absolute
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index e7e3521..1b44a5c 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -561,8 +561,10 @@ no_c1e_idle:
>  	return 0;
>  }
>
> +int c1e_detected;
> +EXPORT_SYMBOL(c1e_detected);
> +
>  static cpumask_var_t c1e_mask;
> -static int c1e_detected;
>
>  void c1e_remove_cpu(int cpu)
>  {
> @@ -584,12 +586,12 @@ static void c1e_idle(void)
>  		u32 lo, hi;
>
>  		rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
> +
>  		if (lo & K8_INTP_C1E_ACTIVE_MASK) {
>  			c1e_detected = 1;
>  			if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
>  				mark_tsc_unstable("TSC halt in AMD C1E");
>  			printk(KERN_INFO "System has AMD C1E enabled\n");
> -			set_cpu_cap(&boot_cpu_data, X86_FEATURE_AMDC1E);
>  		}
>  	}
>
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index b1b3856..7cd95eb 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -159,7 +159,7 @@ static void lapic_timer_check_state(int state, struct acpi_processor *pr,
>  	if (cpu_has(&cpu_data(pr->id), X86_FEATURE_ARAT))
>  		return;
>
> -	if (boot_cpu_has(X86_FEATURE_AMDC1E))
> +	if (c1e_detected)
>  		type = ACPI_STATE_C1;
>
>  	/*
> --
> 1.7.1
>
>
> --
> Regards/Gruss,
> Boris.

-- 
Regards/Gruss,
Boris.