From: Yinghai Lu on 15 Dec 2009 08:00

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>> Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> Jens Axboe wrote:
>>>>> Hi,
>>>>>
>>>>> I have this big box that takes forever to boot, so I use kexec to boot
>>>>> into new kernels. Works fine, but some time past 2.6.32 it stopped
>>>>> working. Instead of wasting brain cycles on finding out why, I handed
>>>>> the problem to my trusty regression friend - git bisect.
>>>>>
>>>>> This is what it found (sorry Yinghai it's you again, you owe me a beer
>>>>> for hours of 2.6.32-git bisecting ;-)
>>>> sure.
>>>>
>>>>> 99935a7a59eaca0292c1a5880e10bae03f4a5e3d is the first bad commit
>>>>> commit 99935a7a59eaca0292c1a5880e10bae03f4a5e3d
>>>>> Author: Yinghai Lu <yinghai(a)kernel.org>
>>>>> Date: Sun Oct 4 21:54:24 2009 -0700
>>>>>
>>>>> x86/PCI: read root resources from IOH on Intel
>>>>>
>>>>> For intel systems with multi IOH, we should read peer root resources
>>>>> directly from PCI config space, and don't trust _CRS.
>>>>>
>>>>>
>>>>> I could not revert this single commit, as a further commit made other
>>>>> changes. So I reverted 67f241f4 first and then 99935a7a. I confirmed
>>>>> that this kernel then works fine.
>>>>>
>>>> let see how BIOS mess it up again!
>>> Heh, I had a feeling this was coming :-)
>>>
>>>> please.
>>> Please find two logs attached - one from a boot with -git and the two
>>> patches reverted, and one from a boot with -git.
>> please enabled CONFIG_PCI_DEBUG and boot with debug in boot command line.
>
> On the good or bad kernel?

both please.

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on 15 Dec 2009 09:20

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> Jens Axboe wrote:
> >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>> Jens Axboe wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I have this big box that takes forever to boot, so I use kexec to boot
> >>>>> into new kernels. Works fine, but some time past 2.6.32 it stopped
> >>>>> working. Instead of wasting brain cycles on finding out why, I handed
> >>>>> the problem to my trusty regression friend - git bisect.
> >>>>>
> >>>>> This is what it found (sorry Yinghai it's you again, you owe me a beer
> >>>>> for hours of 2.6.32-git bisecting ;-)
> >>>> sure.
> >>>>
> >>>>> 99935a7a59eaca0292c1a5880e10bae03f4a5e3d is the first bad commit
> >>>>> commit 99935a7a59eaca0292c1a5880e10bae03f4a5e3d
> >>>>> Author: Yinghai Lu <yinghai(a)kernel.org>
> >>>>> Date: Sun Oct 4 21:54:24 2009 -0700
> >>>>>
> >>>>> x86/PCI: read root resources from IOH on Intel
> >>>>>
> >>>>> For intel systems with multi IOH, we should read peer root resources
> >>>>> directly from PCI config space, and don't trust _CRS.
> >>>>>
> >>>>>
> >>>>> I could not revert this single commit, as a further commit made other
> >>>>> changes. So I reverted 67f241f4 first and then 99935a7a. I confirmed
> >>>>> that this kernel then works fine.
> >>>>>
> >>>> let see how BIOS mess it up again!
> >>> Heh, I had a feeling this was coming :-)
> >>>
> >>>> please.
> >>> Please find two logs attached - one from a boot with -git and the two
> >>> patches reverted, and one from a boot with -git.
> >> please enabled CONFIG_PCI_DEBUG and boot with debug in boot command line.
> >
> > On the good or bad kernel?
>
> both please.

Attached.

--
Jens Axboe
From: Yinghai Lu on 15 Dec 2009 13:50

Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>
>>>>>> let see how BIOS mess it up again!
>>>>> Heh, I had a feeling this was coming :-)

[ 0.000000] user-defined physical RAM map:
[ 0.000000] user: 0000000000000100 - 0000000000098800 (usable)
[ 0.000000] user: 0000000000098800 - 00000000000a0000 (reserved)
[ 0.000000] user: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] user: 0000000000100000 - 0000000078c63000 (usable)
[ 0.000000] user: 0000000078c63000 - 0000000078e77000 (ACPI NVS)
[ 0.000000] user: 0000000078e77000 - 000000007924e000 (ACPI data)
[ 0.000000] user: 000000007924e000 - 00000000792c2000 (reserved)
[ 0.000000] user: 00000000792c2000 - 00000000792d2000 (ACPI data)
[ 0.000000] user: 00000000792d2000 - 00000000792e7000 (reserved)
[ 0.000000] user: 00000000792e7000 - 0000000079301000 (ACPI data)
[ 0.000000] user: 0000000079301000 - 0000000079303000 (reserved)
[ 0.000000] user: 0000000079303000 - 0000000079305000 (ACPI data)
[ 0.000000] user: 0000000079305000 - 0000000079310000 (reserved)
[ 0.000000] user: 0000000079310000 - 0000000079314000 (ACPI data)
[ 0.000000] user: 0000000079314000 - 0000000079319000 (reserved)
[ 0.000000] user: 0000000079319000 - 0000000079336000 (ACPI data)
[ 0.000000] user: 0000000079336000 - 0000000079358000 (reserved)
[ 0.000000] user: 0000000079358000 - 0000000079388000 (ACPI data)
[ 0.000000] user: 0000000079388000 - 00000000793c9000 (reserved)
[ 0.000000] user: 00000000793c9000 - 000000007968f000 (ACPI data)
[ 0.000000] user: 000000007968f000 - 00000000796bb000 (reserved)
[ 0.000000] user: 00000000796bb000 - 00000000799d8000 (ACPI data)
[ 0.000000] user: 00000000799d8000 - 0000000079bd8000 (ACPI NVS)
[ 0.000000] user: 0000000079bd8000 - 0000000079d87000 (ACPI data)
[ 0.000000] user: 0000000079d87000 - 0000000079d8a000 (reserved)
[ 0.000000] user: 0000000079d8a000 - 0000000079dca000 (ACPI data)
[ 0.000000] user: 0000000079dca000 - 0000000079dcb000 (reserved)
[ 0.000000] user: 0000000079dcb000 - 0000000079e1c000 (ACPI data)
[ 0.000000] user: 0000000079e1c000 - 0000000079e87000 (reserved)
[ 0.000000] user: 0000000079e87000 - 000000007bd5f000 (ACPI data)
[ 0.000000] user: 000000007bd5f000 - 000000007be4f000 (reserved)
[ 0.000000] user: 000000007be4f000 - 000000007bf87000 (ACPI data)
[ 0.000000] user: 0000000100000000 - 0000001080000000 (usable)
....
[ 0.000000] SRAT: Node 0 PXM 0 0-80000000
[ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
[ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
[ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
[ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
[ 0.000000] ACPI: [SRAT:0x01] ignored 16 entries of 32 found
[ 0.000000] NUMA: Using 31 for the hash shift.
[ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
[ 0.000000] SRAT: SRAT not used.
[ 0.000000] No NUMA configuration found

so SRAT is broken?

	if (max_entries && count > max_entries) {
		printk(KERN_WARNING PREFIX "[%4.4s:0x%02x] ignored %i entries of "
		       "%i found\n", id, entry_id, count - max_entries, count);
	}
....

or what is your CONFIG_NODES_SHIFT? 3? can you try to set it to 6?

[ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
[ 13.112475] PCI: not using MMCONFIG
[ 13.206650] ACPI: No dock devices found.

so mmconf is not used...<ask BIOS fix it please!>

then we get

[ 13.990335] IOH bus: [00, 00]
[ 13.993707] IOH bus: 00 index 0 io port: [0, fff]
[ 13.999023] IOH bus: 00 index 1 mmio: [0, ffffff]
[ 14.004335] IOH bus: 00 index 2 mmio: [0, 3ffffff]

please check

[PATCH] x86/pci: intel ioh bus num reg accessing fix

it is above 0x100, so if mmconf is not enable, need to skip it

Reported-by: Jens Axboe <jens.axboe(a)oracle.com>
Signed-off-by: Yinghai Lu <yinghai(a)kernel.org>

---
 arch/x86/pci/intel_bus.c | 4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6/arch/x86/pci/intel_bus.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/intel_bus.c
+++ linux-2.6/arch/x86/pci/intel_bus.c
@@ -49,6 +49,10 @@ static void __devinit pci_root_bus_res(s
 	u64 mmioh_base, mmioh_end;
 	int bus_base, bus_end;
 
+	/* some sys doesn't get mmconf enabled */
+	if (dev->cfg_size < 0x200)
+		return;
+
 	if (pci_root_num >= PCI_ROOT_NR) {
 		printk(KERN_DEBUG "intel_bus.c: PCI_ROOT_NR is too small\n");
 		return;
From: Matthew Wilcox on 15 Dec 2009 13:50

On Tue, Dec 15, 2009 at 10:39:37AM -0800, Yinghai Lu wrote:
> +	/* some sys doesn't get mmconf enabled */
> +	if (dev->cfg_size < 0x200)
> +		return;

What is the meaning of this mystic 0x200?

--
Matthew Wilcox
Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
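For context on the 0x200 question: conventional PCI config space is 256 bytes (0x100), and the 4096-byte extended space is what MMCONFIG makes reachable on a box like this one. Assuming `cfg_size` only ever holds one of those two sizes, any threshold strictly between them behaves identically, so `< 0x200` can plausibly be read as "extended config space is not available". The sketch below illustrates that reading only. `EXAMPLE_IOH_REG`, `cfg_reg_reachable()` and `patch_guard_skips()` are hypothetical names invented for this note, not code from intel_bus.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Config space sizes, using the same names and values as the kernel's PCI code. */
#define PCI_CFG_SPACE_SIZE      256   /* conventional config space */
#define PCI_CFG_SPACE_EXP_SIZE  4096  /* extended (PCIe) config space */

/* Hypothetical offset, picked only because it lies above 0x100 as the
 * patch description says the IOH registers do. */
#define EXAMPLE_IOH_REG 0x108

/* True if a register of `width` bytes at `offset` fits inside the
 * config space the device exposes. */
static bool cfg_reg_reachable(size_t cfg_size, size_t offset, size_t width)
{
	return offset + width <= cfg_size;
}

/* The guard from the patch: skip the IOH probing unless more than the
 * conventional 256 bytes of config space are visible. */
static bool patch_guard_skips(size_t cfg_size)
{
	return cfg_size < 0x200;
}
```

Under that assumption, comparing against `PCI_CFG_SPACE_EXP_SIZE` (or testing `cfg_size <= PCI_CFG_SPACE_SIZE`) would express the same condition without a bare constant.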
From: Jens Axboe on 15 Dec 2009 14:00

On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >>>>>>>
> >>>>>> let see how BIOS mess it up again!
> >>>>> Heh, I had a feeling this was coming :-)
>
> > [ 0.000000] user-defined physical RAM map:
> > [ 0.000000] user: 0000000000000100 - 0000000000098800 (usable)
> > [ 0.000000] user: 0000000000098800 - 00000000000a0000 (reserved)
> > [ 0.000000] user: 00000000000e0000 - 0000000000100000 (reserved)
> > [ 0.000000] user: 0000000000100000 - 0000000078c63000 (usable)
> > [ 0.000000] user: 0000000078c63000 - 0000000078e77000 (ACPI NVS)
> > [ 0.000000] user: 0000000078e77000 - 000000007924e000 (ACPI data)
> > [ 0.000000] user: 000000007924e000 - 00000000792c2000 (reserved)
> > [ 0.000000] user: 00000000792c2000 - 00000000792d2000 (ACPI data)
> > [ 0.000000] user: 00000000792d2000 - 00000000792e7000 (reserved)
> > [ 0.000000] user: 00000000792e7000 - 0000000079301000 (ACPI data)
> > [ 0.000000] user: 0000000079301000 - 0000000079303000 (reserved)
> > [ 0.000000] user: 0000000079303000 - 0000000079305000 (ACPI data)
> > [ 0.000000] user: 0000000079305000 - 0000000079310000 (reserved)
> > [ 0.000000] user: 0000000079310000 - 0000000079314000 (ACPI data)
> > [ 0.000000] user: 0000000079314000 - 0000000079319000 (reserved)
> > [ 0.000000] user: 0000000079319000 - 0000000079336000 (ACPI data)
> > [ 0.000000] user: 0000000079336000 - 0000000079358000 (reserved)
> > [ 0.000000] user: 0000000079358000 - 0000000079388000 (ACPI data)
> > [ 0.000000] user: 0000000079388000 - 00000000793c9000 (reserved)
> > [ 0.000000] user: 00000000793c9000 - 000000007968f000 (ACPI data)
> > [ 0.000000] user: 000000007968f000 - 00000000796bb000 (reserved)
> > [ 0.000000] user: 00000000796bb000 - 00000000799d8000 (ACPI data)
> > [ 0.000000] user: 00000000799d8000 - 0000000079bd8000 (ACPI NVS)
> > [ 0.000000] user: 0000000079bd8000 - 0000000079d87000 (ACPI data)
> > [ 0.000000] user: 0000000079d87000 - 0000000079d8a000 (reserved)
> > [ 0.000000] user: 0000000079d8a000 - 0000000079dca000 (ACPI data)
> > [ 0.000000] user: 0000000079dca000 - 0000000079dcb000 (reserved)
> > [ 0.000000] user: 0000000079dcb000 - 0000000079e1c000 (ACPI data)
> > [ 0.000000] user: 0000000079e1c000 - 0000000079e87000 (reserved)
> > [ 0.000000] user: 0000000079e87000 - 000000007bd5f000 (ACPI data)
> > [ 0.000000] user: 000000007bd5f000 - 000000007be4f000 (reserved)
> > [ 0.000000] user: 000000007be4f000 - 000000007bf87000 (ACPI data)
> > [ 0.000000] user: 0000000100000000 - 0000001080000000 (usable)
> ...
> > [ 0.000000] SRAT: Node 0 PXM 0 0-80000000
> > [ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
> > [ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
> > [ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
> > [ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
> > [ 0.000000] ACPI: [SRAT:0x01] ignored 16 entries of 32 found
> > [ 0.000000] NUMA: Using 31 for the hash shift.
> > [ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
> > [ 0.000000] SRAT: SRAT not used.
> > [ 0.000000] No NUMA configuration found
>
> so SRAT is broken?
>
> 	if (max_entries && count > max_entries) {
> 		printk(KERN_WARNING PREFIX "[%4.4s:0x%02x] ignored %i entries of "
> 		       "%i found\n", id, entry_id, count - max_entries, count);
> 	}
> ...
>
> or what is your CONFIG_NODES_SHIFT? 3? can you try to set it to 6?

Hmm funky, perhaps the BIOS changed that too. NUMA has otherwise been
working fine, didn't check whether it still did after a BIOS upgrade.
I'll try 6, it is set to 3 iirc.

> > [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> > [ 13.112475] PCI: not using MMCONFIG
> > [ 13.206650] ACPI: No dock devices found.
>
> so mmconf is not used...<ask BIOS fix it please!>

Reported, thanks.

> then we get
> > [ 13.990335] IOH bus: [00, 00]
> > [ 13.993707] IOH bus: 00 index 0 io port: [0, fff]
> > [ 13.999023] IOH bus: 00 index 1 mmio: [0, ffffff]
> > [ 14.004335] IOH bus: 00 index 2 mmio: [0, 3ffffff]
>
> please check
>
> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>
> it is above 0x100, so if mmconf is not enable, need to skip it

Will check that now.

--
Jens Axboe
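The "ignored 16 entries of 32 found" line would fit a CONFIG_NODES_SHIFT of 3, if one assumes that the SRAT memory-affinity parser is capped at NR_NODE_MEMBLKS = 2 * MAX_NUMNODES entries and that MAX_NUMNODES = 1 << CONFIG_NODES_SHIFT. That cap is an assumption about this kernel version, not something stated in the thread. The helper names below are hypothetical. The sketch only reproduces the arithmetic of the quoted printk.

```c
/* Assumed relationship (not confirmed in the thread):
 *   MAX_NUMNODES    = 1 << CONFIG_NODES_SHIFT
 *   NR_NODE_MEMBLKS = 2 * MAX_NUMNODES   (cap on SRAT memory entries)
 */
static unsigned int srat_max_entries(unsigned int nodes_shift)
{
	unsigned int max_numnodes = 1u << nodes_shift;

	return 2u * max_numnodes;
}

/* Same condition as the quoted printk: entries beyond the cap are
 * counted but ignored. A cap of 0 means "no limit". */
static unsigned int srat_ignored_entries(unsigned int found,
					 unsigned int max_entries)
{
	if (max_entries && found > max_entries)
		return found - max_entries;
	return 0;
}
```

With a shift of 3 the cap works out to 16, so 32 entries leave 16 ignored, matching the log. With a shift of 6 the cap is 128 and nothing would be dropped, which would explain the suggestion to try 6.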