From: Bjorn Helgaas on 11 Jun 2010 17:50

[If you haven't been following this bug, the report is at [3].]

Here's a theory. I'm not an expert in HyperTransport, so maybe somebody
who knows HyperTransport and/or VIA chipsets can validate or refute it.

This is based on the _HyperTransport I/O Link Specification_, rev 3.10b [1],
and the _BIOS and Kernel Developer's Guide (BKDG) for AMD Family 10h
Processors_ [2].

In a nutshell, I think the problem is that amd_bus.c treats a
HyperTransport (HT) host bridge as though it were a PCI host bridge. In
particular, when an HT chain contains more than one PCI host bridge, the
HT host bridge apertures encompass all the PCI host bridges, but
amd_bus.c mistakenly assigns all those resources to one PCI host bridge.

From a software point of view, HyperTransport is similar but not
identical to PCI. It is possible to make native HyperTransport
peripheral devices, but PCI devices must be attached via a
HyperTransport-to-PCI bridge [1, sec 4.1].

A PCI host bridge has a platform-specific non-PCI connection, e.g., a
front-side bus, on the primary (upstream) side and a PCI bus on the
secondary (downstream) side.

Note that in the HyperTransport spec, "host bridge" refers to the
interface from the host, e.g., CPU cores, to a HyperTransport chain.
This HyperTransport host bridge has a HyperTransport link on the
secondary side, *not* a PCI bus. A HyperTransport-to-PCI bridge is one
kind of PCI host bridge, because the primary side is HyperTransport and
the secondary side is PCI.

Graham's machine contains one HT host bridge leading to an HT chain, and
it has PCI devices on buses 00, 02, 03, 06, and 80. In addition, the HT
host bridge configuration registers appear at device 18 (hex) in bus 00
configuration space, though they are not actually PCI functions.

PCI buses 02, 03, and 06 are reachable from bus 00 via the PCI-to-PCI
bridges at 00:03.3, 00:03.2, and 00:02.0, respectively. However, there
are no PCI-to-PCI bridges that lead to bus 00 or bus 80, so the HT chain
must contain two separate PCI host bridges that lead to them.

Now, here's the problem: amd_bus.c reads the HT host bridge
configuration and learns that it routes buses 00-ff and the related
address space, including the following range, down the HT chain at
node 0, link 0:

    [mem 0x80000000-0xfcffffffff]

That makes sense, because both PCI host bridges are on that HT chain, so
the HT host bridge has to forward all that address space.

The problem is that amd_bus.c assumes there's only one PCI host bridge
on the chain, so it assigns *all* that address space to PCI bus 00. This
doesn't work because parts of that address space belong to bus 80, not
bus 00, and we can't reach bus 80 from PCI bus 00. In particular, we
know that at least the following address space is routed to bus 80,
because the 80:01.0 device does work at this address, which is in the
middle of the range we found above:

    [mem 0xfebfc000-0xfebfffff]

(Note that we can reach bus 80 from the HT chain, but the HT chain is
outside the PCI domain, even though some of the HT registers appear in
PCI bus 00 config space. We need a second PCI host bridge from the HT
chain to PCI bus 80.)

The HT spec does suggest that an HT/PCI host bridge should implement a
HyperTransport Bridge Header [1, sec 7.4]. This header would make the
HT/PCI host bridge look just like a PCI-to-PCI bridge, with the usual
primary/secondary/subordinate bus numbers, memory, prefetchable memory,
and I/O port apertures, etc. If all the HT/PCI host bridges on a chain
were implemented this way, I think it probably would work to pretend the
HT host bridge is a PCI host bridge.

But this sort of implementation is apparently not universal. The VIA
chipset in Graham's machine doesn't do it that way, and the Serverworks
HT-2100 chipset in the HP DL785 doesn't either.
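
The containment claim in the message above can be checked mechanically. The following is a minimal sketch in C, not code from amd_bus.c: `struct addr_range`, `range_contains()`, and the two constant names are hypothetical, introduced only for illustration. The two address ranges are the ones quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: an inclusive [start, end] address range. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/* Range the HT host bridge forwards down node 0, link 0 (from the thread). */
const struct addr_range ht_aperture = { 0x80000000ULL, 0xfcffffffffULL };

/* Range where the 80:01.0 device is known to work (from the thread). */
const struct addr_range bus80_window = { 0xfebfc000ULL, 0xfebfffffULL };

/* Return true if 'inner' lies entirely within 'outer'. */
bool range_contains(struct addr_range outer, struct addr_range inner)
{
	/* Both ranges must be well formed before they can be compared. */
	assert(outer.start <= outer.end && inner.start <= inner.end);
	return outer.start <= inner.start && inner.end <= outer.end;
}
```

Under these assumptions, `range_contains(ht_aperture, bus80_window)` is true: memory that only bus 80 can decode sits inside the range that ends up assigned to bus 00.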

[1] http://www.hypertransport.org/docs/twgdocs/HTC20051222-0046-0033_changes.pdf
[2] http://support.amd.com/us/Embedded_TechDocs/31116-Public-GH-BKDG_3-28_5-28-09.pdf
[3] https://bugzilla.kernel.org/show_bug.cgi?id=16007

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Yinghai Lu on 11 Jun 2010 18:10

On Fri, Jun 11, 2010 at 2:49 PM, Bjorn Helgaas <bjorn.helgaas(a)hp.com> wrote:
> [If you haven't been following this bug, the report is at [3].]
>
> Here's a theory. I'm not an expert in HyperTransport, so maybe somebody
> who knows HyperTransport and/or VIA chipsets can validate or refute it.
>
> This is based on the _HyperTransport I/O Link Specification_, rev 3.10b [1],
> and the _BIOS and Kernel Developer's Guide (BKDG) for AMD Family 10h
> Processors_ [2].
>
> In a nutshell, I think the problem is that amd_bus.c treats a
> HyperTransport (HT) host bridge as though it were a PCI host bridge. In
> particular, when an HT chain contains more than one PCI host bridge, the
> HT host bridge apertures encompass all the PCI host bridges, but
> amd_bus.c mistakenly assigns all those resources to one PCI host bridge.

I don't think so. That system has only one HT chain.

May 19 23:20:33 ocham kernel: pci 0000:00:18.1 config space:
May 19 23:20:33 ocham kernel: 00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
May 19 23:20:33 ocham kernel: 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 40: 03 00 00 00 00 00 7f 00 00 00 00 00 01 00 00 00
May 19 23:20:33 ocham kernel: 50: 00 00 00 00 02 00 00 00 00 00 00 00 03 00 00 00
May 19 23:20:33 ocham kernel: 60: 00 00 00 00 04 00 00 00 00 00 00 00 05 00 00 00
May 19 23:20:33 ocham kernel: 70: 00 00 00 00 06 00 00 00 00 00 00 00 07 00 00 00
May 19 23:20:33 ocham kernel: 80: 03 00 e0 00 80 ff ef 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: b0: 03 0a 00 00 00 0b 00 00 03 00 80 00 00 ff ff 00
May 19 23:20:33 ocham kernel: c0: 13 10 00 00 00 f0 ff 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: e0: 03 00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The register at 0xe0 = ff 00 00 03 means it will route all PCI
operations to node0 link0.

That chip from VIA has some design problem that produces one orphan
device.

May 19 23:20:33 ocham kernel: pci 0000:80:01.0 config space:
May 19 23:20:33 ocham kernel: 00: 06 11 88 32 06 00 10 00 10 00 03 04 10 00 00 00
May 19 23:20:33 ocham kernel: 10: 04 c0 bf fe 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 88 08
May 19 23:20:33 ocham kernel: 30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 00 00
May 19 23:20:33 ocham kernel: 40: 00 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 50: 01 60 42 c8 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 60: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 70: 10 00 91 00 00 00 00 00 00 00 30 00 00 00 00 00
May 19 23:20:33 ocham kernel: 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

YH

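
To make the register reading above concrete, here is a hedged sketch in C that decodes the dword shown at offset e0 of the 00:18.1 dump (bytes 03 00 00 ff, lowest address first). The field positions are an assumption based on my reading of the BKDG configuration map registers; verify them against [2] before relying on them. `struct ht_bus_route`, `decode_bus_route()`, and `dword_from_le_bytes()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one configuration map register. */
struct ht_bus_route {
	bool enabled;      /* assumed: read and write enable, bits 1:0 */
	unsigned node;     /* assumed: destination node, bits 6:4 */
	unsigned link;     /* assumed: destination link, bits 9:8 */
	unsigned bus_min;  /* assumed: bus number base, bits 23:16 */
	unsigned bus_max;  /* assumed: bus number limit, bits 31:24 */
};

/* Assemble a little-endian dword from four dump bytes, lowest address first. */
uint32_t dword_from_le_bytes(const uint8_t b[4])
{
	assert(b != NULL);
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Split the register into the fields listed in struct ht_bus_route. */
struct ht_bus_route decode_bus_route(uint32_t reg)
{
	struct ht_bus_route r;

	r.enabled = (reg & 0x3) == 0x3;
	r.node = (reg >> 4) & 0x7;
	r.link = (reg >> 8) & 0x3;
	r.bus_min = (reg >> 16) & 0xff;
	r.bus_max = (reg >> 24) & 0xff;
	return r;
}
```

With that layout, the value 0xff000003 decodes to buses 00 through ff routed to node 0, link 0, which matches both Yinghai's reading and the "buses 00-ff" statement in the first message.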
From: Yinghai Lu on 11 Jun 2010 19:10

please check if this one workaround the problem

Thanks

Yinghai Lu

[PATCH] x86, pci: handle fallout pci devices with peer root bus

Signed-off-by: Yinghai Lu <yinghai(a)kernel.org>

---
 arch/x86/pci/bus_numa.c |    4 +++-
 kernel/resource.c       |    2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/pci/bus_numa.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/bus_numa.c
+++ linux-2.6/arch/x86/pci/bus_numa.c
@@ -22,7 +22,8 @@ void x86_pci_root_bus_res_quirks(struct
 		return;
 
 	for (i = 0; i < pci_root_num; i++) {
-		if (pci_root_info[i].bus_min == b->number)
+		if (pci_root_info[i].bus_min <= b->number &&
+		    pci_root_info[i].bus_max >= b->number)
 			break;
 	}
 
@@ -37,6 +38,7 @@ void x86_pci_root_bus_res_quirks(struct
 	for (j = 0; j < info->res_num; j++) {
 		struct resource *res;
 		struct resource *root;
+		struct resource *tmp;
 
 		res = &info->res[j];
 		pci_bus_add_resource(b, res, 0);
Index: linux-2.6/kernel/resource.c
===================================================================
--- linux-2.6.orig/kernel/resource.c
+++ linux-2.6/kernel/resource.c
@@ -451,7 +451,7 @@ static struct resource * __insert_resour
 	if (!first)
 		return first;
 
-	if (first == parent)
+	if (first == parent || first == new)
 		return first;
 
 	if ((first->start > new->start) || (first->end < new->end))

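
The behavioral change in the bus_numa.c hunk can be seen in isolation with a small sketch. This is an illustration in C, not the kernel code: `struct root_info` is a simplified, hypothetical stand-in for the kernel's per-root bookkeeping, and the single 00-ff entry is an assumption taken from the routing described earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for one root bus entry. */
struct root_info {
	int bus_min;
	int bus_max;
};

/* Assumed table: one HT chain that routes buses 00-ff. */
const struct root_info ht_chain[1] = { { 0x00, 0xff } };

/* Old test: a bus matches only an entry whose first bus number equals it. */
int find_root_exact(const struct root_info *info, size_t n, int bus)
{
	assert(info != NULL);
	for (size_t i = 0; i < n; i++)
		if (info[i].bus_min == bus)
			return (int)i;
	return -1;	/* no entry found */
}

/* Patched test: a bus matches any entry whose bus range covers it. */
int find_root_in_range(const struct root_info *info, size_t n, int bus)
{
	assert(info != NULL);
	for (size_t i = 0; i < n; i++)
		if (info[i].bus_min <= bus && info[i].bus_max >= bus)
			return (int)i;
	return -1;	/* no entry found */
}
```

With the assumed table, bus 80 finds no entry under the old test and finds the 00-ff entry under the patched one, so it would pick up the same resource list as bus 00.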
From: Bjorn Helgaas on 14 Jun 2010 10:20

On Friday, June 11, 2010 05:06:49 pm Yinghai Lu wrote:
>
> please check if this one workaround the problem
>
> Thanks
>
> Yinghai Lu
>
> [PATCH] x86, pci: handle fallout pci devices with peer root bus
>
> Signed-off-by: Yinghai Lu <yinghai(a)kernel.org>

This patch apparently does cover up the problem, but it fails on so
many levels:

  - incomprehensible summary
  - no changelog
  - no bugzilla pointer
  - unrelated junk in patch ("tmp")
  - completely unexplained change to generic resource.c
  - no indication that we understand the root cause

> ---
>  arch/x86/pci/bus_numa.c |    4 +++-
>  kernel/resource.c       |    2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/arch/x86/pci/bus_numa.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/bus_numa.c
> +++ linux-2.6/arch/x86/pci/bus_numa.c
> @@ -22,7 +22,8 @@ void x86_pci_root_bus_res_quirks(struct
>  		return;
>
>  	for (i = 0; i < pci_root_num; i++) {
> -		if (pci_root_info[i].bus_min == b->number)
> +		if (pci_root_info[i].bus_min <= b->number &&
> +		    pci_root_info[i].bus_max >= b->number)
>  			break;
>  	}
>
> @@ -37,6 +38,7 @@ void x86_pci_root_bus_res_quirks(struct
>  	for (j = 0; j < info->res_num; j++) {
>  		struct resource *res;
>  		struct resource *root;
> +		struct resource *tmp;
>
>  		res = &info->res[j];
>  		pci_bus_add_resource(b, res, 0);
> Index: linux-2.6/kernel/resource.c
> ===================================================================
> --- linux-2.6.orig/kernel/resource.c
> +++ linux-2.6/kernel/resource.c
> @@ -451,7 +451,7 @@ static struct resource * __insert_resour
>  	if (!first)
>  		return first;
>
> -	if (first == parent)
> +	if (first == parent || first == new)
>  		return first;
>
>  	if ((first->start > new->start) || (first->end < new->end))

From: Bjorn Helgaas on 21 Jun 2010 13:30

I think the best long-term fix is to always enable "pci=use_crs",
regardless of the BIOS date (currently we only do it for 2008 and
newer). System designers and BIOS writers expect the OS to pay
attention to that information, and indications are that Windows does
use it, so I think we will ultimately be better off if we use the
expected, best-tested path.

However, we have at least one known Linux issue (bug #16228) when _CRS
is enabled, so I'm hesitant to enable it unconditionally at least until
that is resolved.

In the short term, I think we should apply Graham's quirk from comment
#8, which enables pci=use_crs just for his system.

Here's my response to Yinghai's patches. ACPI gives us these resources:

    pci_root PNP0A03:00: host bridge window [mem 0x80000000-0xff37ffff] (bus 00)
    pci_root PNP0A08:00: host bridge window [mem 0xfebfc000-0xfebfffff] (bus 80)

Yinghai's patch (comment #17, with a v2 posted to the list but not in
the bugzilla), gives us these resources:

    pci_bus 0000:00: resource 5 [mem 0x80000000-0xfcffffffff]
    pci_bus 0000:80: resource 5 [mem 0x80000000-0xfcffffffff]

I think it's just a bad idea to assign the same range to both buses,
especially when the BIOS is telling us what we should be using.

I also think it's a mistake to mess with the resource code to deal with
this specific case. A change like that makes resource.c hard to
understand and maintain in the future.