From: Matt Turner on 21 Jun 2010 17:20

Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
booting with `radeon.test=1` and found this, which I think is related:

> [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x202000
> [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x302000
[snip]
> [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfd02000
> [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfe02000
> pci_map_single failed: could not allocate dma page tables
> [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 128 pages at 0x0FF02000
> [TTM] Couldn't bind backend.
> radeon 0000:00:07.0: object_init failed for (1048576, 0x00000002)
> [drm:radeon_test_moves] *ERROR* Failed to create GTT object 253
> Error while testing BO move.

From what I can see, the call chain is
radeon_test_moves
 (radeon_ttm_backend_bind called through callback function)
 - radeon_ttm.c:radeon_ttm_backend_bind calls radeon_gart_bind
  - radeon_gart.c:radeon_gart_bind calls pci_map_page
   - pci_map_page is alpha_pci_map_page, which calls...
    - alpha_pci_map_page calls pci_iommu.c:pci_map_single_1
     - pci_map_single_1 calls iommu_arena_alloc
      - iommu_arena_alloc calls iommu_arena_find_pages
       - iommu_arena_find_pages returns non-0
      - iommu_arena_alloc returns non-0
     - pci_map_single_1 returns 0 after printing
       "could not allocate dma page tables" error
    - alpha_pci_map_page returns 0 from pci_map_single_1
  - radeon_gart_bind returns non-0, error path prints
    "*ERROR* failed to bind 128 pages at 0x0FF02000"

Is this the cause of the bug we're seeing in the report [1]?

Anyone know what's going wrong here?

Thanks!
Matt Turner

[1] https://bugs.freedesktop.org/show_bug.cgi?id=26403
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
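The numbers in the log above are consistent with each other, which a small sketch can confirm. It assumes 8 KiB pages (the usual page size on Alpha Linux) and assumes that test object 0 sits at GTT offset 0x202000, which is only inferred from the first log line. The helper names are hypothetical and are not part of the radeon driver.

```c
#include <stdint.h>

/* Assumption: 8 KiB pages, the usual page size on Alpha Linux. */
#define ALPHA_PAGE_SIZE  8192u
/* Size of each test object, from "object_init failed for (1048576, ...)". */
#define TEST_BO_SIZE     1048576u
/* Assumption: GTT offset of test object 0, inferred from the first log line. */
#define FIRST_GTT_OFFSET 0x202000u

/* Number of pages that must be mapped to bind one test object. */
unsigned pages_per_object(void)
{
    return TEST_BO_SIZE / ALPHA_PAGE_SIZE;
}

/* GTT offset at which the test object with the given index is bound. */
uint32_t gtt_offset_of_object(unsigned index)
{
    return FIRST_GTT_OFFSET + index * TEST_BO_SIZE;
}
```

Under these assumptions, `pages_per_object()` gives the 128 pages named in the bind error, and `gtt_offset_of_object(253)` gives 0x0FF02000, the offset at which GTT object 253 failed. That suggests the test had already bound 253 objects of 1 MiB each before the IOMMU ran out of usable entries.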
From: FUJITA Tomonori on 22 Jun 2010 02:10

On Mon, 21 Jun 2010 17:19:43 -0400
Matt Turner <mattst88(a)gmail.com> wrote:

> Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
> booting with `radeon.test=1` and found this, which I think is related:
>
> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x202000
> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x302000
> [snip]
> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfd02000
> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfe02000
> > pci_map_single failed: could not allocate dma page tables
> > [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 128 pages at 0x0FF02000
> > [TTM] Couldn't bind backend.
> > radeon 0000:00:07.0: object_init failed for (1048576, 0x00000002)
> > [drm:radeon_test_moves] *ERROR* Failed to create GTT object 253
> > Error while testing BO move.
>
> From what I can see, the call chain is
> radeon_test_moves
>  (radeon_ttm_backend_bind called through callback function)
>  - radeon_ttm.c:radeon_ttm_backend_bind calls radeon_gart_bind
>   - radeon_gart.c:radeon_gart_bind calls pci_map_page
>    - pci_map_page is alpha_pci_map_page, which calls...
>     - alpha_pci_map_page calls pci_iommu.c:pci_map_single_1
>      - pci_map_single_1 calls iommu_arena_alloc
>       - iommu_arena_alloc calls iommu_arena_find_pages
>        - iommu_arena_find_pages returns non-0
>       - iommu_arena_alloc returns non-0
>      - pci_map_single_1 returns 0 after printing
>        "could not allocate dma page tables" error
>     - alpha_pci_map_page returns 0 from pci_map_single_1
>   - radeon_gart_bind returns non-0, error path prints
>     "*ERROR* failed to bind 128 pages at 0x0FF02000"

This happens in the latest git, right?

Is this a regression (what kernel version worked)?

Seems that the IOMMU can't find 128 pages. It's likely due to:

- out of the IOMMU space (possibly someone doesn't free the IOMMU
  space).

or

- the mapping parameters (such as align) aren't appropriate so the
  IOMMU can't find space.

> Is this the cause of the bug we're seeing in the report [1]?
>
> Anyone know what's going wrong here?

I've attached a patch to print the debug info about the mapping
parameters.

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index d1dbd9a..17cf0d8 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -187,6 +187,10 @@ iommu_arena_alloc(struct device *dev, struct pci_iommu_arena *arena, long n,
 	/* Search for N empty ptes */
 	ptes = arena->ptes;
 	mask = max(align, arena->align_entry) - 1;
+
+	printk("%s: %p, %p, %d, %ld, %lx, %u\n", __func__, dev, arena, arena->size,
+	       n, mask, align);
+
 	p = iommu_arena_find_pages(dev, arena, n, mask);
 	if (p < 0) {
 		spin_unlock_irqrestore(&arena->lock, flags);
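The two failure modes listed above (no space left, or space that exists but does not satisfy the alignment) can be illustrated with a simplified search for N free entries. This is only a sketch under stated assumptions: `find_free_run` is a hypothetical stand-in, not the kernel's `iommu_arena_find_pages`, which handles more cases. It assumes `mask` is the alignment minus one, with the alignment a power of two, as in the patch context above.

```c
/*
 * Simplified, hypothetical stand-in for an IOMMU arena search.
 * ptes[i] != 0 means entry i is in use. Returns the index of the first
 * run of n free entries whose start is aligned to (mask + 1), or -1.
 */
long find_free_run(const unsigned long *ptes, long nent, long n, long mask)
{
    long p = 0;

    while (p + n <= nent) {
        long i;

        /* Count how many consecutive entries starting at p are free. */
        for (i = 0; i < n; i++) {
            if (ptes[p + i])
                break;
        }
        if (i == n)
            return p;   /* found n free entries at an aligned start */

        /* Skip past the busy entry, then round up to the alignment. */
        p = (p + i + 1 + mask) & ~mask;
    }
    return -1;          /* no suitable run exists */
}
```

With an eight-entry arena in which entries 0 and 5 are busy, a request for 4 entries succeeds at index 1 when `mask` is 0, but fails when `mask` is 3, even though 6 of the 8 entries are free. The values printed by the debug patch (`n`, `mask`, `align`) are what would distinguish this case from a genuinely full arena.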
From: Dave Airlie on 22 Jun 2010 04:40

On Tue, Jun 22, 2010 at 3:59 PM, FUJITA Tomonori
<fujita.tomonori(a)lab.ntt.co.jp> wrote:
> On Mon, 21 Jun 2010 17:19:43 -0400
> Matt Turner <mattst88(a)gmail.com> wrote:
>
>> Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
>> booting with `radeon.test=1` and found this, which I think is related:
>>
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x202000
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x302000
>> [snip]
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfd02000
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfe02000
>> > pci_map_single failed: could not allocate dma page tables
>> > [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 128 pages at 0x0FF02000
>> > [TTM] Couldn't bind backend.
>> > radeon 0000:00:07.0: object_init failed for (1048576, 0x00000002)
>> > [drm:radeon_test_moves] *ERROR* Failed to create GTT object 253
>> > Error while testing BO move.
>>
>> From what I can see, the call chain is
>> radeon_test_moves
>>  (radeon_ttm_backend_bind called through callback function)
>>  - radeon_ttm.c:radeon_ttm_backend_bind calls radeon_gart_bind
>>   - radeon_gart.c:radeon_gart_bind calls pci_map_page
>>    - pci_map_page is alpha_pci_map_page, which calls...
>>     - alpha_pci_map_page calls pci_iommu.c:pci_map_single_1
>>      - pci_map_single_1 calls iommu_arena_alloc
>>       - iommu_arena_alloc calls iommu_arena_find_pages
>>        - iommu_arena_find_pages returns non-0
>>       - iommu_arena_alloc returns non-0
>>      - pci_map_single_1 returns 0 after printing
>>        "could not allocate dma page tables" error
>>     - alpha_pci_map_page returns 0 from pci_map_single_1
>>   - radeon_gart_bind returns non-0, error path prints
>>     "*ERROR* failed to bind 128 pages at 0x0FF02000"
>
> This happens in the latest git, right?
>
> Is this a regression (what kernel version worked)?
>
> Seems that the IOMMU can't find 128 pages. It's likely due to:
>
> - out of the IOMMU space (possibly someone doesn't free the IOMMU
>   space).
>
> or
>
> - the mapping parameters (such as align) aren't appropriate so the
>   IOMMU can't find space.

I don't think KMS drivers have ever worked on alpha so it's not a
regression, they are working fine on x86 + powerpc and sparc has been
run at least once.

I suspect we are simply hitting the limits of the iommu, how big an
address space does it handle? since generally graphics drivers try to
bind a lot of things to the GART.

It might be worth limiting the PCIGART in radeon to 32MB to see if the
lower limit helps.

Dave.

>> Is this the cause of the bug we're seeing in the report [1]?
>>
>> Anyone know what's going wrong here?
>
> I've attached a patch to print the debug info about the mapping
> parameters.
>
> diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> index d1dbd9a..17cf0d8 100644
> --- a/arch/alpha/kernel/pci_iommu.c
> +++ b/arch/alpha/kernel/pci_iommu.c
> @@ -187,6 +187,10 @@ iommu_arena_alloc(struct device *dev, struct pci_iommu_arena *arena, long n,
>  	/* Search for N empty ptes */
>  	ptes = arena->ptes;
>  	mask = max(align, arena->align_entry) - 1;
> +
> +	printk("%s: %p, %p, %d, %ld, %lx, %u\n", __func__, dev, arena, arena->size,
> +	       n, mask, align);
> +
>  	p = iommu_arena_find_pages(dev, arena, n, mask);
>  	if (p < 0) {
>  		spin_unlock_irqrestore(&arena->lock, flags);
From: Michael Cree on 24 Jun 2010 06:00

On 22/06/10 20:32, Dave Airlie wrote:
> On Tue, Jun 22, 2010 at 3:59 PM, FUJITA Tomonori
> <fujita.tomonori(a)lab.ntt.co.jp> wrote:
>> On Mon, 21 Jun 2010 17:19:43 -0400
>> Matt Turner <mattst88(a)gmail.com> wrote:
>>
>>> Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
>>> booting with `radeon.test=1` and found this, which I think is related:

Note that my radeon card is PCI whereas I think Matt may be using an AGP
card. My logs are very similar to Matt's except I don't see the
following line:

>>>> pci_map_single failed: could not allocate dma page tables

>> This happens in the latest git, right?

Indeed, testing 2.6.35-rc3 (plus a couple or so extra patches to fix
unrelated compile errors).

>> Is this a regression (what kernel version worked)?
>>
>> Seems that the IOMMU can't find 128 pages. It's likely due to:
>>
>> - out of the IOMMU space (possibly someone doesn't free the IOMMU
>>   space).
>>
>> or
>>
>> - the mapping parameters (such as align) aren't appropriate so the
>>   IOMMU can't find space.
>
> I don't think KMS drivers have ever worked on alpha so it's not a
> regression, they are working fine on x86 + powerpc and sparc has been
> run at least once.

KMS on the console boot up has worked since about 2.6.32, but starting
up the X server has always failed and, in my case, the system becomes
unstable and eventually oopses.

> I suspect we are simply hitting the limits of the iommu, how big an
> address space does it handle? since generally graphics drivers try to
> bind a lot of things to the GART.

No idea on the address space limit.

I applied the patch of Fujita that logs all IOMMU allocations, and also
inserted some extra printks in the ttm kernel code so that I could see
which routines failed and the error code returned.

Running the radeon test on boot exhibits the following:

[ 238.712768] [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x1a312000
[ 239.281127] [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x1a412000
[ 239.281127] ttm_tt_bind belched -12
[ 239.282104] ttm_bo_handle_move_mem belched -12
[ 239.282104] ttm_bo_move_buffer belched -12
[ 239.282104] ttm_bo_validate belched -12
[ 239.282104] radeon 0000:01:00.0: object_init failed for (1048576, 0x00000002) err=-12
[ 239.282104] [drm:radeon_test_moves] *ERROR* Failed to create GTT object 419
[ 239.399291] Error while testing BO move.

Note that no IOMMU allocations are printed while radeon_test_moves is
running, so iommu_arena_alloc doesn't appear to be called. Also, the
error code returned up to radeon_test_moves is -12, which is ENOMEM. So
there does appear to be some memory limit.

> It might be worth limiting the PCIGART in radeon to 32MB to see if the
> lower limit helps.

So, how does one do that?

Cheers
Michael.
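The reading of -12 as ENOMEM can be checked from user space, since kernel functions conventionally return negated errno values. This is a minimal sketch; `describe_kernel_error` is a hypothetical helper name, and the numeric value 12 for ENOMEM is what Linux uses.

```c
#include <errno.h>
#include <string.h>

/*
 * Kernel-style return codes are negated errno values, so flip the sign
 * before looking up the description. Non-negative values mean success.
 */
const char *describe_kernel_error(int ret)
{
    return ret < 0 ? strerror(-ret) : "success";
}
```

Calling `describe_kernel_error(-12)` returns the same text as `strerror(ENOMEM)`, which matches the error that every ttm routine in the log above passed back up to radeon_test_moves.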
From: Matt Turner on 24 Jun 2010 11:00

On Tue, Jun 22, 2010 at 1:59 AM, FUJITA Tomonori
<fujita.tomonori(a)lab.ntt.co.jp> wrote:
> On Mon, 21 Jun 2010 17:19:43 -0400
> Matt Turner <mattst88(a)gmail.com> wrote:
>
>> Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
>> booting with `radeon.test=1` and found this, which I think is related:
>>
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x202000
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x302000
>> [snip]
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfd02000
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfe02000
>> > pci_map_single failed: could not allocate dma page tables
>> > [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 128 pages at 0x0FF02000
>> > [TTM] Couldn't bind backend.
>> > radeon 0000:00:07.0: object_init failed for (1048576, 0x00000002)
>> > [drm:radeon_test_moves] *ERROR* Failed to create GTT object 253
>> > Error while testing BO move.
>>
>> From what I can see, the call chain is
>> radeon_test_moves
>>  (radeon_ttm_backend_bind called through callback function)
>>  - radeon_ttm.c:radeon_ttm_backend_bind calls radeon_gart_bind
>>   - radeon_gart.c:radeon_gart_bind calls pci_map_page
>>    - pci_map_page is alpha_pci_map_page, which calls...
>>     - alpha_pci_map_page calls pci_iommu.c:pci_map_single_1
>>      - pci_map_single_1 calls iommu_arena_alloc
>>       - iommu_arena_alloc calls iommu_arena_find_pages
>>        - iommu_arena_find_pages returns non-0
>>       - iommu_arena_alloc returns non-0
>>      - pci_map_single_1 returns 0 after printing
>>        "could not allocate dma page tables" error
>>     - alpha_pci_map_page returns 0 from pci_map_single_1
>>   - radeon_gart_bind returns non-0, error path prints
>>     "*ERROR* failed to bind 128 pages at 0x0FF02000"
>
> This happens in the latest git, right?

I'm using 2.6.35-rc2, but I could try rc3 if you think it would make a
difference.

> Is this a regression (what kernel version worked)?

The framebuffer console has always worked, but I've never known X on KMS
to work. The radeon.test parameter hasn't existed the entire time, but I
could try still previous kernels.

> Seems that the IOMMU can't find 128 pages. It's likely due to:
>
> - out of the IOMMU space (possibly someone doesn't free the IOMMU
>   space).
>
> or
>
> - the mapping parameters (such as align) aren't appropriate so the
>   IOMMU can't find space.
>
>> Is this the cause of the bug we're seeing in the report [1]?
>>
>> Anyone know what's going wrong here?
>
> I've attached a patch to print the debug info about the mapping
> parameters.
>
> diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> index d1dbd9a..17cf0d8 100644
> --- a/arch/alpha/kernel/pci_iommu.c
> +++ b/arch/alpha/kernel/pci_iommu.c
> @@ -187,6 +187,10 @@ iommu_arena_alloc(struct device *dev, struct pci_iommu_arena *arena, long n,
>  	/* Search for N empty ptes */
>  	ptes = arena->ptes;
>  	mask = max(align, arena->align_entry) - 1;
> +
> +	printk("%s: %p, %p, %d, %ld, %lx, %u\n", __func__, dev, arena, arena->size,
> +	       n, mask, align);
> +
>  	p = iommu_arena_find_pages(dev, arena, n, mask);
>  	if (p < 0) {
>  		spin_unlock_irqrestore(&arena->lock, flags);

Using this patch, I log the attached output.

Thanks for your help so far. :)

Matt