From: Dan Williams on 6 Jul 2010 20:50

[ adding Matthew as one of last people to touch mm/dmapool.c ]

On Tue, 2010-07-06 at 16:40 -0700, Chris Li wrote:
> On Mon, Jul 5, 2010 at 3:16 AM, David Woodhouse <dwmw2(a)infradead.org> wrote:
> > On Fri, 2010-07-02 at 20:00 +0100, Chris Li wrote:
> >> But I don't see the line that print out BIOS is lying.
> >
> > Hrm. Want to augment the dmar_find_matched_drhd_unit() function to
> > _always_ print the DRHD returned for the offending PCI device? And if
> > that still doesn't show, make it print pdev->vendor, pdev->device and
> > the returned DRHD pointer for _every_ call?
>
> I just did some experiment, my PCI device ID is PCI_DEVICE_ID_INTEL_ESB2_0
> (0x2670) instead of PCI_DEVICE_ID_INTEL_IOAT_SNB.

No, it should be PCI_DEVICE_ID_INTEL_IOAT_SNB (0x402f) for the dma
engine at 00:0f.0 . PCI_DEVICE_ID_INTEL_ESB2_0 is the LPC controller at
00:1f.0,

> That seems to be the reason preventing the warning to be print out. I am not
> sure the warning should be always print out. Just curious why it did
> not trigger.

It should always trigger, and I have verified as much with the attached
replacement patch (by forcing the error on a working system), but we run
into a new problem. dma_pool_alloc() assumes that any dma_mapping error
is transient. Do we need a new type of dma_mapping_error() that
indicates permanent failure versus ENOMEM? The driver can handle the
allocation failure, but it never gets the chance.

------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:574 dmar_find_matched_drhd_unit+0xe4/0xfa()
Hardware name: [redacted to protect the innocent]
BIOS wrongly assigned I/OAT IOMMU 5: reg_base_addr fe71a000 cap 4900800c2f0462 ecap e01
Modules linked in: ioatdma(+) dca ipv6 snd_pcsp snd_pcm snd_timer snd soundcore i2c_i801 snd_page_alloc serio_raw i2c_core joydev
Pid: 1166, comm: modprobe Not tainted 2.6.35-rc3+ #2
Call Trace:
 [<ffffffff8104bfd0>] warn_slowpath_common+0x85/0x9d
 [<ffffffff8104c043>] warn_slowpath_fmt_taint+0x3f/0x41
 [<ffffffff8125dd4b>] dmar_find_matched_drhd_unit+0xe4/0xfa
 [<ffffffff8126179d>] get_domain_for_dev.clone.3+0x111/0x471
 [<ffffffff81261cbb>] get_valid_domain_for_dev+0x26/0x9a
 [<ffffffff81261f51>] __intel_map_single+0x4c/0x175
 [<ffffffff81262184>] intel_alloc_coherent+0xc7/0xef
 [<ffffffff810edcd2>] dma_pool_alloc+0x179/0x2ab
 [<ffffffffa00ed606>] ? kzalloc+0x14/0x16 [ioatdma]
 [<ffffffffa00efe58>] ioat2_alloc_chan_resources+0x4f/0x219 [ioatdma]
 [<ffffffffa00f33b9>] ioat_dma_self_test+0x94/0x2af [ioatdma]
 [<ffffffff8109bff2>] ? devm_request_threaded_irq+0x98/0xaa
 [<ffffffffa00f31cd>] ioat_probe+0x338/0x3aa [ioatdma]
 [<ffffffffa00f3657>] ioat2_dma_probe+0x83/0x106 [ioatdma]
 [<ffffffffa00f2ded>] ioat_pci_probe+0x133/0x195 [ioatdma]
 [<ffffffff8124b539>] local_pci_probe+0x17/0x1b
 [<ffffffff8124c2f5>] pci_device_probe+0xcd/0xfd
 [<ffffffff812ee5f5>] ? driver_sysfs_add+0x4c/0x71
 [<ffffffff812ee81a>] driver_probe_device+0x12f/0x240
 [<ffffffff812ee97a>] __driver_attach+0x4f/0x6b
 [<ffffffff812ee92b>] ? __driver_attach+0x0/0x6b
 [<ffffffff812edc66>] bus_for_each_dev+0x53/0x88
 [<ffffffff812ee554>] driver_attach+0x1e/0x20
 [<ffffffff812ee19a>] bus_add_driver+0xd5/0x23b
 [<ffffffff812eec54>] driver_register+0x9d/0x10e
 [<ffffffff8124c521>] __pci_register_driver+0x58/0xc8
 [<ffffffffa00fc000>] ? ioat_init_module+0x0/0x85 [ioatdma]
 [<ffffffffa00fc000>] ? ioat_init_module+0x0/0x85 [ioatdma]
 [<ffffffffa00fc06d>] ioat_init_module+0x6d/0x85 [ioatdma]
 [<ffffffff81002069>] do_one_initcall+0x5e/0x159
 [<ffffffff8107bd01>] sys_init_module+0xa1/0x1e0
 [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
---[ end trace 02c1ac1f56dc9544 ]---
Disabling lock debugging due to kernel taint
IOMMU: can't find DMAR for device 0000:00:0f.0
Allocating domain for 0000:00:0f.0 failed
IOMMU: can't find DMAR for device 0000:00:0f.0
Allocating domain for 0000:00:0f.0 failed
[...ad infinitum...]

--
Dan
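
[ Editorial illustration, not part of the original message. ]

The question raised above, whether a pool allocator should tell a permanent mapping failure apart from a transient one such as ENOMEM, can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `map_result`, `map_fn`, `pool_alloc_model`, `always_permanent` and `transient_then_ok` are hypothetical names invented for the example, none of them exist in the kernel, and the thread does not say whether or how such a distinction was eventually implemented. The model also caps the number of retries, which the behaviour described in the message apparently does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical failure classes; not part of any kernel API. */
enum map_result { MAP_OK, MAP_TRANSIENT, MAP_PERMANENT };

/* Hypothetical backend: performs one mapping attempt and reports its outcome. */
typedef enum map_result (*map_fn)(void *ctx);

/*
 * Model of a pool allocator's retry loop.
 * Retries only while the backend reports a transient failure, at most
 * max_tries times, and gives up immediately on a permanent failure.
 * Stores the final outcome in *out and returns the number of attempts made.
 */
static int pool_alloc_model(map_fn try_map, void *ctx, int max_tries,
                            enum map_result *out)
{
    int attempts = 0;
    enum map_result r = MAP_TRANSIENT;

    assert(try_map != NULL && out != NULL);

    while (attempts < max_tries) {
        attempts++;
        r = try_map(ctx);
        if (r != MAP_TRANSIENT)
            break;          /* success or permanent failure: stop retrying */
    }
    *out = r;
    return attempts;
}

/* Backend that can never succeed, e.g. a device with no usable IOMMU unit. */
static enum map_result always_permanent(void *ctx)
{
    (void)ctx;
    return MAP_PERMANENT;
}

/* Backend that fails transiently until the counter behind ctx reaches zero. */
static enum map_result transient_then_ok(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return MAP_TRANSIENT;
    }
    return MAP_OK;
}
```

With `always_permanent` as the backend the model returns after a single attempt, so the caller gets the chance to handle the failure; with `transient_then_ok` it keeps retrying until the mapping succeeds or the retry budget runs out.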

From: Chris Li on 6 Jul 2010 21:00

On Tue, Jul 6, 2010 at 5:51 PM, Dan Williams <dan.j.williams(a)intel.com> wrote:
> No, it should be PCI_DEVICE_ID_INTEL_IOAT_SNB (0x402f) for the dma
> engine at 00:0f.0 . PCI_DEVICE_ID_INTEL_ESB2_0 is the LPC controller at
> 00:1f.0,
>
>> That seems to be the reason preventing the warning to be print out. I am not
>> sure the warning should be always print out. Just curious why it did
>> not trigger.
>
> It should always trigger, and I have verified as much with the attached
> replacement patch (by forcing the error on a working system), but we run
> into a new problem. dma_pool_alloc() assumes that any dma_mapping error
> is transient. Do we need a new type of dma_mapping_error() that
> indicates permanent failure versus ENOMEM? The driver can handle the
> allocation failure, but it never gets the chance.

Should I test your V2 patch instead?

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Dan Williams on 6 Jul 2010 21:00

On 7/6/2010 5:51 PM, Chris Li wrote:
> On Tue, Jul 6, 2010 at 5:51 PM, Dan Williams<dan.j.williams(a)intel.com> wrote:
>> No, it should be PCI_DEVICE_ID_INTEL_IOAT_SNB (0x402f) for the dma
>> engine at 00:0f.0 . PCI_DEVICE_ID_INTEL_ESB2_0 is the LPC controller at
>> 00:1f.0,
>>
>>> That seems to be the reason preventing the warning to be print out. I am not
>>> sure the warning should be always print out. Just curious why it did
>>> not trigger.
>>
>> It should always trigger, and I have verified as much with the attached
>> replacement patch (by forcing the error on a working system), but we run
>> into a new problem. dma_pool_alloc() assumes that any dma_mapping error
>> is transient. Do we need a new type of dma_mapping_error() that
>> indicates permanent failure versus ENOMEM? The driver can handle the
>> allocation failure, but it never gets the chance.
>
> Should I test your V2 patch instead?
>

It would confirm that we are catching the BIOS misconfiguration, but your
system will get stuck in this loop. So just make sure you can get back to a
working config, which it sounds like you can.

--
Dan

From: Chris Li on 6 Jul 2010 21:10

On Tue, Jul 6, 2010 at 5:58 PM, Dan Williams <dan.j.williams(a)intel.com> wrote:
>
> It would confirm that we are catching the BIOS misconfiguration, but your
> system will get stuck in this loop. So just make sure you can get back to a
> working config, which it sounds like you can.
>

Here is the new dmesg.

Chris

From: David Woodhouse on 6 Jul 2010 23:30

On Tue, 2010-07-06 at 18:03 -0700, Chris Li wrote:
>
> Here is the new dmesg.
>
> Chris
>
> --00163646dc2e3280f2048ac1bd39
> Content-Type: application/octet-stream; name=dmesg
> Content-Disposition: attachment; filename=dmesg

Btw, it'd be really helpful if you could attach those as text files and
inline so that they can just be read in the mailer rather than saved.

--
dwmw2