From: Stephen Rothwell on 22 Jul 2010 20:30

Hi all,

My Power7 boot test panicked like this: (next-20100722)

%GQLogic Fibre Channel HBA Driver: 8.03.03-k0
qla2xxx 0002:01:00.2: enabling device (0144 -> 0146)
qla2xxx 0002:01:00.2: Found an ISP8001, irq 35, iobase 0xd000080080014000
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:205!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA pSeries
last sysfs file: /sys/devices/virtual/tty/ptyz8/uevent
Modules linked in: qla2xxx(+)
NIP: c0000000002fcd54 LR: c000000000048d9c CTR: 0000000000000001
REGS: c00000000278aff0 TRAP: 0700 Not tainted (2.6.35-rc5-autokern1-next-20100721)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 28422488 XER: 20000008
TASK = c000000002008000[2226] 'modprobe' THREAD: c000000002788000 CPU: 12
GPR00: 0000000000000001 c00000000278b270 c0000000009a36d0 c0000000009b8900
GPR04: c00000000278b2e8 ffffffffffffffff 0000000000000000 0000000000020000
GPR08: 00000000000033e7 c00000000a38b280 0000000000000000 0000000000000000
GPR12: 0000000088422488 c00000000f331800 00000fff921750a0 0000000000000000
GPR16: 0000000010033110 00000000100334b8 0000000000000000 0000000000000000
GPR20: d000080080018000 0000000000022225 c0000000009f7bb4 0000000000010200
GPR24: 000000002000020d 0000000000000025 c00000000278b2e0 c00000000278b2e8
GPR28: 0000000000000001 c00000000d0ac5f8 c000000000af8f00 c00000000a38b280
NIP [c0000000002fcd54] .read_msi_msg_desc+0x24/0x3c
LR [c000000000048d9c] .rtas_setup_msi_irqs+0x1d8/0x254
Call Trace:
[c00000000278b270] [c000000000048d9c] .rtas_setup_msi_irqs+0x1d8/0x254 (unreliable)
[c00000000278b360] [c00000000002a9cc] .arch_setup_msi_irqs+0x34/0x4c
[c00000000278b3e0] [c0000000002fd3fc] .pci_enable_msix+0x49c/0x4ac
[c00000000278b4c0] [d0000000001a5e30] .qla2x00_request_irqs+0x158/0x5b4 [qla2xxx]
[c00000000278b580] [d0000000001cb41c] .qla2x00_probe_one+0xeac/0x63b0 [qla2xxx]
[c00000000278b6f0] [c0000000002f5c4c] .local_pci_probe+0x34/0x48
[c00000000278b760] [c0000000002f6078] .pci_device_probe+0xe8/0x130
[c00000000278b810] [c00000000036e648] .driver_probe_device+0xdc/0x1a4
[c00000000278b8a0] [c00000000036e7a4] .__driver_attach+0x94/0xd8
[c00000000278b930] [c00000000036dabc] .bus_for_each_dev+0x7c/0xe0
[c00000000278b9e0] [c00000000036e410] .driver_attach+0x28/0x40
[c00000000278ba60] [c00000000036d134] .bus_add_driver+0x144/0x310
[c00000000278bb10] [c00000000036ec28] .driver_register+0xd8/0x198
[c00000000278bbb0] [c0000000002f63d0] .__pci_register_driver+0x60/0x10c
[c00000000278bc50] [d0000000001ca520] .qla2x00_module_init+0x150/0x1a0 [qla2xxx]
[c00000000278bce0] [c00000000000947c] .do_one_initcall+0x80/0x1a8
[c00000000278bd90] [c0000000000a4364] .SyS_init_module+0xd8/0x244
[c00000000278be30] [c00000000000852c] syscall_exit+0x0/0x40
Instruction dump:
ebe1fff8 7c0803a6 4e800020 e9230028 81490030 80090034 81690038 7d400378
7c005b78 7c000034 5400d97e 78000020 <0b000000> 81690038 e8090030 91640008
---[ end trace f67a78811ed47c60 ]---
%Gudevd-work[1379]: '/sbin/modprobe -b pci:v00001077d00008001sv00001077sd0000017Fbc0Csc04i00' unexpected exit with status 0x0005

That line number is this:

	BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
		 entry->msg.data));

in read_msi_msg_desc(). That BUG_ON was added by commit
2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 ("PCI: MSI: Remove unsafe and
unnecessary hardware access") from the pci tree.

--
Cheers,
Stephen Rothwell sfr(a)canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

From: Ben Hutchings on 22 Jul 2010 21:30

On Fri, 2010-07-23 at 10:22 +1000, Stephen Rothwell wrote:
> Hi all,
>
> My Power7 boot test panicked like this: (next-20100722)
>
> %GQLogic Fibre Channel HBA Driver: 8.03.03-k0
> qla2xxx 0002:01:00.2: enabling device (0144 -> 0146)
> qla2xxx 0002:01:00.2: Found an ISP8001, irq 35, iobase 0xd000080080014000
> ------------[ cut here ]------------
> kernel BUG at drivers/pci/msi.c:205!
[...]
> Call Trace:
> [c00000000278b270] [c000000000048d9c] .rtas_setup_msi_irqs+0x1d8/0x254 (unreliable)
> [c00000000278b360] [c00000000002a9cc] .arch_setup_msi_irqs+0x34/0x4c
> [c00000000278b3e0] [c0000000002fd3fc] .pci_enable_msix+0x49c/0x4ac
[...]
> That line number is this:
>
> 	BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
> 		 entry->msg.data));
>
> in read_msi_msg_desc(). That BUG_ON was added by commit
> 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 ("PCI: MSI: Remove unsafe and
> unnecessary hardware access") from the pci tree.

I wanted to assert that read_msi_msg_desc() is only used to update
MSI/MSI-X descriptors that have already been generated by Linux. It
looks like you found an exception.

We could make read_msi_msg() fall back to reading from the hardware, but
I think that what the pSeries code is trying to do - save an MSI message
generated by firmware - is different from what the other callers want.
Instead we could add:

void save_msi_msg(unsigned int irq)
{
	struct irq_desc *desc = irq_to_desc(irq);
	struct msi_desc *entry = get_irq_desc_msi(desc);
	struct msi_msg *msg = &entry->msg;

	/* ...followed by the old implementation of read_msi_msg_desc() */
}

Possibly conditional on something like CONFIG_ARCH_NEEDS_SAVE_MSI_MSG.

Ben.

--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

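The split Ben describes can be modelled in plain userspace C. This is only a sketch, not kernel code: the `hw` field standing in for the device's MSI registers, the `-1` return in place of the BUG, and the names `msi_msg_is_cached()`, `read_cached_msi_msg()` and `model_save_msi_msg()` are all assumptions made for illustration. Only the three message fields and the zero check come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins; only the fields named in the thread are modelled. */
struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

struct msi_desc {
	struct msi_msg msg;	/* message cached in the descriptor */
	struct msi_msg hw;	/* hypothetical: models the device's MSI registers */
};

/* Same condition as the BUG_ON: a cached message has at least one non-zero field. */
static bool msi_msg_is_cached(const struct msi_desc *entry)
{
	return (entry->msg.address_hi | entry->msg.address_lo |
		entry->msg.data) != 0;
}

/*
 * Models read_msi_msg_desc() after the commit: it only copies the cached
 * message. Returns -1 in the case where the kernel would hit the BUG_ON.
 */
static int read_cached_msi_msg(const struct msi_desc *entry, struct msi_msg *msg)
{
	if (!msi_msg_is_cached(entry))
		return -1;
	*msg = entry->msg;
	return 0;
}

/*
 * Models the proposed save_msi_msg(): pick up the message that firmware
 * already programmed into the device and store it in the descriptor.
 */
static void model_save_msi_msg(struct msi_desc *entry)
{
	entry->msg = entry->hw;
}
```

In this model a descriptor whose `hw` was filled in by "firmware" fails `read_cached_msi_msg()` until `model_save_msi_msg()` has run, which corresponds to the pSeries path that tripped the assertion.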
From: Michael Ellerman on 22 Jul 2010 22:10

On Fri, 2010-07-23 at 02:19 +0100, Ben Hutchings wrote:
> On Fri, 2010-07-23 at 10:22 +1000, Stephen Rothwell wrote:
> > Hi all,
> >
> > My Power7 boot test panicked like this: (next-20100722)
> >
> > %GQLogic Fibre Channel HBA Driver: 8.03.03-k0
> > qla2xxx 0002:01:00.2: enabling device (0144 -> 0146)
> > qla2xxx 0002:01:00.2: Found an ISP8001, irq 35, iobase 0xd000080080014000
> > ------------[ cut here ]------------
> > kernel BUG at drivers/pci/msi.c:205!
> [...]
> > Call Trace:
> > [c00000000278b270] [c000000000048d9c] .rtas_setup_msi_irqs+0x1d8/0x254 (unreliable)
> > [c00000000278b360] [c00000000002a9cc] .arch_setup_msi_irqs+0x34/0x4c
> > [c00000000278b3e0] [c0000000002fd3fc] .pci_enable_msix+0x49c/0x4ac
> [...]
> > That line number is this:
> >
> > 	BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
> > 		 entry->msg.data));
> >
> > in read_msi_msg_desc(). That BUG_ON was added by commit
> > 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 ("PCI: MSI: Remove unsafe and
> > unnecessary hardware access") from the pci tree.
>
> I wanted to assert that read_msi_msg_desc() is only used to update
> MSI/MSI-X descriptors that have already been generated by Linux. It
> looks like you found an exception.
>
> We could make read_msi_msg() fall back to reading from the hardware, but
> I think that what the pSeries code is trying to do - save an MSI message
> generated by firmware - is different from what the other callers want.
> Instead we could add:
>
> void save_msi_msg(unsigned int irq)
> {
> 	struct irq_desc *desc = irq_to_desc(irq);
> 	struct msi_desc *entry = get_irq_desc_msi(desc);
> 	struct msi_msg *msg = &entry->msg;
>
> 	/* ...followed by the old implementation of read_msi_msg_desc() */
> }
>
> Possibly conditional on something like CONFIG_ARCH_NEEDS_SAVE_MSI_MSG.

Maybe. But then you end up with read_msi_msg(), which doesn't actually
read anything, which I think is confusing.

I'd rather read_msi_msg() read the message, from the device, and we have
another routine which returns the previously saved msg from the
msi_desc.

cheers
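
The naming Michael prefers can be sketched the same way, again as a userspace model rather than kernel code. The `hw` field and the three `model_*` function names are hypothetical; the thread only says that the read routine should go to the device and that a second routine should return the message saved in the `msi_desc`.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins; only the fields named in the thread are modelled. */
struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

struct msi_desc {
	struct msi_msg msg;	/* message previously saved in the descriptor */
	struct msi_msg hw;	/* hypothetical: models the device's MSI registers */
};

/* "read" really reads: it always takes the message from the device. */
static void model_read_msi_msg(const struct msi_desc *entry, struct msi_msg *msg)
{
	*msg = entry->hw;
}

/* Separate routine for callers that only want the saved copy, with no device access. */
static void model_get_cached_msi_msg(const struct msi_desc *entry,
				     struct msi_msg *msg)
{
	*msg = entry->msg;
}

/* Writing updates both the device and the saved copy, keeping the two in step. */
static void model_write_msi_msg(struct msi_desc *entry, const struct msi_msg *msg)
{
	entry->hw = *msg;
	entry->msg = *msg;
}
```

Under this model a caller in the pSeries position would read the firmware-programmed message from the device and write it back to populate the saved copy, while callers that only want the saved message never touch the device.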