AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40 BUG: unable to handle kernel NULL pointer dereference at 0000000000000198 [Kernel]

From: Joerg Roedel on 10 Aug 2010 14:10

On Tue, Aug 10, 2010 at 06:57:45PM +0200, Sander Eikelenboom wrote:
> The requested info is attached.
> So that would mean a bios problem ? (those are not on my wishlist :-p)

Yeah, looks like a BIOS problem. But the driver should handle that
without crashing the system, so there is a bug in the driver too.

Problem is:

AMD-Vi: DEV_ALIAS_RANGE devid: 0a:01.0 flags: 00 devid_to: 0a:00.0
AMD-Vi: DEV_RANGE_END devid: 0a:1f.7

This means that PCI devices from 0a:01.0 to 0a:1f.7 may use either their own
device-id or 0a:00.0. But a device with id 0a:00.0 is not present in
the system. From the lspci output it looks like your USB3 controller
should alias to 09:00.0. I'll prepare a patch for you that fixes the crash, but
I can't guarantee that your USB3 controller will work afterwards. If you
see IO-Page-Faults, please report them to me.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Joerg Roedel on 10 Aug 2010 16:30

On Tue, Aug 10, 2010 at 08:05:14PM +0200, Sander Eikelenboom wrote:
> Could you also provide a perhaps more specific message what is wrong
> with the bios, that i could forward to MSI, in the hope it will reach
> the bios engineers someday ? :-)

Let's first prove that my theory is right before contacting MSI directly.
Can you try the attached patch? It should fix the boot crash. Once the
system has booted successfully, please try some USB device (make sure it uses
the separate USB controller; I guess the separate device is responsible
for USB 3, so try plugging a device into one of your USB 3 ports).
When you are done, please tell me whether it worked or not and send me the
full dmesg output of the system.

Joerg

From: Joerg Roedel on 10 Aug 2010 16:50

Hi Sander,

On Tue, Aug 10, 2010 at 10:36:35PM +0200, Sander Eikelenboom wrote:
> Errr, which separate USB controller? .. it actually has:
> - 1 pci-e usb 2.0 controller
> - 2 pci-e usb 3.0 controller (one of which includes a sata controller as well)

The devices should be attached to this controller:

0a:01.0 USB Controller [0c03]: NEC Corporation USB [1033:0035] (rev 43) (prog-if 10 [OHCI])
0a:01.1 USB Controller [0c03]: NEC Corporation USB [1033:0035] (rev 43) (prog-if 10 [OHCI])
0a:01.2 USB Controller [0c03]: NEC Corporation USB 2.0 [1033:00e0] (rev 04) (prog-if 20 [EHCI])

The PCI devices associated with that controller alias to 0a:00.0, which
does not exist in your system (hence the crash). And the fact that these
devices have an alias makes me believe that the BIOS detects them as
legacy PCI devices. PCIe devices typically do not have aliases. Can you send
the lspci -t output so I can see which upstream bridge these devices are
connected to?

Joerg

From: Joerg Roedel on 10 Aug 2010 17:30

On Tue, Aug 10, 2010 at 10:57:26PM +0200, Sander Eikelenboom wrote:
> Hmmm the fun part seems to be .. that the usb devices on that usb2
> controller seemed to work fine on Xen.

Hmm, that's weird. In this case these devices probably do not alias at
all. But let's wait for the results when you test my patch.

> +-0a.0-[0000:09-0a]----00.0-[0000:0a]--+-01.0
> |                                      +-01.1
> |                                      \-01.2

Yeah, device 09:00.0 is a PCIe-to-PCI bridge and the additional USB
controllers are behind that bridge as legacy PCI devices. That's why the
BIOS sets up the alias entry. It should set up 09:00.0 instead of
0a:00.0 to make things work correctly.

Joerg

From: Joerg Roedel on 10 Aug 2010 18:10

Ok,

On Tue, Aug 10, 2010 at 11:36:59PM +0200, Sander Eikelenboom wrote:
> It boots now, dmesg attached.

AMD-Vi: Event logged [IO_PAGE_FAULT device=0a:00.0 domain=0x0000 address=0x0000000000001080 flags=0x0070]

So it indeed uses 0a:00.0 as the device id. That's weird, but it means
the BIOS is actually OK. I need to fix that in the driver.

Thanks,

Joerg
