From: Joerg Roedel on 2 Jun 2010 06:30 On Wed, Jun 02, 2010 at 01:15:34PM +0300, Michael S. Tsirkin wrote: > One of the issues I see with the current patch is that > it uses the mlock rlimit to do this pinning. So this wastes the rlimit > for an app that did mlockall already, and also consumes > this resource transparently, so an app might call mlock > on a small buffer and be surprised that it fails. > > Using mmu notifiers might help? MMU notifiers are problematic because they are designed for situations where we can do demand paging. The invalidate_range_start and invalidate_range_end functions are not only called on munmap, they also run when mprotect is called (in which case we don't want to tear down iommu mappings). So what may happen with mmu notifiers is that we accidentially tear down iommu mappings. With demand-paging this is no problem because the io-ptes could be re-faulted. But that does not work here. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Joerg Roedel on 2 Jun 2010 06:40 On Wed, Jun 02, 2010 at 01:21:44PM +0300, Michael S. Tsirkin wrote: > On Wed, Jun 02, 2010 at 12:19:40PM +0200, Joerg Roedel wrote: > > It can. The worst thing that can happen is an io-page-fault. > > devices might not be able to recover from this. With the userspace interface a process can create io-page-faults anyway if it wants. We can't protect us from this. And the process is also responsible to not tear down iommu-mappings that are currently in use. > What you describe below does 3 ioctls for what can be done with 1. The second IOMMU_MAP ioctl is just to show that existing mappings would be destroyed if the device is assigned to another address space. Not strictly necessary. So we have two ioctls but save one call to create the iommu-domain. > > ioctl(dev2, IOMMU_SHARE, dev1); /* destroys all mapping for dev2 and > > assigns it to the same domain as > > dev1. Domain has a refcount of two > > now */ > > Or maybe it destroys mapping for dev1? > How do you remember? Because we express here that "dev2 shares the iommu mappings of dev1". Thats easy to remember. > Also, no way to unshare? That seems limiting. Just left out for simplicity reasons. An IOMMU_UNBIND (no IOMMU_UNSHARE because that would require a second parameter) ioctl is certainly also required. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Michael S. Tsirkin on 2 Jun 2010 06:50 On Wed, Jun 02, 2010 at 12:19:40PM +0200, Joerg Roedel wrote: > On Wed, Jun 02, 2010 at 12:53:12PM +0300, Michael S. Tsirkin wrote: > > On Wed, Jun 02, 2010 at 11:42:01AM +0200, Joerg Roedel wrote: > > > > IMO a seperate iommu-userspace driver is a nightmare for a userspace > > > interface. It is just too complicated to use. > > > > One advantage would be that we can reuse the uio framework > > for the devices themselves. So an existing app can just program > > an iommu for DMA and keep using uio for interrupts and access. > > The driver is called UIO and not U-INTR-MMIO ;-) So I think handling > IOMMU mappings belongs there. Maybe it could be put there but the patch posted did not use uio. And one of the reasons is that uio framework provides for device access and interrupts but not for programming memory mappings. Solutions (besides giving up on uio completely) could include extending the framework in some way (which was tried, but the result was not pretty) or adding a separate driver for iommu and binding to that. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Joerg Roedel on 2 Jun 2010 07:20 On Wed, Jun 02, 2010 at 01:38:28PM +0300, Michael S. Tsirkin wrote: > On Wed, Jun 02, 2010 at 12:35:16PM +0200, Joerg Roedel wrote: > > With the userspace interface a process can create io-page-faults > > anyway if it wants. We can't protect us from this. > > We could fail all operations until an iommu is bound. This will help > catch bugs with access before setup. We can not do this if a domain is > bound by default. Even if it is bound to a domain the userspace driver could program the device to do dma to unmapped regions causing io-page-faults. The kernel can't do anything about it. > > The second IOMMU_MAP ioctl is just to show that existing mappings would > > be destroyed if the device is assigned to another address space. Not > > strictly necessary. So we have two ioctls but save one call to create > > the iommu-domain. > > With 10 devices you have 10 extra ioctls. And this works implicitly with your proposal? Remember that we still need to be able to provide seperate mappings for each device to support IOMMU emulation for the guest. I think my proposal does not have any extra costs. > > Because we express here that "dev2 shares the iommu mappings of dev1". > > Thats easy to remember. > > they both share the mappings. which one gets the iommu > destroyed (breaking the device if it is now doing DMA)? As I wrote the domain has a reference count and is destroyed only when it goes down to zero. This does not happen as long as a device is bound to it. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Michael S. Tsirkin on 2 Jun 2010 07:30
On Wed, Jun 02, 2010 at 01:12:25PM +0200, Joerg Roedel wrote: > On Wed, Jun 02, 2010 at 01:38:28PM +0300, Michael S. Tsirkin wrote: > > On Wed, Jun 02, 2010 at 12:35:16PM +0200, Joerg Roedel wrote: > > > > With the userspace interface a process can create io-page-faults > > > anyway if it wants. We can't protect us from this. > > > > We could fail all operations until an iommu is bound. This will help > > catch bugs with access before setup. We can not do this if a domain is > > bound by default. > > Even if it is bound to a domain the userspace driver could program the > device to do dma to unmapped regions causing io-page-faults. The kernel > can't do anything about it. It can always corrupt its own memory directly as well :) But that is not a reason not to detect errors if we can, and not to make APIs hard to misuse. > > > The second IOMMU_MAP ioctl is just to show that existing mappings would > > > be destroyed if the device is assigned to another address space. Not > > > strictly necessary. So we have two ioctls but save one call to create > > > the iommu-domain. > > > > With 10 devices you have 10 extra ioctls. > > And this works implicitly with your proposal? Yes. so you do: iommu = open ioctl(dev1, BIND, iommu) ioctl(dev2, BIND, iommu) ioctl(dev3, BIND, iommu) ioctl(dev4, BIND, iommu) No need to add a SHARE ioctl. > Remember that we still > need to be able to provide seperate mappings for each device to support > IOMMU emulation for the guest. Generally not true. E.g. guest can enable iommu passthrough or have domain per a group of devices. > I think my proposal does not have any > extra costs. with my proposal we have 1 ioctl per device + 1 per domain. with yours we have 2 ioctls per device is iommu is shared and 1 if it is not shared. as current apps share iommu it seems to make sense to optimize for that. > > > Because we express here that "dev2 shares the iommu mappings of dev1". > > > Thats easy to remember. > > > > they both share the mappings. which one gets the iommu > > destroyed (breaking the device if it is now doing DMA)? > > As I wrote the domain has a reference count and is destroyed only when > it goes down to zero. This does not happen as long as a device is bound > to it. > > Joerg We were talking about UNSHARE ioctl: ioctl(dev1, UNSHARE, dev2) Does it change the domain for dev1 or dev2? If you make a mistake you get a hard to debug bug. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |