Prev: [PATCH] mfd: fix check on unsigned in twl4030_configure_resource()
Next: [PATCH 03/12] Staging: rt28x0: remove private RTPRIV_IOCTL_SET ioctl
From: Gleb Natapov on 21 Oct 2009 11:30 On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote: > IRQFD currently uses a deferred workqueue item to execute the injection > operation. It was originally designed this way because kvm_set_irq() > required the caller to hold the irq_lock mutex, and the eventfd callback > is invoked from within a non-preemptible critical section. > > With the advent of lockless injection support in kvm_set_irq, the deferment > mechanism is no longer technically needed. Since context switching to the > workqueue is a source of interrupt latency, lets switch to a direct > method. > kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes. > Signed-off-by: Gregory Haskins <ghaskins(a)novell.com> > --- > > virt/kvm/eventfd.c | 15 +++------------ > 1 files changed, 3 insertions(+), 12 deletions(-) > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 30f70fd..1a529d4 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -49,16 +49,14 @@ struct _irqfd { > poll_table pt; > wait_queue_head_t *wqh; > wait_queue_t wait; > - struct work_struct inject; > struct work_struct shutdown; > }; > > static struct workqueue_struct *irqfd_cleanup_wq; > > static void > -irqfd_inject(struct work_struct *work) > +irqfd_inject(struct _irqfd *irqfd) > { > - struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); > struct kvm *kvm = irqfd->kvm; > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); > @@ -80,12 +78,6 @@ irqfd_shutdown(struct work_struct *work) > remove_wait_queue(irqfd->wqh, &irqfd->wait); > > /* > - * We know no new events will be scheduled at this point, so block > - * until all previously outstanding events have completed > - */ > - flush_work(&irqfd->inject); > - > - /* > * It is now safe to release the object's resources > */ > eventfd_ctx_put(irqfd->eventfd); > @@ -126,7 +118,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) > > if (flags & POLLIN) > /* An event has been signaled, inject an interrupt */ > - schedule_work(&irqfd->inject); > + irqfd_inject(irqfd); > > if (flags & POLLHUP) { > /* The eventfd is closing, detach from KVM */ > @@ -179,7 +171,6 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi) > irqfd->kvm = kvm; > irqfd->gsi = gsi; > INIT_LIST_HEAD(&irqfd->list); > - INIT_WORK(&irqfd->inject, irqfd_inject); > INIT_WORK(&irqfd->shutdown, irqfd_shutdown); > > file = eventfd_fget(fd); > @@ -214,7 +205,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi) > * before we registered, and trigger it as if we didn't miss it. > */ > if (events & POLLIN) > - schedule_work(&irqfd->inject); > + irqfd_inject(irqfd); > > /* > * do not drop the file until the irqfd is fully initialized, otherwise > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo(a)vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Gleb Natapov on 21 Oct 2009 11:40 On Wed, Oct 21, 2009 at 11:34:51AM -0400, Gregory Haskins wrote: > Gleb Natapov wrote: > > On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote: > >> IRQFD currently uses a deferred workqueue item to execute the injection > >> operation. It was originally designed this way because kvm_set_irq() > >> required the caller to hold the irq_lock mutex, and the eventfd callback > >> is invoked from within a non-preemptible critical section. > >> > >> With the advent of lockless injection support in kvm_set_irq, the deferment > >> mechanism is no longer technically needed. Since context switching to the > >> workqueue is a source of interrupt latency, lets switch to a direct > >> method. > >> > > kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes. > > Right, but irqfd by design only works with MSI (or MSI like edge > triggers) anyway. Legacy-type injections follow a different path. > Ah, If this the case and it will stay that way then the change looks OK to me. > In any case, I didn't change the locking (you did ;). You recently > patched the irqfd code to remove the irq_lock, but we still had the > deferment mechanism in place to avoid the mutex_lock from within the > POLLIN callback. Since the mutex_lock is now no longer acquired in this > path, the deferment technique is not needed either. Its only adding > overhead for no purpose. So I am simply cleaning that up to improve > interrupt performance. > > HTH, > -Greg > > -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Gregory Haskins on 21 Oct 2009 11:40 Gleb Natapov wrote: > On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote: >> IRQFD currently uses a deferred workqueue item to execute the injection >> operation. It was originally designed this way because kvm_set_irq() >> required the caller to hold the irq_lock mutex, and the eventfd callback >> is invoked from within a non-preemptible critical section. >> >> With the advent of lockless injection support in kvm_set_irq, the deferment >> mechanism is no longer technically needed. Since context switching to the >> workqueue is a source of interrupt latency, lets switch to a direct >> method. >> > kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes. Right, but irqfd by design only works with MSI (or MSI like edge triggers) anyway. Legacy-type injections follow a different path. In any case, I didn't change the locking (you did ;). You recently patched the irqfd code to remove the irq_lock, but we still had the deferment mechanism in place to avoid the mutex_lock from within the POLLIN callback. Since the mutex_lock is now no longer acquired in this path, the deferment technique is not needed either. Its only adding overhead for no purpose. So I am simply cleaning that up to improve interrupt performance. HTH, -Greg
From: Gregory Haskins on 21 Oct 2009 11:50 Gleb Natapov wrote: > On Wed, Oct 21, 2009 at 11:34:51AM -0400, Gregory Haskins wrote: >> Gleb Natapov wrote: >>> On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote: >>>> IRQFD currently uses a deferred workqueue item to execute the injection >>>> operation. It was originally designed this way because kvm_set_irq() >>>> required the caller to hold the irq_lock mutex, and the eventfd callback >>>> is invoked from within a non-preemptible critical section. >>>> >>>> With the advent of lockless injection support in kvm_set_irq, the deferment >>>> mechanism is no longer technically needed. Since context switching to the >>>> workqueue is a source of interrupt latency, lets switch to a direct >>>> method. >>>> >>> kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes. >> Right, but irqfd by design only works with MSI (or MSI like edge >> triggers) anyway. Legacy-type injections follow a different path. >> > Ah, If this the case and it will stay that way then the change looks OK > to me. I believe Avi, Michael, et. al. were in agreement with me on that design choice. I believe the reason is that there is no good way to do EOI/ACK feedback within the constraints of an eventfd pipe which would be required for the legacy pin-type interrupts. Therefore, we won't even bother trying. High-performance subsystems will use irqfd/msi, and legacy emulation can use the existing injection code (which includes the necessary feedback for ack/eoi). To that point, perhaps it should be better documented. If we need a v2, I will add a comment. Kind Regards, -Greg
From: Avi Kivity on 22 Oct 2009 11:10
On 10/21/2009 05:42 PM, Gregory Haskins wrote: > I believe Avi, Michael, et. al. were in agreement with me on that design > choice. I believe the reason is that there is no good way to do EOI/ACK > feedback within the constraints of an eventfd pipe which would be > required for the legacy pin-type interrupts. Therefore, we won't even > bother trying. High-performance subsystems will use irqfd/msi, and > legacy emulation can use the existing injection code (which includes the > necessary feedback for ack/eoi). > > Right. But we don't actually prevent anyone using non-msi with irqfd, which can trigger the bad lock usage from irq context, with a nice boom afterwards. So we need to either prevent it during registration, or to gracefully handle it afterwards. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |