KVM: Directly inject interrupts via irqfd [Kernel]

Prev: [PATCH] mfd: fix check on unsigned in twl4030_configure_resource()
Next: [PATCH 03/12] Staging: rt28x0: remove private RTPRIV_IOCTL_SET ioctl

From: Gleb Natapov on 21 Oct 2009 11:30

On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote:
> IRQFD currently uses a deferred workqueue item to execute the injection
> operation. It was originally designed this way because kvm_set_irq()
> required the caller to hold the irq_lock mutex, and the eventfd callback
> is invoked from within a non-preemptible critical section.
>
> With the advent of lockless injection support in kvm_set_irq, the deferment
> mechanism is no longer technically needed. Since context switching to the
> workqueue is a source of interrupt latency, lets switch to a direct
> method.
>
kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes.

> Signed-off-by: Gregory Haskins <ghaskins(a)novell.com>
> ---
>
> virt/kvm/eventfd.c | 15 +++------------
> 1 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 30f70fd..1a529d4 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -49,16 +49,14 @@ struct _irqfd {
> poll_table pt;
> wait_queue_head_t *wqh;
> wait_queue_t wait;
> - struct work_struct inject;
> struct work_struct shutdown;
> };
>
> static struct workqueue_struct *irqfd_cleanup_wq;
>
> static void
> -irqfd_inject(struct work_struct *work)
> +irqfd_inject(struct _irqfd *irqfd)
> {
> - struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> struct kvm *kvm = irqfd->kvm;
>
> kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> @@ -80,12 +78,6 @@ irqfd_shutdown(struct work_struct *work)
> remove_wait_queue(irqfd->wqh, &irqfd->wait);
>
> /*
> - * We know no new events will be scheduled at this point, so block
> - * until all previously outstanding events have completed
> - */
> - flush_work(&irqfd->inject);
> -
> - /*
> * It is now safe to release the object's resources
> */
> eventfd_ctx_put(irqfd->eventfd);
> @@ -126,7 +118,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>
> if (flags & POLLIN)
> /* An event has been signaled, inject an interrupt */
> - schedule_work(&irqfd->inject);
> + irqfd_inject(irqfd);
>
> if (flags & POLLHUP) {
> /* The eventfd is closing, detach from KVM */
> @@ -179,7 +171,6 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> irqfd->kvm = kvm;
> irqfd->gsi = gsi;
> INIT_LIST_HEAD(&irqfd->list);
> - INIT_WORK(&irqfd->inject, irqfd_inject);
> INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
>
> file = eventfd_fget(fd);
> @@ -214,7 +205,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> * before we registered, and trigger it as if we didn't miss it.
> */
> if (events & POLLIN)
> - schedule_work(&irqfd->inject);
> + irqfd_inject(irqfd);
>
> /*
> * do not drop the file until the irqfd is fully initialized, otherwise
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Gleb Natapov on 21 Oct 2009 11:40

On Wed, Oct 21, 2009 at 11:34:51AM -0400, Gregory Haskins wrote:
> Gleb Natapov wrote:
> > On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote:
> >> IRQFD currently uses a deferred workqueue item to execute the injection
> >> operation. It was originally designed this way because kvm_set_irq()
> >> required the caller to hold the irq_lock mutex, and the eventfd callback
> >> is invoked from within a non-preemptible critical section.
> >>
> >> With the advent of lockless injection support in kvm_set_irq, the deferment
> >> mechanism is no longer technically needed. Since context switching to the
> >> workqueue is a source of interrupt latency, lets switch to a direct
> >> method.
> >>
> > kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes.
>
> Right, but irqfd by design only works with MSI (or MSI like edge
> triggers) anyway. Legacy-type injections follow a different path.
>
Ah, If this the case and it will stay that way then the change looks OK
to me.

> In any case, I didn't change the locking (you did ;). You recently
> patched the irqfd code to remove the irq_lock, but we still had the
> deferment mechanism in place to avoid the mutex_lock from within the
> POLLIN callback. Since the mutex_lock is now no longer acquired in this
> path, the deferment technique is not needed either. Its only adding
> overhead for no purpose. So I am simply cleaning that up to improve
> interrupt performance.
>
> HTH,
> -Greg
>
>

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Gregory Haskins on 21 Oct 2009 11:40

Gleb Natapov wrote:
> On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote:
>> IRQFD currently uses a deferred workqueue item to execute the injection
>> operation. It was originally designed this way because kvm_set_irq()
>> required the caller to hold the irq_lock mutex, and the eventfd callback
>> is invoked from within a non-preemptible critical section.
>>
>> With the advent of lockless injection support in kvm_set_irq, the deferment
>> mechanism is no longer technically needed. Since context switching to the
>> workqueue is a source of interrupt latency, lets switch to a direct
>> method.
>>
> kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes.

Right, but irqfd by design only works with MSI (or MSI like edge
triggers) anyway. Legacy-type injections follow a different path.

In any case, I didn't change the locking (you did ;). You recently
patched the irqfd code to remove the irq_lock, but we still had the
deferment mechanism in place to avoid the mutex_lock from within the
POLLIN callback. Since the mutex_lock is now no longer acquired in this
path, the deferment technique is not needed either. Its only adding
overhead for no purpose. So I am simply cleaning that up to improve
interrupt performance.

HTH,
-Greg

From: Gregory Haskins on 21 Oct 2009 11:50

Gleb Natapov wrote:
> On Wed, Oct 21, 2009 at 11:34:51AM -0400, Gregory Haskins wrote:
>> Gleb Natapov wrote:
>>> On Wed, Oct 21, 2009 at 10:34:53AM -0400, Gregory Haskins wrote:
>>>> IRQFD currently uses a deferred workqueue item to execute the injection
>>>> operation. It was originally designed this way because kvm_set_irq()
>>>> required the caller to hold the irq_lock mutex, and the eventfd callback
>>>> is invoked from within a non-preemptible critical section.
>>>>
>>>> With the advent of lockless injection support in kvm_set_irq, the deferment
>>>> mechanism is no longer technically needed. Since context switching to the
>>>> workqueue is a source of interrupt latency, lets switch to a direct
>>>> method.
>>>>
>>> kvm_set_irq is fully lockless only in MSI case. IOAPIC/PIC has mutexes.
>> Right, but irqfd by design only works with MSI (or MSI like edge
>> triggers) anyway. Legacy-type injections follow a different path.
>>
> Ah, If this the case and it will stay that way then the change looks OK
> to me.

I believe Avi, Michael, et. al. were in agreement with me on that design
choice. I believe the reason is that there is no good way to do EOI/ACK
feedback within the constraints of an eventfd pipe which would be
required for the legacy pin-type interrupts. Therefore, we won't even
bother trying. High-performance subsystems will use irqfd/msi, and
legacy emulation can use the existing injection code (which includes the
necessary feedback for ack/eoi).

To that point, perhaps it should be better documented. If we need a v2,
I will add a comment.

Kind Regards,
-Greg

From: Avi Kivity on 22 Oct 2009 11:10

On 10/21/2009 05:42 PM, Gregory Haskins wrote:
> I believe Avi, Michael, et. al. were in agreement with me on that design
> choice. I believe the reason is that there is no good way to do EOI/ACK
> feedback within the constraints of an eventfd pipe which would be
> required for the legacy pin-type interrupts. Therefore, we won't even
> bother trying. High-performance subsystems will use irqfd/msi, and
> legacy emulation can use the existing injection code (which includes the
> necessary feedback for ack/eoi).
>
>

Right. But we don't actually prevent anyone using non-msi with irqfd,
which can trigger the bad lock usage from irq context, with a nice boom
afterwards. So we need to either prevent it during registration, or to
gracefully handle it afterwards.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

| Next | Last
Pages: 1 2
Prev: [PATCH] mfd: fix check on unsigned in twl4030_configure_resource()
Next: [PATCH 03/12] Staging: rt28x0: remove private RTPRIV_IOCTL_SET ioctl