From: Gregory Haskins on 15 Sep 2009 10:00

Avi Kivity wrote:
> On 09/15/2009 04:03 PM, Gregory Haskins wrote:
>>
>>> In this case the x86 is the owner and the ppc boards use translated
>>> access. Just switch drivers and device and it falls into place.
>>>
>>>
>> You could switch vbus roles as well, I suppose.
>
> Right, there's no real difference in this regard.
>
>> Another potential
>> option is that he can stop mapping host memory on the guest so that it
>> follows the more traditional model. As a bus-master device, the ppc
>> boards should have access to any host memory at least in the GFP_DMA
>> range, which would include all relevant pointers here.
>>
>> I digress: I was primarily addressing the concern that Ira would need
>> to manage the "host" side of the link using hvas mapped from userspace
>> (even if host side is the ppc boards). vbus abstracts that access so as
>> to allow something other than userspace/hva mappings. OTOH, having each
>> ppc board run a userspace app to do the mapping on its behalf and feed
>> it to vhost is probably not a huge deal either. Where vhost might
>> really fall apart is when any assumptions about pageable memory occur,
>> if any.
>>
>
> Why? vhost will call get_user_pages() or copy_*_user() which ought to
> do the right thing.

I was speaking generally, not specifically to Ira's architecture. What
I mean is that vbus was designed to work without assuming that the
memory is pageable. There are environments in which the host is not
capable of mapping hvas/*page, but the memctx->copy_to/copy_from
paradigm could still work (think rdma, for instance).

>
>> As an aside: a bigger issue is that, iiuc, Ira wants more than a single
>> ethernet channel in his design (multiple ethernets, consoles, etc). A
>> vhost solution in this environment is incomplete.
>>
>
> Why? Instantiate as many vhost-nets as needed.

a) what about non-ethernets?

b) what do you suppose this protocol to aggregate the connections would
look like? (hint: this is what a vbus-connector does).

c) how do you manage the configuration, especially on a per-board basis?

>
>> Note that Ira's architecture highlights that vbus's explicit management
>> interface is more valuable here than it is in KVM, since KVM already has
>> its own management interface via QEMU.
>>
>
> vhost-net and vbus both need management, vhost-net via ioctls and vbus
> via configfs.

Actually I have patches queued to allow vbus to be managed via ioctls
as well, per your feedback (and it solves the permissions/lifetime
criticisms in alacrityvm-v0.1).

> The only difference is the implementation. vhost-net
> leaves much more to userspace, that's the main difference.

Also,

*) vhost is virtio-net specific, whereas vbus is a more generic device
model where things like virtio-net or venet ride on top.

*) vhost is only designed to work with environments that look very
similar to a KVM guest (slot/hva translatable). vbus can bridge various
environments by abstracting the key components (such as memory access).

*) vhost requires an active userspace management daemon, whereas vbus
can be driven by transient components, like scripts (ala udev)

Kind Regards,
-Greg
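[To make the memctx point concrete, here is a minimal sketch of what such a memory-context abstraction could look like (the names are illustrative, not the actual vbus API): a small ops table that hides whether the backing store is reached via copy_*_user(), an RDMA transfer, or a PCI window.]

#include <linux/types.h>
#include <linux/uaccess.h>
#include <linux/errno.h>

/*
 * Illustrative sketch only -- not the actual vbus memctx interface.
 * The transport-specific copy mechanism hides behind an ops table
 * instead of being hard-wired to copy_*_user().
 */
struct memctx;

struct memctx_ops {
	int (*copy_to)(struct memctx *ctx, u64 dst, const void *src, size_t len);
	int (*copy_from)(struct memctx *ctx, void *dst, u64 src, size_t len);
	void (*release)(struct memctx *ctx);
};

struct memctx {
	const struct memctx_ops *ops;
	void *priv;		/* backend state: hva base, PCI window, ... */
};

/* Hypothetical helper for this sketch: translate a guest address to an hva. */
void __user *gpa_to_hva(void *priv, u64 gpa);

/* A process/hva-backed implementation would just wrap copy_from_user(). */
static int hva_copy_from(struct memctx *ctx, void *dst, u64 src, size_t len)
{
	void __user *hva = gpa_to_hva(ctx->priv, src);

	return copy_from_user(dst, hva, len) ? -EFAULT : 0;
}

[With this shape, the device model only ever calls ctx->ops->copy_from()/copy_to(); whether the other end is a process address space or a remote board becomes a backend detail.]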
From: Michael S. Tsirkin on 15 Sep 2009 10:40

On Tue, Sep 15, 2009 at 09:50:39AM -0400, Gregory Haskins wrote:
> Avi Kivity wrote:
> > On 09/15/2009 04:03 PM, Gregory Haskins wrote:
> >>
> >>> In this case the x86 is the owner and the ppc boards use translated
> >>> access. Just switch drivers and device and it falls into place.
> >>>
> >>>
> >> You could switch vbus roles as well, I suppose.
> >
> > Right, there's no real difference in this regard.
> >
> >> Another potential
> >> option is that he can stop mapping host memory on the guest so that it
> >> follows the more traditional model. As a bus-master device, the ppc
> >> boards should have access to any host memory at least in the GFP_DMA
> >> range, which would include all relevant pointers here.
> >>
> >> I digress: I was primarily addressing the concern that Ira would need
> >> to manage the "host" side of the link using hvas mapped from userspace
> >> (even if host side is the ppc boards). vbus abstracts that access so as
> >> to allow something other than userspace/hva mappings. OTOH, having each
> >> ppc board run a userspace app to do the mapping on its behalf and feed
> >> it to vhost is probably not a huge deal either. Where vhost might
> >> really fall apart is when any assumptions about pageable memory occur,
> >> if any.
> >>
> >
> > Why? vhost will call get_user_pages() or copy_*_user() which ought to
> > do the right thing.
>
> I was speaking generally, not specifically to Ira's architecture. What
> I mean is that vbus was designed to work without assuming that the
> memory is pageable. There are environments in which the host is not
> capable of mapping hvas/*page, but the memctx->copy_to/copy_from
> paradigm could still work (think rdma, for instance).

RDMA interfaces are typically asynchronous, so blocking copy_from/copy_to
can be made to work, but likely won't work that well. DMA might work
better if it is asynchronous as well. Assuming a synchronous copy is
what we need - maybe the issue is that there aren't good APIs for
x86/ppc communication? If so, sticking them in vhost might not be the
best place. Maybe the specific platform can redefine copy_to/from_user
to do the right thing? Or, maybe add another API for that ...

> >
> >> As an aside: a bigger issue is that, iiuc, Ira wants more than a single
> >> ethernet channel in his design (multiple ethernets, consoles, etc). A
> >> vhost solution in this environment is incomplete.
> >>
> >
> > Why? Instantiate as many vhost-nets as needed.
>
> a) what about non-ethernets?

vhost-net actually does not care: the packet is passed on to a socket,
and we are done.

> b) what do you suppose this protocol to aggregate the connections would
> look like? (hint: this is what a vbus-connector does).

You are talking about a management protocol between ppc and x86, right?
One wonders why it has to be in the kernel at all.

> c) how do you manage the configuration, especially on a per-board basis?

Not sure what a board is, but configuration is done in userspace.

--
MST
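[One way to read the "maybe add another API" remark: a DMA engine or RDMA backend would rather complete the copy asynchronously than block inside copy_to/copy_from. A purely hypothetical asynchronous variant (names invented here, following on from the memctx sketch above) might look like:]

#include <linux/types.h>

struct memctx;			/* from the memctx sketch above */
struct xfer_req;

/* Invoked from the DMA-engine or RDMA completion path. */
typedef void (*xfer_done_fn)(struct xfer_req *req, int status);

struct xfer_req {
	u64		addr;	/* remote/guest-visible address */
	void		*buf;	/* local kernel buffer */
	size_t		len;
	xfer_done_fn	done;	/* completion callback */
	void		*priv;	/* caller context */
};

/*
 * Submit and return immediately; the backend (a bounce-buffer memcpy, a
 * DMA engine, or an RDMA post) calls req->done() when the copy finishes.
 */
int memctx_copy_from_async(struct memctx *ctx, struct xfer_req *req);
int memctx_copy_to_async(struct memctx *ctx, struct xfer_req *req);

[The cost is that the ring-servicing code has to be written as a state machine driven by completions rather than straight-line copies, which is part of why a blocking interface is simpler even if it ties up a thread.]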
From: Avi Kivity on 15 Sep 2009 11:10

On 09/15/2009 04:50 PM, Gregory Haskins wrote:
>> Why? vhost will call get_user_pages() or copy_*_user() which ought to
>> do the right thing.
>>
> I was speaking generally, not specifically to Ira's architecture. What
> I mean is that vbus was designed to work without assuming that the
> memory is pageable. There are environments in which the host is not
> capable of mapping hvas/*page, but the memctx->copy_to/copy_from
> paradigm could still work (think rdma, for instance).
>

Sure, vbus is more flexible here.

>>> As an aside: a bigger issue is that, iiuc, Ira wants more than a single
>>> ethernet channel in his design (multiple ethernets, consoles, etc). A
>>> vhost solution in this environment is incomplete.
>>>
>>>
>> Why? Instantiate as many vhost-nets as needed.
>>
> a) what about non-ethernets?
>

There's virtio-console, virtio-blk etc. None of these have kernel-mode
servers, but these could be implemented if/when needed.

> b) what do you suppose this protocol to aggregate the connections would
> look like? (hint: this is what a vbus-connector does).
>

You mean multilink? You expose the device as a multiqueue.

> c) how do you manage the configuration, especially on a per-board basis?
>

pci (for kvm/x86).

> Actually I have patches queued to allow vbus to be managed via ioctls as
> well, per your feedback (and it solves the permissions/lifetime
> criticisms in alacrityvm-v0.1).
>

That will make qemu integration easier.

>> The only difference is the implementation. vhost-net
>> leaves much more to userspace, that's the main difference.
>>
> Also,
>
> *) vhost is virtio-net specific, whereas vbus is a more generic device
> model where things like virtio-net or venet ride on top.
>

I think vhost-net is separated into vhost and vhost-net.

> *) vhost is only designed to work with environments that look very
> similar to a KVM guest (slot/hva translatable). vbus can bridge various
> environments by abstracting the key components (such as memory access).
>

Yes. virtio is really virtualization oriented.

> *) vhost requires an active userspace management daemon, whereas vbus
> can be driven by transient components, like scripts (ala udev)
>

vhost by design leaves configuration and handshaking to userspace. I
see it as an advantage.

--
error compiling committee.c: too many arguments to function
From: Gregory Haskins on 15 Sep 2009 16:10

Avi Kivity wrote:
> On 09/15/2009 04:50 PM, Gregory Haskins wrote:
>>> Why? vhost will call get_user_pages() or copy_*_user() which ought to
>>> do the right thing.
>>>
>> I was speaking generally, not specifically to Ira's architecture. What
>> I mean is that vbus was designed to work without assuming that the
>> memory is pageable. There are environments in which the host is not
>> capable of mapping hvas/*page, but the memctx->copy_to/copy_from
>> paradigm could still work (think rdma, for instance).
>>
>
> Sure, vbus is more flexible here.
>
>>>> As an aside: a bigger issue is that, iiuc, Ira wants more than a single
>>>> ethernet channel in his design (multiple ethernets, consoles, etc). A
>>>> vhost solution in this environment is incomplete.
>>>>
>>>>
>>> Why? Instantiate as many vhost-nets as needed.
>>>
>> a) what about non-ethernets?
>>
>
> There's virtio-console, virtio-blk etc. None of these have kernel-mode
> servers, but these could be implemented if/when needed.

IIUC, Ira already needs at least ethernet and console capability.

>
>> b) what do you suppose this protocol to aggregate the connections would
>> look like? (hint: this is what a vbus-connector does).
>>
>
> You mean multilink? You expose the device as a multiqueue.

No, what I mean is how do you surface multiple ethernets and consoles to
the guests? For Ira's case, I think he needs at minimum one of each, and
he mentioned possibly having two unique ethernets at one point.

His slave boards surface themselves as PCI devices to the x86 host. So
how do you use that to make multiple vhost-based devices (say two
virtio-nets, and a virtio-console) communicate across the transport?

There are multiple ways to do this, but what I am saying is that
whatever is conceived will start to look eerily like a vbus-connector,
since this is one of its primary purposes ;)

>
>> c) how do you manage the configuration, especially on a per-board basis?
>>
>
> pci (for kvm/x86).

Ok, for kvm understood (and I would also add "qemu" to that mix). But
we are talking about vhost's application in a non-kvm environment here,
right? So if the vhost-X devices are in the "guest", and the x86 board
is just a slave... how do you tell each ppc board how many devices and
what config (e.g. MACs, etc) to instantiate? Do you assume that they
should all be symmetric and based on positional (e.g. slot) data? What
if you want asymmetric configurations (if not here, perhaps in a
different environment)?

>
>> Actually I have patches queued to allow vbus to be managed via ioctls as
>> well, per your feedback (and it solves the permissions/lifetime
>> criticisms in alacrityvm-v0.1).
>>
>
> That will make qemu integration easier.
>
>>> The only difference is the implementation. vhost-net
>>> leaves much more to userspace, that's the main difference.
>>>
>> Also,
>>
>> *) vhost is virtio-net specific, whereas vbus is a more generic device
>> model where things like virtio-net or venet ride on top.
>>
>
> I think vhost-net is separated into vhost and vhost-net.

That's good.

>
>> *) vhost is only designed to work with environments that look very
>> similar to a KVM guest (slot/hva translatable). vbus can bridge various
>> environments by abstracting the key components (such as memory access).
>>
>
> Yes. virtio is really virtualization oriented.

I would say that it's vhost in particular that is virtualization
oriented. virtio, as a concept, should generally work in physical
systems, if perhaps with some minor modifications.
The biggest "limit" is having "virt" in its name ;)

>
>> *) vhost requires an active userspace management daemon, whereas vbus
>> can be driven by transient components, like scripts (ala udev)
>>
>
> vhost by design leaves configuration and handshaking to userspace. I
> see it as an advantage.

The misconception here is that vbus by design _doesn't define_ where
configuration/handshaking happens. It is primarily implemented by a
modular component called a "vbus-connector", and _I_ see this
flexibility as an advantage. vhost, on the other hand, depends on an
active userspace component and a slots/hva memory design, which is more
limiting in where it can be used and forces you to split the logic.
However, I think we both more or less agree on this point already.

For the record, vbus itself is simply a resource container for
virtual-devices, which provides abstractions for the various points of
interest to generalizing PV (memory, signals, etc) and the proper
isolation and protection guarantees. What you do with it is defined by
the modular virtual-devices (e.g. virtio-net, venet, sched, hrt, scsi,
rdma, etc) and vbus-connectors (vbus-kvm, etc) you plug into it.

As an example, you could emulate the vhost design in vbus by writing a
"vbus-vhost" connector. This connector would be very thin and terminate
locally in QEMU. It would provide an ioctl-based verb namespace similar
to the existing vhost verbs we have today. QEMU would then similarly
reflect the vbus-based virtio device as a PCI device to the guest, so
that virtio-pci works unmodified.

You would then have most of the advantages of the work I have done for
commoditizing/abstracting the key points for in-kernel PV, like the
memctx. In addition, much of the work could be reused in multiple
environments, since any vbus-compliant device model that is plugged
into the framework would work with any connector that is plugged in
(e.g. vbus-kvm (alacrityvm), vbus-vhost (KVM), and "vbus-ira").

The only tradeoff is in features offered by the connector (e.g.
vbus-vhost has the advantage that existing PV guests can continue to
work unmodified; vbus-kvm has the advantage that it supports new
features like generic shared memory, non-virtio based devices,
prioritizable interrupts, no dependencies on PCI for non-PCI guests,
etc).

Kind Regards,
-Greg
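[As a rough sketch of the ioctl-based verb namespace such a "vbus-vhost" connector could expose (the names and numbers below are invented for illustration and are not existing vhost or vbus verbs):]

#include <linux/ioctl.h>
#include <linux/types.h>

/* Invented for illustration -- these are not existing vhost or vbus verbs. */
struct vbus_vhost_ring {
	__u32	index;		/* which virtqueue */
	__u32	num;		/* ring size (number of descriptors) */
	__u64	desc_addr;	/* guest-physical ring addresses */
	__u64	avail_addr;
	__u64	used_addr;
};

#define VBUS_VHOST_MAGIC	0xE1	/* arbitrary for the sketch */

#define VBUS_VHOST_SET_OWNER	_IO(VBUS_VHOST_MAGIC, 0x01)
#define VBUS_VHOST_SET_MEMCTX	_IOW(VBUS_VHOST_MAGIC, 0x02, int)	/* memctx fd */
#define VBUS_VHOST_SET_RING	_IOW(VBUS_VHOST_MAGIC, 0x03, struct vbus_vhost_ring)
#define VBUS_VHOST_SET_KICKFD	_IOW(VBUS_VHOST_MAGIC, 0x04, int)	/* eventfd */
#define VBUS_VHOST_RUN		_IO(VBUS_VHOST_MAGIC, 0x05)

[QEMU would then drive these verbs much as it drives vhost's ioctls today, which is what keeps the connector thin and lets virtio-pci in the guest stay unmodified.]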
From: Michael S. Tsirkin on 15 Sep 2009 16:50
On Tue, Sep 15, 2009 at 04:08:23PM -0400, Gregory Haskins wrote:
> No, what I mean is how do you surface multiple ethernets and consoles to
> the guests? For Ira's case, I think he needs at minimum one of each, and
> he mentioned possibly having two unique ethernets at one point.
>
> His slave boards surface themselves as PCI devices to the x86
> host. So how do you use that to make multiple vhost-based devices (say
> two virtio-nets, and a virtio-console) communicate across the transport?
>
> There are multiple ways to do this, but what I am saying is that
> whatever is conceived will start to look eerily like a vbus-connector,
> since this is one of its primary purposes ;)

Can't all this be in userspace?
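[Tying the per-board configuration question back to the "keep it in userspace" suggestion: one hypothetical scheme (not an existing interface) is a small device table that a userspace tool on the x86 host writes into a shared or BAR-mapped region for each board; the board enumerates it at boot, and asymmetric setups are just different tables per board.]

#include <linux/types.h>

/* Hypothetical per-board device table -- not an existing interface. */
#define BOARDCFG_MAGIC		0x76627573	/* arbitrary for the sketch */

enum boardcfg_dev_type {
	BOARDCFG_DEV_NET	= 1,
	BOARDCFG_DEV_CONSOLE	= 2,
};

struct boardcfg_dev {
	__le32	type;		/* enum boardcfg_dev_type */
	__le32	index;		/* instance number on this board */
	__u8	mac[6];		/* valid when type == BOARDCFG_DEV_NET */
	__u8	pad[2];
} __attribute__((packed));

struct boardcfg_table {
	__le32	magic;		/* BOARDCFG_MAGIC */
	__le32	num_devs;
	struct boardcfg_dev devs[];	/* num_devs entries follow */
} __attribute__((packed));

[A udev-style script or small daemon could generate one table per board, so the kernel side never has to know what a "board" is.]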