From: Avi Kivity on 16 Sep 2009 04:30

On 09/15/2009 11:08 PM, Gregory Haskins wrote:
>> There's virtio-console, virtio-blk etc. None of these have kernel-mode
>> servers, but these could be implemented if/when needed.
>
> IIUC, Ira already needs at least ethernet and console capability.

He's welcome to pick up the necessary code from qemu.

>>> b) what do you suppose this protocol to aggregate the connections would
>>> look like? (hint: this is what a vbus-connector does).
>>
>> You mean multilink? You expose the device as a multiqueue.
>
> No, what I mean is how do you surface multiple ethernet and consoles to
> the guests? For Ira's case, I think he needs at minimum at least one of
> each, and he mentioned possibly having two unique ethernets at one point.

You instantiate multiple vhost-nets. Multiple ethernet NICs is a
supported configuration for kvm.

> His slave boards surface themselves as PCI devices to the x86
> host. So how do you use that to make multiple vhost-based devices (say
> two virtio-nets, and a virtio-console) communicate across the transport?

I don't really see the difference between 1 and N here.

> There are multiple ways to do this, but what I am saying is that
> whatever is conceived will start to look eerily like a vbus-connector,
> since this is one of its primary purposes ;)

I'm not sure if you're talking about the configuration interface or data
path here.

>>> c) how do you manage the configuration, especially on a per-board basis?
>>
>> pci (for kvm/x86).
>
> Ok, for kvm understood (and I would also add "qemu" to that mix). But
> we are talking about vhost's application in a non-kvm environment here,
> right?
>
> So if the vhost-X devices are in the "guest",

They aren't in the "guest".
The best way to look at it is

- a device side, with a dma engine: vhost-net
- a driver side, only accessing its own memory: virtio-net

Given that Ira's config has the dma engine in the ppc boards, that's
where vhost-net would live (the ppc boards acting as NICs to the x86
board, essentially).

> and the x86 board is just
> a slave... How do you tell each ppc board how many devices and what
> config (e.g. MACs, etc) to instantiate? Do you assume that they should
> all be symmetric and based on positional (e.g. slot) data? What if you
> want asymmetric configurations (if not here, perhaps in a different
> environment)?

I have no idea, that's for Ira to solve. If he could fake the PCI
config space as seen by the x86 board, he would just show the normal pci
config and use virtio-pci (multiple channels would show up as a
multifunction device). Given he can't, he needs to tunnel the virtio
config space some other way.

>> Yes. virtio is really virtualization oriented.
>
> I would say that it's vhost in particular that is virtualization
> oriented. virtio, as a concept, generally should work in physical
> systems, if perhaps with some minor modifications. The biggest "limit"
> is having "virt" in its name ;)

Let me rephrase. The virtio developers are virtualization oriented. If
it works for non-virt applications, that's good, but not a design goal.

--
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Gregory Haskins on 16 Sep 2009 08:00

Avi Kivity wrote:
> On 09/15/2009 11:08 PM, Gregory Haskins wrote:
>>> There's virtio-console, virtio-blk etc. None of these have kernel-mode
>>> servers, but these could be implemented if/when needed.
>>
>> IIUC, Ira already needs at least ethernet and console capability.
>
> He's welcome to pick up the necessary code from qemu.

The problem isn't where to find the models... the problem is how to
aggregate multiple models to the guest.

>>>> b) what do you suppose this protocol to aggregate the connections would
>>>> look like? (hint: this is what a vbus-connector does).
>>>
>>> You mean multilink? You expose the device as a multiqueue.
>>
>> No, what I mean is how do you surface multiple ethernet and consoles to
>> the guests? For Ira's case, I think he needs at minimum at least one of
>> each, and he mentioned possibly having two unique ethernets at one point.
>
> You instantiate multiple vhost-nets. Multiple ethernet NICs is a
> supported configuration for kvm.

But this is not KVM.

>> His slave boards surface themselves as PCI devices to the x86
>> host. So how do you use that to make multiple vhost-based devices (say
>> two virtio-nets, and a virtio-console) communicate across the transport?
>
> I don't really see the difference between 1 and N here.

A KVM surfaces N virtio-devices as N pci-devices to the guest. What do
we do in Ira's case where the entire guest represents itself as a PCI
device to the host, and nothing the other way around?

>>> There are multiple ways to do this, but what I am saying is that
>>> whatever is conceived will start to look eerily like a vbus-connector,
>>> since this is one of its primary purposes ;)
>>
>> I'm not sure if you're talking about the configuration interface or data
>> path here.

I am talking about how we would tunnel the config space for N devices
across his transport.
As an aside, the vbus-kvm connector makes them one and the same, but
they do not have to be. It's all in the connector design.

>>>> c) how do you manage the configuration, especially on a per-board
>>>> basis?
>>>
>>> pci (for kvm/x86).
>>
>> Ok, for kvm understood (and I would also add "qemu" to that mix). But
>> we are talking about vhost's application in a non-kvm environment here,
>> right?
>>
>> So if the vhost-X devices are in the "guest",
>
> They aren't in the "guest". The best way to look at it is
>
> - a device side, with a dma engine: vhost-net
> - a driver side, only accessing its own memory: virtio-net
>
> Given that Ira's config has the dma engine in the ppc boards, that's
> where vhost-net would live (the ppc boards acting as NICs to the x86
> board, essentially).

That sounds convenient given his hardware, but it has its own set of
problems. For one, the configuration/inventory of these boards is now
driven by the wrong side and has to be addressed. Second, the role
reversal will likely not work for many models other than ethernet (e.g.
virtio-console or virtio-blk drivers running on the x86 board would be
naturally consuming services from the slave boards... virtio-net is an
exception because 802.x is generally symmetrical).

IIUC, vbus would support having the device models live properly on the
x86 side, solving both of these problems. It would be impossible to
reverse vhost given its current design.

>> and the x86 board is just
>> a slave... How do you tell each ppc board how many devices and what
>> config (e.g. MACs, etc) to instantiate? Do you assume that they should
>> all be symmetric and based on positional (e.g. slot) data? What if you
>> want asymmetric configurations (if not here, perhaps in a different
>> environment)?
>
> I have no idea, that's for Ira to solve.

Bingo. Thus my statement that the vhost proposal is incomplete.
You have the virtio-net and vhost-net pieces covering the fast-path
end-points, but nothing in the middle (transport, aggregation,
config-space), and nothing on the management side. vbus provides most
of the other pieces, and can even support the same virtio-net protocol
on top. The remaining part would be something like a udev script to
populate the vbus with devices on board-insert events.

> If he could fake the PCI
> config space as seen by the x86 board, he would just show the normal pci
> config and use virtio-pci (multiple channels would show up as a
> multifunction device). Given he can't, he needs to tunnel the virtio
> config space some other way.

Right, and note that vbus was designed to solve this. This tunneling
can, of course, be done without vbus using some other design. However,
whatever solution is created will look incredibly close to what I've
already done, so my point is "why reinvent it"?

>>> Yes. virtio is really virtualization oriented.
>>
>> I would say that it's vhost in particular that is virtualization
>> oriented. virtio, as a concept, generally should work in physical
>> systems, if perhaps with some minor modifications. The biggest "limit"
>> is having "virt" in its name ;)
>
> Let me rephrase. The virtio developers are virtualization oriented. If
> it works for non-virt applications, that's good, but not a design goal.

Fair enough. Vbus was designed to support both HW and virt (as well as
other models, like containers), including tunneling virtio within those
environments. That is probably why IMO vbus is a better fit than vhost
here.

(FWIW: I would love to see vhost use the vbus framework, then we all
win. You can do this and still retain virtio-pci compatibility (at
least theoretically). I am still open to working with the team on this.)

Kind Regards,
-Greg
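[Editor's note: for flavor, the board-insert hook Greg mentions might look something like the fragment below. Everything here is hypothetical — the PCI vendor ID, the helper script name, and the sysfs-style paths are all invented for illustration, since the actual vbus management interface is not shown in this thread.]

```
# Hypothetical udev rule: when a slave board appears on the PCI bus,
# run a helper that creates the agreed-upon devices on the vbus.
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x1234", \
    RUN+="/usr/sbin/vbus-populate %k"

# vbus-populate (sketch only): instantiate one ethernet and one
# console per board, via an imagined sysfs-style interface:
#   echo venet    > /sys/vbus/devices/new
#   echo vconsole > /sys/vbus/devices/new
```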
From: Avi Kivity on 16 Sep 2009 09:10

On 09/16/2009 02:44 PM, Gregory Haskins wrote:
> The problem isn't where to find the models... the problem is how to
> aggregate multiple models to the guest.

You mean configuration?

>> You instantiate multiple vhost-nets. Multiple ethernet NICs is a
>> supported configuration for kvm.
>
> But this is not KVM.

If kvm can do it, others can.

>>> His slave boards surface themselves as PCI devices to the x86
>>> host. So how do you use that to make multiple vhost-based devices (say
>>> two virtio-nets, and a virtio-console) communicate across the transport?
>>
>> I don't really see the difference between 1 and N here.
>
> A KVM surfaces N virtio-devices as N pci-devices to the guest. What do
> we do in Ira's case where the entire guest represents itself as a PCI
> device to the host, and nothing the other way around?

There is no guest and host in this scenario. There's a device side
(ppc) and a driver side (x86). The driver side can access configuration
information on the device side. How to multiplex multiple devices is an
interesting exercise for whoever writes the virtio binding for that
setup.

>>> There are multiple ways to do this, but what I am saying is that
>>> whatever is conceived will start to look eerily like a vbus-connector,
>>> since this is one of its primary purposes ;)
>>
>> I'm not sure if you're talking about the configuration interface or data
>> path here.
>
> I am talking about how we would tunnel the config space for N devices
> across his transport.

Sounds trivial. Write an address containing the device number and
register number to one location, read or write data from another. Just
like the PCI cf8/cfc interface.

>> They aren't in the "guest".
>> The best way to look at it is
>>
>> - a device side, with a dma engine: vhost-net
>> - a driver side, only accessing its own memory: virtio-net
>>
>> Given that Ira's config has the dma engine in the ppc boards, that's
>> where vhost-net would live (the ppc boards acting as NICs to the x86
>> board, essentially).
>
> That sounds convenient given his hardware, but it has its own set of
> problems. For one, the configuration/inventory of these boards is now
> driven by the wrong side and has to be addressed.

Why is it the wrong side?

> Second, the role
> reversal will likely not work for many models other than ethernet (e.g.
> virtio-console or virtio-blk drivers running on the x86 board would be
> naturally consuming services from the slave boards... virtio-net is an
> exception because 802.x is generally symmetrical).

There is no role reversal. The side doing dma is the device, the side
accessing its own memory is the driver. Just like the other 1e12
driver/device pairs out there.

>> I have no idea, that's for Ira to solve.
>
> Bingo. Thus my statement that the vhost proposal is incomplete. You
> have the virtio-net and vhost-net pieces covering the fast-path
> end-points, but nothing in the middle (transport, aggregation,
> config-space), and nothing on the management-side. vbus provides most
> of the other pieces, and can even support the same virtio-net protocol
> on top. The remaining part would be something like a udev script to
> populate the vbus with devices on board-insert events.

Of course vhost is incomplete, in the same sense that Linux is
incomplete. Both require userspace.

>> If he could fake the PCI
>> config space as seen by the x86 board, he would just show the normal pci
>> config and use virtio-pci (multiple channels would show up as a
>> multifunction device). Given he can't, he needs to tunnel the virtio
>> config space some other way.
>
> Right, and note that vbus was designed to solve this.
> This tunneling
> can, of course, be done without vbus using some other design. However,
> whatever solution is created will look incredibly close to what I've
> already done, so my point is "why reinvent it"?

virtio requires binding for this tunnelling, so does vbus. It's the
same problem with the same solution.

--
error compiling committee.c: too many arguments to function
From: Gregory Haskins on 16 Sep 2009 10:20

Avi Kivity wrote:
> On 09/16/2009 02:44 PM, Gregory Haskins wrote:
>> The problem isn't where to find the models... the problem is how to
>> aggregate multiple models to the guest.
>
> You mean configuration?
>
>>> You instantiate multiple vhost-nets. Multiple ethernet NICs is a
>>> supported configuration for kvm.
>>
>> But this is not KVM.
>
> If kvm can do it, others can.

The problem is that you seem to either hand-wave over details like this,
or you give details that are pretty much exactly what vbus does already.
My point is that I've already sat down and thought about these issues
and solved them in a freely available GPL'ed software package.

So the question is: is your position that vbus is all wrong and you wish
to create a new bus-like thing to solve the problem? If so, how is it
different from what I've already done? More importantly, what specific
objections do you have to what I've done, as perhaps they can be fixed
instead of starting over?

>>>> His slave boards surface themselves as PCI devices to the x86
>>>> host. So how do you use that to make multiple vhost-based devices (say
>>>> two virtio-nets, and a virtio-console) communicate across the
>>>> transport?
>>>
>>> I don't really see the difference between 1 and N here.
>>
>> A KVM surfaces N virtio-devices as N pci-devices to the guest. What do
>> we do in Ira's case where the entire guest represents itself as a PCI
>> device to the host, and nothing the other way around?
>
> There is no guest and host in this scenario. There's a device side
> (ppc) and a driver side (x86). The driver side can access configuration
> information on the device side. How to multiplex multiple devices is an
> interesting exercise for whoever writes the virtio binding for that
> setup.

Bingo. So now it's a question of whether you want to write this layer
from scratch, or re-use my framework.
>>>> There are multiple ways to do this, but what I am saying is that
>>>> whatever is conceived will start to look eerily like a vbus-connector,
>>>> since this is one of its primary purposes ;)
>>>
>>> I'm not sure if you're talking about the configuration interface or data
>>> path here.
>>
>> I am talking about how we would tunnel the config space for N devices
>> across his transport.
>
> Sounds trivial.

No one said it was rocket science. But it does need to be designed and
implemented end-to-end, much of which I've already done in what I hope
is an extensible way.

> Write an address containing the device number and
> register number to one location, read or write data from another.

You mean like the "u64 devh" and "u32 func" fields I have here for the
vbus-kvm connector?

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=include/linux/vbus_pci.h;h=fe337590e644017392e4c9d9236150adb2333729;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l64

> Just
> like the PCI cf8/cfc interface.
>
>>> They aren't in the "guest". The best way to look at it is
>>>
>>> - a device side, with a dma engine: vhost-net
>>> - a driver side, only accessing its own memory: virtio-net
>>>
>>> Given that Ira's config has the dma engine in the ppc boards, that's
>>> where vhost-net would live (the ppc boards acting as NICs to the x86
>>> board, essentially).
>>
>> That sounds convenient given his hardware, but it has its own set of
>> problems. For one, the configuration/inventory of these boards is now
>> driven by the wrong side and has to be addressed.
>
> Why is it the wrong side?

"Wrong" is probably too harsh a word when looking at ethernet. It's
certainly "odd", and possibly inconvenient. It would be like having
vhost in a KVM guest, and virtio-net running on the host. You could do
it, but it's weird and awkward. Where it really falls apart and enters
the "wrong" category is for non-symmetric devices, like disk-io.
>> Second, the role
>> reversal will likely not work for many models other than ethernet (e.g.
>> virtio-console or virtio-blk drivers running on the x86 board would be
>> naturally consuming services from the slave boards... virtio-net is an
>> exception because 802.x is generally symmetrical).
>
> There is no role reversal.

So if I have a virtio-blk driver running on the x86 and a vhost-blk
device running on the ppc board, I can use the ppc board as a
block-device. What if I really wanted to go the other way?

> The side doing dma is the device, the side
> accessing its own memory is the driver. Just like the other 1e12
> driver/device pairs out there.

IIUC, his ppc boards really can be seen as "guests" (they are linux
instances that are utilizing services from the x86, not the other way
around). vhost forces the model to have the ppc boards act as IO-hosts,
whereas vbus would likely work in either direction due to its more
refined abstraction layer.

>>> I have no idea, that's for Ira to solve.
>>
>> Bingo. Thus my statement that the vhost proposal is incomplete. You
>> have the virtio-net and vhost-net pieces covering the fast-path
>> end-points, but nothing in the middle (transport, aggregation,
>> config-space), and nothing on the management-side. vbus provides most
>> of the other pieces, and can even support the same virtio-net protocol
>> on top. The remaining part would be something like a udev script to
>> populate the vbus with devices on board-insert events.
>
> Of course vhost is incomplete, in the same sense that Linux is
> incomplete. Both require userspace.

A vhost-based solution to Ira's design is missing more than userspace.
Many of those gaps are addressed by a vbus-based solution.

>>> If he could fake the PCI
>>> config space as seen by the x86 board, he would just show the normal pci
>>> config and use virtio-pci (multiple channels would show up as a
>>> multifunction device).
>>> Given he can't, he needs to tunnel the virtio
>>> config space some other way.
>>
>> Right, and note that vbus was designed to solve this. This tunneling
>> can, of course, be done without vbus using some other design. However,
>> whatever solution is created will look incredibly close to what I've
>> already done, so my point is "why reinvent it"?
>
> virtio requires binding for this tunnelling, so does vbus.

We aren't talking about virtio. Virtio would work with either vbus or
vhost. This is purely a question of what the layers below virtio and
the device backend look like.

> It's the same problem with the same solution.

I disagree.

Kind Regards,
-Greg
From: Arnd Bergmann on 16 Sep 2009 11:00
On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> Userspace in x86 maps a PCI region, uses it for communication with ppc?

This might have portability issues. On x86 it should work, but if the
host is powerpc or similar, you cannot reliably access PCI I/O memory
through copy_tofrom_user but have to use memcpy_toio/fromio or
readl/writel calls, which don't work on user pointers.

Specifically on powerpc, copy_from_user cannot access unaligned buffers
if they are on an I/O mapping.

Arnd <><