From: Avi Kivity on 16 Sep 2009 17:10

On 09/16/2009 10:22 PM, Gregory Haskins wrote:
> Avi Kivity wrote:
>> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>>> If kvm can do it, others can.
>>>
>>> The problem is that you seem to either hand-wave over details like this,
>>> or you give details that are pretty much exactly what vbus does already.
>>> My point is that I've already sat down and thought about these issues
>>> and solved them in a freely available GPL'ed software package.
>>
>> In the kernel. IMO that's the wrong place for it.
>
> 3) "in-kernel": You can do something like virtio-net to vhost to
> potentially meet some of the requirements, but not all.
>
> In order to fully meet (3), you would need to do some of that stuff you
> mentioned in the last reply with muxing device-nr/reg-nr. In addition,
> we need to have a facility for mapping eventfds and establishing a
> signaling mechanism (like PIO+qid), etc. KVM does this with
> IRQFD/IOEVENTFD, but we don't have KVM in this case so it needs to be
> invented.

irqfd/eventfd is the abstraction layer, it doesn't need to be reabstracted.

> To meet performance, this stuff has to be in kernel and there has to be
> a way to manage it.

and management belongs in userspace.

> Since vbus was designed to do exactly that, this is
> what I would advocate. You could also reinvent these concepts and put
> your own mux and mapping code in place, in addition to all the other
> stuff that vbus does. But I am not clear why anyone would want to.

Maybe they like their backward compatibility and Windows support.

> So no, the kernel is not the wrong place for it. It's the _only_ place
> for it. Otherwise, just use (1) and be done with it.

I'm talking about the config stuff, not the data path.

>> Further, if we adopt vbus, we drop compatibility with existing guests
>> or have to support both vbus and virtio-pci.
>
> We already need to support both (at least to support Ira). virtio-pci
> doesn't work here. Something else (vbus, or vbus-like) is needed.

virtio-ira.

>>> So the question is: is your position that vbus is all wrong and you wish
>>> to create a new bus-like thing to solve the problem?
>>
>> I don't intend to create anything new, I am satisfied with virtio. If
>> it works for Ira, excellent. If not, too bad.
>
> I think that about sums it up, then.

Yes. I'm all for reusing virtio, but I'm not going to switch to vbus or
support both for this esoteric use case.

>>> If so, how is it different from what I've already done? More
>>> importantly, what specific objections do you have to what I've done,
>>> as perhaps they can be fixed instead of starting over?
>>
>> The two biggest objections are:
>> - the host side is in the kernel
>
> As it needs to be.

vhost-net somehow manages to work without the config stuff in the kernel.

> With all due respect, based on all of your comments in aggregate I
> really do not think you are truly grasping what I am actually building here.

Thanks.

>>> Bingo. So now it's a question of do you want to write this layer from
>>> scratch, or re-use my framework.
>>
>> You will have to implement a connector or whatever for vbus as well.
>> vbus has more layers so it's probably smaller for vbus.
>
> Bingo!

(addictive, isn't it)

> That is precisely the point.
>
> All the stuff for how to map eventfds, handle signal mitigation, demux
> device/function pointers, isolation, etc., are built in. All the
> connector has to do is transport the 4-6 verbs and provide a memory
> mapping/copy function, and the rest is reusable. The device models
> would then work in all environments unmodified, and likewise the
> connectors could use all device-models unmodified.

Well, virtio has a similar abstraction on the guest side. The host-side
abstraction is limited to signalling since all configuration is in
userspace. vhost-net ought to work for lguest and s390 without change.
>> It was already implemented three times for virtio, so apparently that's
>> extensible too.
>
> And to my point, I'm trying to commoditize as much of that process as
> possible on both the front and backends (at least for cases where
> performance matters) so that you don't need to reinvent the wheel for
> each one.

Since you're interested in any-to-any connectors it makes sense to you.
I'm only interested in kvm-host-to-kvm-guest, so reducing the already
minor effort to implement a new virtio binding has little appeal to me.

>> You mean, if the x86 board was able to access the disks and dma into the
>> ppc board's memory? You'd run vhost-blk on x86 and virtio-net on ppc.
>
> But as we discussed, vhost doesn't work well if you try to run it on the
> x86 side due to its assumptions about pageable "guest" memory, right? So
> is that even an option? And even still, you would still need to solve
> the aggregation problem so that multiple devices can coexist.

I don't know. Maybe it can be made to work and maybe it cannot. It
probably can with some determined hacking.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Gregory Haskins on 16 Sep 2009 23:20

Avi Kivity wrote:
> On 09/16/2009 10:22 PM, Gregory Haskins wrote:
>> Avi Kivity wrote:
>>> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>>>> If kvm can do it, others can.
>>>>
>>>> The problem is that you seem to either hand-wave over details like this,
>>>> or you give details that are pretty much exactly what vbus does already.
>>>> My point is that I've already sat down and thought about these issues
>>>> and solved them in a freely available GPL'ed software package.
>>>
>>> In the kernel. IMO that's the wrong place for it.
>>
>> 3) "in-kernel": You can do something like virtio-net to vhost to
>> potentially meet some of the requirements, but not all.
>>
>> In order to fully meet (3), you would need to do some of that stuff you
>> mentioned in the last reply with muxing device-nr/reg-nr. In addition,
>> we need to have a facility for mapping eventfds and establishing a
>> signaling mechanism (like PIO+qid), etc. KVM does this with
>> IRQFD/IOEVENTFD, but we don't have KVM in this case so it needs to be
>> invented.
>
> irqfd/eventfd is the abstraction layer, it doesn't need to be reabstracted.

Not per se, but it needs to be interfaced. How do I register that
eventfd with the fastpath in Ira's rig? How do I signal the eventfd
(x86->ppc, and ppc->x86)?

To take it to the next level, how do I organize that mechanism so that
it works for more than one IO-stream (e.g. address the various queues
within ethernet or a different device like the console)? KVM has
IOEVENTFD and IRQFD managed with MSI and PIO. This new rig does not
have the luxury of an established IO paradigm.

Is vbus the only way to implement a solution? No. But it is _a_ way,
and it's one that was specifically designed to solve this very problem
(as well as others).
(As an aside, note that you generally will want an abstraction on top of
irqfd/eventfd, like shm-signal or virtqueues, to do shared-memory based
event mitigation, but I digress. That is a separate topic.)

>> To meet performance, this stuff has to be in kernel and there has to be
>> a way to manage it.
>
> and management belongs in userspace.

vbus does not dictate where the management must be. It's an extensible
framework, governed by what you plug into it (a la connectors and
devices).

For instance, the vbus-kvm connector in alacrityvm chooses to put DEVADD
and DEVDROP hotswap events into the interrupt stream, because they are
simple and we already needed the interrupt stream anyway for fast-path.

As another example: venet chose to put ->call(MACQUERY) "config-space"
into its call namespace because it's simple, and we already need
->call()s for fastpath. It therefore exports an attribute to sysfs that
allows the management app to set it.

I could likewise have designed the connector or device-model differently
so as to keep the mac-address and hotswap-events somewhere else (QEMU/PCI
userspace), but this seems silly to me when they are so trivial, so I
didn't.

>> Since vbus was designed to do exactly that, this is
>> what I would advocate. You could also reinvent these concepts and put
>> your own mux and mapping code in place, in addition to all the other
>> stuff that vbus does. But I am not clear why anyone would want to.
>
> Maybe they like their backward compatibility and Windows support.

This is really not relevant to this thread, since we are talking about
Ira's hardware. But if you must bring this up, then I will reiterate
that you just design the connector to interface with QEMU+PCI and you
have that too, if that was important to you.
But on that topic: since you could consider KVM a "motherboard
manufacturer" of sorts (it just happens to be virtual hardware), I don't
know why KVM seems to consider itself the only motherboard manufacturer
in the world that has to make everything look legacy. If a company like
ASUS wants to add some cutting-edge IO controller/bus, they simply do it.
Pretty much every product release may contain a different array of
devices, many of which are not backwards compatible with any prior
silicon. The guy/gal installing Windows on that system may see a "?" in
device-manager until they load a driver that supports the new chip, and
subsequently it works. It is certainly not a requirement to make said
chip somehow work with existing drivers/facilities on bare metal, per se.
Why should virtual systems be different?

So, yeah, the current design of the vbus-kvm connector means I have to
provide a driver. This is understood, and I have no problem with that.

The only thing that I would agree has to be backwards compatible is the
BIOS/boot function. If you can't support running an image like the
Windows installer, you are hosed. If you can't use your ethernet until
you get a chance to install a driver after the install completes, it's
just like most other systems in existence. IOW: it's not a big deal.

For cases where the IO system is needed as part of the boot/install, you
provide BIOS and/or install-disk support for it.

>> So no, the kernel is not the wrong place for it. It's the _only_ place
>> for it. Otherwise, just use (1) and be done with it.
>
> I'm talking about the config stuff, not the data path.

As stated above, where config stuff lives is a function of what you
interface to vbus. Data-path stuff must be in the kernel for performance
reasons, and this is what I was referring to. I think we are generally
both in agreement here.

What I was getting at is that you can't just hand-wave the datapath
stuff.
We do fast path in KVM with IRQFD/IOEVENTFD+PIO, and we do device
discovery/addressing with PCI. Neither of those are available here in
Ira's case, yet the general concepts are needed. Therefore, we have to
come up with something else.

>>> Further, if we adopt vbus, we drop compatibility with existing guests
>>> or have to support both vbus and virtio-pci.
>>
>> We already need to support both (at least to support Ira). virtio-pci
>> doesn't work here. Something else (vbus, or vbus-like) is needed.
>
> virtio-ira.

Sure, virtio-ira, and he is on his own to make a bus-model under that,
or virtio-vbus + vbus-ira-connector to use the vbus framework. Either
model can work, I agree.

>>>> So the question is: is your position that vbus is all wrong and you
>>>> wish to create a new bus-like thing to solve the problem?
>>>
>>> I don't intend to create anything new, I am satisfied with virtio. If
>>> it works for Ira, excellent. If not, too bad.
>>
>> I think that about sums it up, then.
>
> Yes. I'm all for reusing virtio, but I'm not going to switch to vbus or
> support both for this esoteric use case.

With all due respect, no one asked you to. This sub-thread was
originally about using vhost in Ira's rig. When problems surfaced in
that proposed model, I highlighted that I had already addressed that
problem in vbus, and here we are.

>>>> If so, how is it different from what I've already done? More
>>>> importantly, what specific objections do you have to what I've done,
>>>> as perhaps they can be fixed instead of starting over?
>>>
>>> The two biggest objections are:
>>> - the host side is in the kernel
>>
>> As it needs to be.
>
> vhost-net somehow manages to work without the config stuff in the kernel.

I was referring to data-path stuff, like signal and memory
configuration/routing.
As an aside, it should be noted that vhost under KVM has
IRQFD/IOEVENTFD, PCI-emulation, QEMU, etc. to complement it and fill in
some of the pieces one needs for a complete solution. Not all
environments have all of those pieces (nor should they), and those
pieces need to come from somewhere.

It should also be noted that what remains (config/management) after the
data-path stuff is laid out is actually quite simple. It consists of
pretty much an enumerated list of device-ids within a container,
DEVADD(id) and DEVDROP(id) events, and some sysfs attributes as defined
on a per-device basis (many of which are often needed regardless of
whether the "config-space" operation is handled in-kernel or not).

Therefore, the configuration aspect of the system does not necessitate a
complicated (e.g. full PCI emulation) or external (e.g. userspace)
component per se. The parts of vbus that could be construed as
"management" are (afaict) built using accepted best practices for
managing arbitrary kernel subsystems (sysfs, configfs, ioctls, etc.), so
there is nothing new or reasonably controversial there.

It is for this reason that I think the objection to "in-kernel config"
is unfounded. Disagreements on this point may be settled by the
connector design, while still utilizing vbus, and thus retaining most of
the other benefits of using the vbus framework. The connector ultimately
dictates how and what is exposed to the "guest".

>> With all due respect, based on all of your comments in aggregate I
>> really do not think you are truly grasping what I am actually building
>> here.
>
> Thanks.

>>>> Bingo. So now it's a question of do you want to write this layer from
>>>> scratch, or re-use my framework.
>>>
>>> You will have to implement a connector or whatever for vbus as well.
>>> vbus has more layers so it's probably smaller for vbus.
>>
>> Bingo!
>
> (addictive, isn't it)

Apparently.

>> That is precisely the point.
>>
>> All the stuff for how to map eventfds, handle signal mitigation, demux
>> device/function pointers, isolation, etc., are built in. All the
>> connector has to do is transport the 4-6 verbs and provide a memory
>> mapping/copy function, and the rest is reusable. The device models
>> would then work in all environments unmodified, and likewise the
>> connectors could use all device-models unmodified.
>
> Well, virtio has a similar abstraction on the guest side. The host-side
> abstraction is limited to signalling since all configuration is in
> userspace. vhost-net ought to work for lguest and s390 without change.

But IIUC that is primarily because the revectoring work is already in
QEMU for virtio-u and it rides on that, right? Not knocking that; that's
nice and a distinct advantage. It should just be noted that it's based
on sunk cost, and not truly free. It's just already paid for, which is
different. It also means it only works in environments based on QEMU,
which not all are (as evident by this sub-thread).

>>> It was already implemented three times for virtio, so apparently that's
>>> extensible too.
>>
>> And to my point, I'm trying to commoditize as much of that process as
>> possible on both the front and backends (at least for cases where
>> performance matters) so that you don't need to reinvent the wheel for
>> each one.
>
> Since you're interested in any-to-any connectors it makes sense to you.
> I'm only interested in kvm-host-to-kvm-guest, so reducing the already
> minor effort to implement a new virtio binding has little appeal to me.

Fair enough.

>>> You mean, if the x86 board was able to access the disks and dma into
>>> the ppc board's memory? You'd run vhost-blk on x86 and virtio-net on
>>> ppc.
>>
>> But as we discussed, vhost doesn't work well if you try to run it on the
>> x86 side due to its assumptions about pageable "guest" memory, right? So
>> is that even an option?
>> And even still, you would still need to solve
>> the aggregation problem so that multiple devices can coexist.
>
> I don't know. Maybe it can be made to work and maybe it cannot. It
> probably can with some determined hacking.

I guess you can say the same for any of the solutions.

Kind Regards,
-Greg
From: Michael S. Tsirkin on 17 Sep 2009 00:00

On Wed, Sep 16, 2009 at 10:10:55AM -0400, Gregory Haskins wrote:
>> There is no role reversal.
>
> So if I have virtio-blk driver running on the x86 and vhost-blk device
> running on the ppc board, I can use the ppc board as a block-device.
> What if I really wanted to go the other way?

It seems ppc is the only one that can initiate DMA to an arbitrary
address, so you can't do this really; or you can, by tunneling each
request back to ppc or doing an extra data copy, but it's unlikely to
work well.

The limitation comes from hardware, not from the API we use.
From: Gregory Haskins on 17 Sep 2009 00:20

Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 10:10:55AM -0400, Gregory Haskins wrote:
>>> There is no role reversal.
>>
>> So if I have virtio-blk driver running on the x86 and vhost-blk device
>> running on the ppc board, I can use the ppc board as a block-device.
>> What if I really wanted to go the other way?
>
> It seems ppc is the only one that can initiate DMA to an arbitrary
> address, so you can't do this really, or you can by tunneling each
> request back to ppc, or doing an extra data copy, but it's unlikely to
> work well.
>
> The limitation comes from hardware, not from the API we use.

Understood, but presumably it can be exposed as a sub-function of the
ppc board's register file as a DMA-controller service to the x86. This
would fall into the "tunnel requests back" category you mention above,
though I think "tunnel" implies a heavier protocol than it would
actually require. This would look more like a PIO cycle to a DMA
controller than some higher-layer protocol. You would then utilize that
DMA service inside the memctx, and the rest of vbus would work
transparently with the existing devices/drivers.

I do agree it would require some benchmarking to determine its
feasibility, which is why I was careful to say things like "may work"
;). I also do not even know if it's possible to expose the service this
way on his system. If this design is not possible or performs poorly, I
admit vbus is just as hosed as vhost in regard to the "role correction"
benefit.

Kind Regards,
-Greg
From: Avi Kivity on 17 Sep 2009 04:00
On 09/17/2009 06:11 AM, Gregory Haskins wrote:
>> irqfd/eventfd is the abstraction layer, it doesn't need to be
>> reabstracted.
>
> Not per se, but it needs to be interfaced. How do I register that
> eventfd with the fastpath in Ira's rig? How do I signal the eventfd
> (x86->ppc, and ppc->x86)?

You write a userspace or kernel module to do it. It's a few dozen lines
of code.

> To take it to the next level, how do I organize that mechanism so that
> it works for more than one IO-stream (e.g. address the various queues
> within ethernet or a different device like the console)? KVM has
> IOEVENTFD and IRQFD managed with MSI and PIO. This new rig does not
> have the luxury of an established IO paradigm.
>
> Is vbus the only way to implement a solution? No. But it is _a_ way,
> and it's one that was specifically designed to solve this very problem
> (as well as others).

virtio assumes that the number of transports will be limited and that
the interesting growth is in the number of device classes and drivers.
So we have support for just three transports, but 6 device classes (9p,
rng, balloon, console, blk, net) and 8 drivers (the preceding 6 for
Linux, plus blk/net for Windows). It would have been nice to be able to
write a new binding in Visual Basic, but it's hardly a killer feature.

>>> Since vbus was designed to do exactly that, this is
>>> what I would advocate. You could also reinvent these concepts and put
>>> your own mux and mapping code in place, in addition to all the other
>>> stuff that vbus does. But I am not clear why anyone would want to.
>>
>> Maybe they like their backward compatibility and Windows support.
>
> This is really not relevant to this thread, since we are talking about
> Ira's hardware. But if you must bring this up, then I will reiterate
> that you just design the connector to interface with QEMU+PCI and you
> have that too, if that was important to you.

Well, for Ira the major issue is probably inclusion in the upstream
kernel.
> But on that topic: since you could consider KVM a "motherboard
> manufacturer" of sorts (it just happens to be virtual hardware), I don't
> know why KVM seems to consider itself the only motherboard manufacturer
> in the world that has to make everything look legacy. If a company like
> ASUS wants to add some cutting-edge IO controller/bus, they simply do it.

No, they don't. New buses are added through industry consortiums these
days. No one adds a bus that is only available with their machine, not
even Apple.

> Pretty much every product release may contain a different array of
> devices, many of which are not backwards compatible with any prior
> silicon. The guy/gal installing Windows on that system may see a "?" in
> device-manager until they load a driver that supports the new chip, and
> subsequently it works. It is certainly not a requirement to make said
> chip somehow work with existing drivers/facilities on bare metal, per
> se. Why should virtual systems be different?

Devices/drivers are a different matter, and if you have a virtio-net
device you'll get the same "?" until you load the driver. That's how
people and the OS vendors expect things to work.

> What I was getting at is that you can't just hand-wave the datapath
> stuff. We do fast path in KVM with IRQFD/IOEVENTFD+PIO, and we do
> device discovery/addressing with PCI.

That's not datapath stuff.

> Neither of those are available here in Ira's case, yet the general
> concepts are needed. Therefore, we have to come up with something else.

Ira has to implement virtio's ->kick() function and come up with
something for discovery. It's a lot less lines of code than there are
messages in this thread.

>> Yes. I'm all for reusing virtio, but I'm not going to switch to vbus or
>> support both for this esoteric use case.
>
> With all due respect, no one asked you to. This sub-thread was
> originally about using vhost in Ira's rig.
> When problems surfaced in that proposed model, I highlighted that I had
> already addressed that problem in vbus, and here we are.

Ah, okay. I have no interest in Ira choosing either virtio or vbus.

>> vhost-net somehow manages to work without the config stuff in the kernel.
>
> I was referring to data-path stuff, like signal and memory
> configuration/routing.

Signal and memory configuration/routing are not data-path stuff.

>> Well, virtio has a similar abstraction on the guest side. The host-side
>> abstraction is limited to signalling since all configuration is in
>> userspace. vhost-net ought to work for lguest and s390 without change.
>
> But IIUC that is primarily because the revectoring work is already in
> QEMU for virtio-u and it rides on that, right? Not knocking that; that's
> nice and a distinct advantage. It should just be noted that it's based
> on sunk cost, and not truly free. It's just already paid for, which is
> different. It also means it only works in environments based on QEMU,
> which not all are (as evident by this sub-thread).

No. We expose a mix of emulated-in-userspace and emulated-in-the-kernel
devices on one bus. Devices emulated in userspace only lose by having
the bus emulated in the kernel. Devices in the kernel gain nothing from
having the bus emulated in the kernel. It's a complete slow path, so it
belongs in userspace where state is easy to get at, development is
faster, and bugs are cheaper to fix.

--
error compiling committee.c: too many arguments to function