From: Joerg Roedel on
On Mon, Mar 22, 2010 at 01:23:26PM +0000, Richard W.M. Jones wrote:
> On Mon, Mar 22, 2010 at 01:05:13PM +0000, Daniel P. Berrange wrote:
> > This is close to the way libguestfs already works. It boots QEMU/KVM pointing
> > to a minimal stripped down appliance linux OS image, containing a small agent
> > it talks to over some form of vmchannel/serial/virtio-serial device. Thus the
> > kernel in the appliance it runs is the only thing that needs to know about the
> > filesystem/lvm/dm on-disk formats - libguestfs definitely does not want to be
> > duplicating this detailed knowledge of on disk format itself. It is doing
> > full read-write access to the guest filesystem in offline mode - one of the
> > major use cases is disaster recovery from an unbootable guest OS image.
>
> As Dan said, the 'daemon' part is separate and could be run as a
> standard part of a guest install, talking over vmchannel to the host.
> The only real issue I can see is adding access control to the daemon
> (currently it doesn't need it and doesn't do any). Doing it this way
> you'd be leveraging the ~250,000 lines of existing libguestfs code,
> bindings in multiple languages, tools etc.

I think we don't need per-guest-file access control. We could probably
apply the image-file permissions to all guestfs files. This would cover
the use cases:

* perf reading symbol information (needs read-only access
anyway)
* Desktop-like host<->guest file copy

I have not looked into libguestfs yet but I guess this approach is
easier to achieve.
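
The permission idea above can be sketched roughly as follows. This is only
an illustration in Python, not libguestfs or KVM code: the function names
image_access and guest_file_allowed are hypothetical, and the check looks
only at the classic owner/group/other mode bits of the image file (no ACLs,
no root override).

```python
import stat

def image_access(st_mode, st_uid, st_gid, uid, gids):
    """Return (can_read, can_write) for a user on the image file.

    Hypothetical helper: evaluates only the classic mode bits, picking the
    owner, group, or other class the way a plain UNIX check would.
    """
    if uid == st_uid:
        r_bit, w_bit = stat.S_IRUSR, stat.S_IWUSR
    elif st_gid in gids:
        r_bit, w_bit = stat.S_IRGRP, stat.S_IWGRP
    else:
        r_bit, w_bit = stat.S_IROTH, stat.S_IWOTH
    return bool(st_mode & r_bit), bool(st_mode & w_bit)

def guest_file_allowed(op, image_perms):
    """Apply the image-file permissions to every file inside the guest."""
    can_read, can_write = image_perms
    if op == "read":
        return can_read
    if op == "write":
        return can_write
    raise ValueError("unknown operation: " + op)

# Image is mode 0640, owned by uid 1000, group 100.
owner = image_access(0o640, 1000, 100, uid=1000, gids={100})
group_member = image_access(0o640, 1000, 100, uid=1001, gids={100})
stranger = image_access(0o640, 1000, 100, uid=2000, gids={200})
```

With this model a user who may only read the image (the perf case) gets
read-only access to all guest files, and nothing per-file is needed.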

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Joerg Roedel on
On Mon, Mar 22, 2010 at 05:32:15PM +0100, Ingo Molnar wrote:
> I don't know how you can find the situation of Alpha comparable, which is a
> legacy architecture for which no new CPU was manufactured in the past ~10
> years.
>
> The negative effects of physical obsolescence cannot be overcome even by the
> very best of development models ...

The maintainers of that architecture could at least continue to maintain
it. But that is not the case. Most newer syscalls are not available and
overall stability on Alpha sucks (the kernel crashed when I tried to start
Xorg, for example), but nobody cares about it. The hardware is still around
and there are still some users of it.

> > > * Joerg Roedel <joro(a)8bytes.org> wrote:
> > No, the split-repository situation was the smallest problem after all. It
> > was a community thing. If the community doesn't work, a single-repo project
> > will also fail. [...]
>
> So, what do you think creates code communities and keeps them alive?
> Developers and code. And the wellbeing of developers is primarily influenced
> by the repository structure and by the development/maintenance process - i.e.
> by the 'fun' aspect. (i'm simplifying things there but that's the crux of it.)

Right. A living community needs developers that write new code. And the
repository structure is one important thing. But in my opinion it is not
the most important one. In my 3-4 years in the kernel community, my
experience has been that the maintainers are the most important factor.
I find a maintainer who does not commit or care about patches, or does
not release new versions, much worse than the wrong repository structure.
oProfile has this problem with its userspace part. I partly had this
bad experience with x86-64 before the architecture merge. KVM does not
have this problem.

> So yes, i do claim that what stifled and eventually killed off the Oprofile
> community was the split repository. None of the other Oprofile shortcomings
> were really unfixable, but this one was. It gave no way for the community to
> grow in a healthy way, after the initial phase. Features were more difficult
> and less fun to develop.

The biggest problem oProfile has is that it does not support per-process
measuring. This is indeed not unfixable but it also doesn't fit well in
the overall oProfile concept.

> I simply do not want to see KVM face the same fate, and yes i do see similar
> warning signs.

In fact, the development process in KVM has improved over time. In the
early days everything was kept in svn. Avi switched to git at some
point, but at the time of the kvm-XX releases, kernel and user space
together were unbisectable. This has improved to the point
where the kernel part can be bisected. The KVM maintainers and
community have shown in the past that they can address problems with the
development process when they come up.

> Oprofile certainly had good developers and maintainers as well. In the end it
> wasn't enough ...
>
> Also, a project can easily still be 'alive' but not reach its full potential.
>
> Why do you assume that my argument means that KVM isn't viable today? It can
> very well still be viable and even healthy - just not _as healthy_ as it could
> be ...

I am not aware that I made you say anything ;-)

>
> > > The difference is that we don't have KVM with a decade of history and we
> > > don't have a 'told you so' KVM reimplementation to show that proves the
> > > point. I guess it's a matter of time before that happens, because Qemu
> > > usability is so abysmal today - so i guess we should suspend any
> > > discussions until that happens, no need to waste time on arguing
> > > hypotheticals.
> >
> > We actually have lguest which is small. But it lacks functionality and the
> > developer community KVM has attracted.
>
> I suggested long ago to merge lguest into KVM to cover non-VMX/non-SVM
> execution.

That would have been the best. Rusty already started this work and
presented it at the first KVM Forum. But I have never seen patches ...

> > > I think you are rationalizing the status quo.
> >
> > I see that there are issues with KVM today in some areas. You pointed out
> > the desktop usability already. I personally have trouble with
> > qemu-kvm.git because it is unbisectable. But repository unification doesn't
> > solve the problem here.
>
> Why doesn't it solve the bisectability problem? The kernel repo is supposed to
> be bisectable so that problem would be solved.

Because Marcelo and Avi try to stay as close to upstream qemu as
possible. So the qemu repo is regularly merged into qemu-kvm, and if you
want to bisect you may end up somewhere in the middle of the qemu
repository, which has only very minimal kvm support.
The problem here is that two qemu repositories exist. Anthony's current
effort is directed at creating a single qemu repository, but
that's not done overnight.
Merging qemu into the kernel would in fact make Linus a qemu maintainer.
I am not sure he wants to be that ;-)

> In my judgement you'd have to do that more frequently, if KVM was properly
> weighting its priorities. For example regarding this recent KVM commit of
> yours:
>
> | commit ec1ff79084fccdae0dca9b04b89dcdf3235bbfa1
> | Author: Joerg Roedel <joerg.roedel(a)amd.com>
> | Date: Fri Oct 9 16:08:31 2009 +0200
> |
> | KVM: SVM: Add tracepoint for invlpga instruction
> |
> | This patch adds a tracepoint for the event that the guest
> | executed the INVLPGA instruction.
>
> With integrated KVM tooling i might have insisted for that new tracepoint to
> be available to users as well via some more meaningful tooling than just a
> pure tracepoint.
>
> There are synergies like that all over the place.

True. Tools for better analysis of kvm traces are for sure something that
belongs in tools/kvm. I am not sure if anyone has such tools. If so,
they should send them upstream.

> > > It's as if you argued in 1990 that the unification of East and West
> > > Germany wouldn't make much sense because despite clear problems and
> > > incompatibilities and different styles westerners were still allowed to
> > > visit eastern relatives and they both spoke the same language after all
> > > ;-)
> >
> > Um, hmm. I don't think these situations have enough in common to compare
> > them ;-)
>
> Probably, but it's an interesting parallel nevertheless ;-)

That's for sure ;-)

Joerg

From: Joerg Roedel on
On Mon, Mar 22, 2010 at 05:06:17PM -0500, Anthony Liguori wrote:
> There always needs to be a system wide entity. There are two ways to
> enumerate instances from that system wide entity. You can centralize
> the creation of instances and thereby maintain a list of current
> instances. You can also allow instances to be created in a
> decentralized manner and provide a standard mechanism for instances to
> register themselves with the system wide entity.

And this system wide entity is the kvm module. It creates instances of
'struct kvm' and destroys them. I see no problem if we just attach a
name to every instance with a good default value like kvm0, kvm1 ... or
guest0, guest1 ... User-space can override the name if it wants. The kvm
module makes sure the names are unique.
This is very much the same as how network card numbering is implemented in
the kernel.
Forcing perf to talk to qemu or even libvirt produces too much overhead
imho. Instrumentation only produces useful results with low overhead.
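
To make the naming scheme concrete, here is a minimal userspace sketch in
Python. It is only a model of the idea, not kernel code: the class name
GuestRegistry and its methods are hypothetical, and picking the lowest free
index for the default name is an assumption modelled on ethN-style numbering.

```python
class GuestRegistry:
    """Hypothetical model of a system-wide entity that names instances."""

    def __init__(self, prefix="kvm"):
        self.prefix = prefix
        self._instances = {}  # name -> instance

    def _default_name(self):
        # Pick the lowest free index: kvm0, kvm1, ...
        index = 0
        while f"{self.prefix}{index}" in self._instances:
            index += 1
        return f"{self.prefix}{index}"

    def create(self, instance, name=None):
        """Register an instance; user space may override the default name."""
        if name is None:
            name = self._default_name()
        elif name in self._instances:
            # The registry, not the caller, enforces uniqueness.
            raise ValueError(f"guest name already in use: {name}")
        self._instances[name] = instance
        return name

    def destroy(self, name):
        del self._instances[name]

    def names(self):
        return sorted(self._instances)


registry = GuestRegistry()
first = registry.create(object())              # default name
second = registry.create(object())             # next default name
custom = registry.create(object(), "MyGuest")  # user-space override
```

A tool like 'perf kvm list' would then only need to read the names from
this one place.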

Joerg

From: Joerg Roedel on
On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote:
> On 03/23/2010 04:06 PM, Joerg Roedel wrote:

>> And this system wide entity is the kvm module. It creates instances of
>> 'struct kvm' and destroys them. I see no problem if we just attach a
>> name to every instance with a good default value like kvm0, kvm1 ... or
>> guest0, guest1 ... User-space can override the name if it wants. The kvm
>> module takes care about the names being unique.
>>
>
> So, two users can't have a guest named MyGuest each? What about
> namespace support? There's a lot of work in virtualizing all kernel
> namespaces, you're adding to that.

This enumeration is a very small and non-intrusive feature. Making it
aware of namespaces is easy too.

> What about notifications when guests are added or removed?

Who would be the consumer of such notifications? A 'perf kvm list' can
live without them, I guess. If we need them later, we can still add them.

>> This is very much the same as network card numbering is implemented in
>> the kernel.
>> Forcing perf to talk to qemu or even libvirt produces too much overhead
>> imho. Instrumentation only produces useful results with low overhead.
>>
>
> It's a setup cost only.

My statement was not limited to enumeration; I should have been clearer
about that. The guest filesystem access channel is another
affected part. The 'perf kvm top' command will access the guest
filesystem regularly, and going through qemu would add overhead here.
Providing this in the KVM module directly also has the benefit that it
would work out of the box with different userspaces too. Or do we want
to limit 'perf kvm' to the libvirt-qemu-kvm software stack?

Sidenote: I really think we should come to a conclusion about the
concept. KVM integration into perf is a very useful feature for
analyzing virtualization workloads.

Thanks,

Joerg

From: Joerg Roedel on
On Wed, Mar 24, 2010 at 06:57:47AM +0200, Avi Kivity wrote:
> On 03/23/2010 08:21 PM, Joerg Roedel wrote:
>> This enumeration is a very small and non-intrusive feature. Making it
>> aware of namespaces is easy too.
>>
>
> It's easier (and safer and all the other boring bits) not to do it at
> all in the kernel.

For the KVM stack it doesn't matter where it is implemented. It is as
easy in qemu or libvirt as in the kernel. I also don't see big risks. On
the perf side and for its users it is a lot easier to have this in the
kernel.
For example, I always use plain qemu when running kvm guests and have never
used libvirt. The only central entity I have here is the kvm kernel
modules. I don't want to start using libvirt only to be able to use perf kvm.

>> Who would be the consumer of such notifications? A 'perf kvm list' can
>> live without I guess. If we need them later we can still add them.
>
> System-wide monitoring needs to work equally well for guests started
> before or after the monitor.

This could easily be done using the notifier chains already in the kernel,
probably with far fewer than 100 lines of additional code.
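
As a rough illustration of the idea, here is a small Python model of a
notifier chain for guest add/remove events. It is not the kernel's notifier
API; the class GuestNotifier and its methods are hypothetical. Replaying the
already-running guests at registration time is an assumption about how a
monitor started late could still see guests started before it.

```python
class GuestNotifier:
    """Hypothetical model of a notifier chain for guest lifecycle events."""

    def __init__(self):
        self._guests = []     # names of currently running guests
        self._callbacks = []  # registered monitors

    def register(self, callback):
        # Replay existing guests so a monitor started after the guests
        # sees the same state as one started before them.
        for name in self._guests:
            callback("added", name)
        self._callbacks.append(callback)

    def unregister(self, callback):
        self._callbacks.remove(callback)

    def add_guest(self, name):
        self._guests.append(name)
        self._notify("added", name)

    def remove_guest(self, name):
        self._guests.remove(name)
        self._notify("removed", name)

    def _notify(self, event, name):
        # Iterate over a copy so callbacks may unregister themselves.
        for callback in list(self._callbacks):
            callback(event, name)


notifier = GuestNotifier()
notifier.add_guest("kvm0")  # guest started before the monitor

events = []
notifier.register(lambda event, name: events.append((event, name)))
notifier.add_guest("kvm1")  # guest started after the monitor
notifier.remove_guest("kvm0")
```

The monitor ends up with one consistent event stream regardless of whether
a guest was started before or after it.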

> Even disregarding that, if you introduce an API, people will start
> using it and complaining if it's incomplete.

There is nothing wrong with that. We only need to define what this API
should be used for to prevent uncontrolled growth. It could be an
instrumentation-only API for example.

>> My statement was not limited to enumeration, I should have been more
>> clear about that. The guest filesystem access-channel is another
>> affected part. The 'perf kvm top' command will access the guest
>> filesystem regularly and going over qemu would be more overhead here.
>>
>
> Why? Also, the real cost would be accessing the filesystem, not copying
> data over qemu.

When measuring cache misses, any additional (and in this case
unnecessary) copy overhead makes the results less accurate.

>> Providing this in the KVM module directly also has the benefit that it
>> would work out-of-the-box with different userspaces too. Or do we want
>> to limit 'perf kvm' to the libvirt-qemu-kvm software stack?
>
> Other userspaces can also provide this functionality, like they have to
> provide disk, network, and display emulation. The kernel is not a huge
> library.

This has nothing to do with a library. It is about entity and resource
management, which is what OS kernels are about. The virtual machine is
the entity (similar to a process) and we want to add additional access
channels and names to it.

Joerg
