Enhance perf to collect KVM guest os statistics from host side [Kernel]


From: Masami Hiramatsu on 16 Mar 2010 19:10

Joerg Roedel wrote:
> On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
>> Hm, that sounds rather messy if we want to use it to basically expose kernel
>> functionality in a guest/host unified way. Is the qemu process discoverable in
>> some secure way? Can we trust it? Is there some proper tooling available to do
>> it, or do we have to push it through 2-3 packages to get such a useful feature
>> done?
>
> Since we want to implement a pmu usable for the guest anyway, why don't
> we just use a guest's perf to get all the information we want? If we get a
> pmu-nmi from the guest we just re-inject it to the guest and perf in the
> guest gives us all the information we want, including kernel and userspace
> symbols, stack traces, and so on.

I guess this aims to get information from old environments running on
kvm for life extension :)

> In the previous thread we discussed a direct trace channel between
> guest and host kernel (which can be used for ftrace events for example).
> This channel could be used to transport this information to the host
> kernel.

Interesting! I know the people who are trying to do that with systemtap.
See, http://vesper.sourceforge.net/

>
> The only additional feature needed is a way for the host to start a perf
> instance in the guest.

# ssh localguest perf record --host-channel ... ? B-)

Thank you,

>
> Opinions?
>
>
> Joerg
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Masami Hiramatsu
e-mail: mhiramat(a)redhat.com

From: Anthony Liguori on 16 Mar 2010 19:10

On 03/16/2010 01:28 PM, Ingo Molnar wrote:
> * Anthony Liguori<aliguori(a)linux.vnet.ibm.com> wrote:
>
>
>> On 03/16/2010 12:52 PM, Ingo Molnar wrote:
>>
>>> * Anthony Liguori<aliguori(a)linux.vnet.ibm.com> wrote:
>>>
>>>
>>>> On 03/16/2010 10:52 AM, Ingo Molnar wrote:
>>>>
>>>>> You are quite mistaken: KVM isnt really a 'random unprivileged application' in
>>>>> this context, it is clearly an extension of system/kernel services.
>>>>>
>>>>> ( Which can be seen from the simple fact that what started the discussion was
>>>>> 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
>>>>> existing host-space /proc/kallsyms was desired. )
>>>>>
>>>> Random tools (like perf) should not be able to do what you describe. It's a
>>>> security nightmare.
>>>>
>>> A security nightmare exactly how? Mind to go into details as i dont understand
>>> your point.
>>>
>> Assume you're using SELinux to implement mandatory access control.
>> How do you label this file system?
>>
>> Generally speaking, we don't know the difference between /proc/kallsyms vs.
>> /dev/mem if we do generic passthrough. While it might be safe to have a
>> relaxed label of kallsyms (since it's read only), it's clearly not safe to
>> do that for /dev/mem, /etc/shadow, or any file containing sensitive
>> information.
>>
> What's your _point_? Please outline a threat model, a vector of attack,
> _anything_ that substantiates your "it's a security nightmare" claim.
>

You suggested "to have a (read only) mount of all guest filesystems".

As I described earlier, not all of the information within the guest
filesystem has the same level of sensitivity. If you exposed a generic
interface like this, it makes it very difficult to delegate privileges.

Delegating privileges is important because in a higher-security
environment, you may want to prevent a management tool from accessing
the VM's disk directly, but still allow it to do basic operations (in
particular, to view performance statistics).

>> Rather, we ought to expose a higher level interface that we have more
>> confidence in with respect to understanding the ramifications of exposing
>> that guest data.
>>
> Exactly, we want something that has a flexible namespace and works well with
> Linux tools in general. Preferably that namespace should be human readable,
> and it should be hierarchic, and it should have a well-known permission model.
>
> This concept exists in Linux and is generally called a 'filesystem'.
>

If you want to use a synthetic filesystem as the management interface
for qemu, that's one thing. But you suggested exposing the guest
filesystem in its entirety and that's what I disagreed with.
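
As a rough illustration of that distinction, here is a minimal sketch of
an allow-list check that a higher-level interface could apply before
exporting anything from a guest. The allow-list contents and the
may_export helper are hypothetical, chosen only for illustration; nothing
like this exists in qemu today.

```python
import posixpath

# Hypothetical allow-list: guest paths judged safe to export read-only.
# Only /proc/kallsyms is taken from the discussion; the set is illustrative.
EXPORTED_GUEST_PATHS = frozenset({"/proc/kallsyms"})


def may_export(guest_path, allowed=EXPORTED_GUEST_PATHS):
    """Return True only if guest_path normalizes to an allow-listed path.

    Normalizing first keeps inputs such as "/proc/../etc/shadow" from
    slipping past a naive prefix or string comparison.
    """
    normalized = posixpath.normpath(guest_path)
    return normalized in allowed
```

With a check like this, a management tool could be granted access to
/proc/kallsyms without also being handed /dev/mem or /etc/shadow, which is
the kind of delegation a generic passthrough makes hard to express.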

> If a user cannot read the image file then the user has no access to its
> contents via other namespaces either. That is, of course, a basic security
> aspect.
>
> ( That is perfectly true with a non-SELinux Unix permission model as well, and
> is true in the SELinux case as well. )
>

I don't think that's reasonable at all. The guest may encrypt its disk
image. It still ought to be possible to run perf against that guest, no?

> Erm. Please explain to me, what exactly is 'not that simple' in a MAC
> environment?
>
> Also, i'd like to note that the 'restrictive SELinux setups' usecases are
> pretty secondary.
>
> To demonstrate that, i'd like every KVM developer on this list who reads this
> mail and who has their home development system where they produce their
> patches set up in a restrictive MAC environment, in that you cannot even read
> the images you are using, to chime in with a "I'm doing that" reply.
>

My home system doesn't run SELinux but I work daily with systems that
are using SELinux.

I want to be able to run tools like perf on these systems because
ultimately, I need to debug these systems on a daily basis.

But that's missing the point. We want to have an interface that works
for both cases so that we're not maintaining two separate interfaces.

We've rat holed a bit though. You want:

1) to run perf kvm list and be able to enumerate KVM guests

2) for this to Just Work with qemu guests launched from the command line

You could achieve (1) by tying perf to libvirt but that won't work for
(2). There are a few practical problems with (2).

qemu does not require the user to associate any uniquely identifying
information with a VM. We've also optimized the command line use case
so that if all you want to do is run a disk image, you just execute
"qemu foo.img". To satisfy your use case, we would either have to force
a user to always specify unique information, which would be less
convenient for our users, or we would have to let the name be an optional
parameter.

As it turns out, we already support "qemu -name Fedora foo.img". What
we don't do today, but I've been suggesting we should, is automatically
create a QMP management socket in a well known location based on the
-name parameter when it's specified. That would let a tool like perf
Just Work provided that a user specified -name.
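
To make the discovery idea concrete, here is a minimal sketch of how a
tool might enumerate named guests. It assumes a hypothetical socket
directory (/var/run/qemu) and a hypothetical <name>.sock naming scheme;
qemu defines neither today, and list_named_guests is an illustrative
helper rather than an existing API.

```python
import os
import stat

# Hypothetical well-known location; qemu does not create this directory.
SOCKET_DIR = "/var/run/qemu"
SUFFIX = ".sock"


def list_named_guests(socket_dir=SOCKET_DIR):
    """Return {guest_name: socket_path} for each Unix socket <name>.sock."""
    guests = {}
    try:
        entries = os.listdir(socket_dir)
    except FileNotFoundError:
        # No directory means no guest was started with -name.
        return guests
    for entry in sorted(entries):
        if not entry.endswith(SUFFIX):
            continue
        path = os.path.join(socket_dir, entry)
        # Skip regular files that merely look like sockets by name.
        if not stat.S_ISSOCK(os.stat(path).st_mode):
            continue
        guests[entry[: -len(SUFFIX)]] = path
    return guests
```

A tool along the lines of "perf kvm list" could print the keys of the
returned mapping; guests started without -name would simply not appear.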

No one uses -name today though and I'm sure you don't either.

The only way to really address this is to change the interaction.
Instead of running perf externally to qemu, we should support a perf
command in the qemu monitor that can then tie directly to the perf
tooling. That gives us the best possible user experience.

We can't do that though unless perf is a library or is in some way more
programmatic.

Regards,

Anthony Liguori


From: Anthony Liguori on 16 Mar 2010 19:20

On 03/16/2010 12:39 PM, Ingo Molnar wrote:
>> If we look at the use-case, it's going to be something like, a user is
>> creating virtual machines and wants to get performance information about
>> them.
>>
>> Having to run a separate tool like perf is not going to be what they would
>> expect they had to do. Instead, they would either use their existing GUI
>> tool (like virt-manager) or they would use their management interface
>> (either QMP or libvirt).
>>
>> The complexity of interaction is due to the fact that perf shouldn't be a
>> stand alone tool. It should be a library or something with a programmatic
>> interface that another tool can make use of.
>>
> But ... a GUI interface/integration is of course possible too, and it's being
> worked on.
>
> perf is mainly a kernel developer tool, and kernel developers generally dont
> use GUIs to do their stuff: which is the (sole) reason why its first ~850
> commits of tools/perf/ were done without a GUI. We go where our developers
> are.
>
> In any case it's not an excuse to have no proper command-line tooling. In fact
> if you cannot get simpler, more atomic command-line tooling right then you'll
> probably doubly suck at doing a GUI as well.
>

It's about who owns the user interface.

If qemu owns the user interface, then we can satisfy this in a very
simple way by adding a perf monitor command. If we have to support
third party tools, then it significantly complicates things.

Regards,

Anthony Liguori

> Ingo
>


From: Avi Kivity on 17 Mar 2010 00:00

On 03/17/2010 02:41 AM, Frank Ch. Eigler wrote:
> Hi -
>
> On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote:
>
>> [...]
>> The only way to really address this is to change the interaction.
>> Instead of running perf externally to qemu, we should support a perf
>> command in the qemu monitor that can then tie directly to the perf
>> tooling. That gives us the best possible user experience.
>>
> To what extent could this be solved with less crossing of
> isolation/abstraction layers, if the perfctr facilities were properly
> virtualized?
>

That's the more interesting (by far) usage model. In general guest
owners don't have access to the host, and host owners can't (and
shouldn't) change guests.

Monitoring guests from the host is useful for kvm developers, but less
so for users.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


From: Ingo Molnar on 17 Mar 2010 03:30

* Joerg Roedel <joro(a)8bytes.org> wrote:

> On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
> > Hm, that sounds rather messy if we want to use it to basically expose kernel
> > functionality in a guest/host unified way. Is the qemu process discoverable in
> > some secure way? Can we trust it? Is there some proper tooling available to do
> > it, or do we have to push it through 2-3 packages to get such a useful feature
> > done?
>
> Since we want to implement a pmu usable for the guest anyway, why don't we
> just use a guest's perf to get all the information we want? [...]

Look at the previous posting of this patch, this is something new and rather
unique. The main power in the 'perf kvm' kind of instrumentation is to profile
_both_ the host and the guest on the host, using the same tool (often using
the same kernel) and using similar workloads, and do profile comparisons using
'perf diff'.

Note that KVM's in-kernel design makes it easy to offer this kind of
host/guest shared implementation that Yanmin has created. Other virtualization
solutions with a poorer design (for example where the hypervisor code base is
split away from the guest implementation) will have it much harder to create
something similar.

That kind of integrated approach can result in very interesting finds straight
away, see:

http://lkml.indiana.edu/hypermail/linux/kernel/1003.0/00613.html

( the profile there demoes the need for spinlock accelerators for example -
there's clearly asymmetrically large overhead in guest spinlock code. Guess
how much else we'll be able to find with a full 'perf kvm' implementation. )

One of the main goals of a virtualization implementation is to eliminate as
many performance differences to the host kernel as possible. From the first
day KVM was released the overriding question from users was always: 'how much
slower is it than native, and which workloads are hit worst, and why, and
could you pretty please speed up important workload XYZ'.

'perf kvm' helps exactly that kind of development workflow.

Note that with oprofile you can already do separate guest space and host space
profiling (with the timer driven fallback in the guest). One idea with 'perf
kvm' is to change that paradigm of forced separation and forced duplication
and to support the workflow that most developers employ: use the host space for
development and unify instrumentation in an intuitive framework. Yanmin's
'perf kvm' patch is a very good step towards that goal.

Anyway ... look at the patches, try them and see it for yourself. Back in the
days when i did KVM performance work i wish i had something like Yanmin's
'perf kvm' feature. I'd probably still be hacking KVM today ;-)

So, the code is there, it's useful and it's up to you guys whether you live
with this opportunity - the perf developers are certainly eager to help out
with the details. There's already tons of per kernel subsystem perf helper
tools: perf sched, perf kmem, perf lock, perf bench, perf timechart.

'perf kvm' is really a natural and good next step IMO that underlines the main
design goodness KVM brought to the world of virtualization: proper guest/host
code base integration.

Thanks,

Ingo
