A question of perf NMI handler [Kernel]

Prev: [tip:timers/core] Checkpatch: Warn about unexpectedly long msleep's
Next: genhd, efi: add efi partition metadata to hd_structs

From: Cyrill Gorcunov on 4 Aug 2010 12:40

On Wed, Aug 04, 2010 at 12:20:26PM -0400, Don Zickus wrote:
....
> > >
> > > Because the reason registers are never set. If they were, then the code
> > > wouldn't have to walk the notify_chain. :-)
> > >
> >
> > maybe we're talking about different things. i meant that if there is nmi
> > with a reason (from 0x61) the handling of such nmi should be done before
> > notify_die I think (if only I not miss something behind).
>
> No we are talking about the same thing. :-) And that code is already

seems not actually ;)

> there. The problem is the bits in register 0x61 are not always set
> correctly in the case of SERRs (well at least in all the cases I have
> dealt with). So you can easily can a flood of unknown nmis from an SERR
> and register 0x61 would have the PERR/SERR bits set to 0. Fun, huh?

if there is nothing in nmi_sc the code flows into another branch. And
it hits the problem of perf events eating all nmi giving no chance the
others. So we take if (!(reason & 0xc0)) case and hit DIE_NMI_IPI
(/me scratching the head why it's not under CONFIG_X86_LOCAL_APIC) and
drop all code, unpleasant.

>
> Cheers,
> Don
>
-- Cyrill
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Robert Richter on 4 Aug 2010 15:00

(cc'ing Andi)

On 04.08.10 12:39:30, Cyrill Gorcunov wrote:
> On Wed, Aug 04, 2010 at 12:20:26PM -0400, Don Zickus wrote:

> > there. The problem is the bits in register 0x61 are not always set
> > correctly in the case of SERRs (well at least in all the cases I have
> > dealt with). So you can easily can a flood of unknown nmis from an SERR
> > and register 0x61 would have the PERR/SERR bits set to 0. Fun, huh?
>
> if there is nothing in nmi_sc the code flows into another branch. And
> it hits the problem of perf events eating all nmi giving no chance the
> others. So we take if (!(reason & 0xc0)) case and hit DIE_NMI_IPI
> (/me scratching the head why it's not under CONFIG_X86_LOCAL_APIC) and
> drop all code, unpleasant.

Only the upper 2 bits in io_61h indicate the nmi reason, so in case of
(!(reason & 0xc0)) the source simply can not be determined and all nmi
handlers in the chain must be called (DIE_NMI/DIE_NMI_IPI). The
perfctr handler then stops it.

So you can decide to either get an unrecovered nmi panic triggered by
a perfctr or losing unknown nmis from other sources. Maybe this can be
fixed by implementing handlers for those sources.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Cyrill Gorcunov on 4 Aug 2010 15:30

On Wed, Aug 04, 2010 at 08:48:06PM +0200, Robert Richter wrote:
> (cc'ing Andi)
>
> On 04.08.10 12:39:30, Cyrill Gorcunov wrote:
> > On Wed, Aug 04, 2010 at 12:20:26PM -0400, Don Zickus wrote:
>
> > > there. The problem is the bits in register 0x61 are not always set
> > > correctly in the case of SERRs (well at least in all the cases I have
> > > dealt with). So you can easily can a flood of unknown nmis from an SERR
> > > and register 0x61 would have the PERR/SERR bits set to 0. Fun, huh?
> >
> > if there is nothing in nmi_sc the code flows into another branch. And
> > it hits the problem of perf events eating all nmi giving no chance the
> > others. So we take if (!(reason & 0xc0)) case and hit DIE_NMI_IPI
> > (/me scratching the head why it's not under CONFIG_X86_LOCAL_APIC) and
> > drop all code, unpleasant.
>
> Only the upper 2 bits in io_61h indicate the nmi reason, so in case of
> (!(reason & 0xc0)) the source simply can not be determined and all nmi
> handlers in the chain must be called (DIE_NMI/DIE_NMI_IPI). The
> perfctr handler then stops it.

yes, that is what I meant by nmi_sc register. I think we need to restucturize
current default_do_nmi handler but how to be with perfs I don't know at moment
if perf register gets overflowed (ie already has pedning nmi) but we handle
it in early nmi cycle this would lead to strange results. Need to think.

>
> So you can decide to either get an unrecovered nmi panic triggered by
> a perfctr or losing unknown nmis from other sources. Maybe this can be
> fixed by implementing handlers for those sources.
>
> -Robert
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
-- Cyrill
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Andi Kleen on 4 Aug 2010 15:30

> Only the upper 2 bits in io_61h indicate the nmi reason, so in case of
> (!(reason & 0xc0)) the source simply can not be determined and all nmi
> handlers in the chain must be called (DIE_NMI/DIE_NMI_IPI). The
> perfctr handler then stops it.
>
> So you can decide to either get an unrecovered nmi panic triggered by
> a perfctr or losing unknown nmis from other sources. Maybe this can be
> fixed by implementing handlers for those sources.

This is a tricky area. Me and Ying have been looking at this recently.

Hardware traditionally signals NMI when it has a uncontained error and really
expects the OS to shut down to prevent data corruption spreading. i

Unfortunately especially for some older hardware
there can be cases where this is not expressed in port 61.
But the default behaviour of Linux for this today is quite wrong.

Some cases can be also determined with the help of APEI, which
can give you more information about the error (and tell you
if shutdown is needed).

But of course we can still have performance counter and other NMI
users.

So the right flow might be something like

- check software events (like crash dump or reboot)
- check perfctrs
- check APEI
- check port 61 for known events (it's probably a good idea
to check perfctrs first because accessing io ports is quite slow.
But the perfctr handler has to make sure it doesn't eat unknown
events, otherwise error handling would be impacted)
- check other event sources
- shutdown (depending on the chipset likely)

This means the NMI users who cannot determine themselves if a event
happened and eat everything (like oprofile today) would need to be fixed.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Robert Richter on 6 Aug 2010 03:00

On 04.08.10 15:26:34, Cyrill Gorcunov wrote:

> yes, that is what I meant by nmi_sc register. I think we need to restucturize
> current default_do_nmi handler but how to be with perfs I don't know at moment
> if perf register gets overflowed (ie already has pedning nmi) but we handle
> it in early nmi cycle this would lead to strange results. Need to think.
>
> >
> > So you can decide to either get an unrecovered nmi panic triggered by
> > a perfctr or losing unknown nmis from other sources. Maybe this can be
> > fixed by implementing handlers for those sources.

I was playing around with it yesterday trying to fix this. My idea is
to skip an unkown nmi if the privious nmi was a *handled* perfctr
nmi. I will probably post an rfc patch early next week.

Another problem I encountered is that unknown nmis from the chipset
are not reenabled, thus when hitting the nmi button I only see one
unknown nmi message per boot, if I reenable it, I get an nmi
storm firing nmi_watchdog. Uhh....

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4 5
Prev: [tip:timers/core] Checkpatch: Warn about unexpectedly long msleep's
Next: genhd, efi: add efi partition metadata to hd_structs