Prev: [PATCH 14/18 v2] staging: rtl8187se: call disable_pci_device() if pci_probe() failed
Next: [PATCH 1/1] Char: mxser, call pci_disable_device from probe/remove
From: Cyrill Gorcunov on 12 Aug 2010 10:40 On Thu, Aug 12, 2010 at 03:24:19PM +0200, Robert Richter wrote: > On 10.08.10 15:24:28, Cyrill Gorcunov wrote: > > On Tue, Aug 10, 2010 at 09:05:41PM +0200, Robert Richter wrote: > > > On 10.08.10 13:24:51, Cyrill Gorcunov wrote: > > > > > > > It gets masked on NMI arrival, at least for some models (Core Duo, P4, > > > > P6 M and I suspect more theh that, that was the reason oprofile has > > > > it, also there is a note in SDM V3a page 643). > > > > > > Yes, that's right, I never noticed that. Maybe it is better to > > > implement the apic write it in model specific code then. > > > > > > > Perhaps we can make it simplier I think, ie like it was before -- we just > > move it under your new DIE_NMIUNKNOWN, in separate patch of course. Though > > I'm fine with either way. > > I do not understand why you want to put this in the 'unknown' > path. Isn't it necessary to unmask the vector with every call of the > nmi handler? > > -Robert > Heh, it's simple - I'm screwed. Robert you're right, of course it should NOT be under every 'unknown' nmi. I thought about small optimization here, but I think all this should be done only _after_ your patch is merged. Sorry for confuse ;) -- Cyrill -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on 13 Aug 2010 00:30 On Thu, Aug 12, 2010 at 12:00:58AM +0200, Robert Richter wrote: > I was debuging this a little more, see version 2 below. > > -Robert > > -- > > From 8bb831af56d118b85fc38e0ddc2e516f7504b9fb Mon Sep 17 00:00:00 2001 > From: Robert Richter <robert.richter(a)amd.com> > Date: Thu, 5 Aug 2010 16:19:59 +0200 > Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs > > When perfctrs are running it is valid to have unhandled nmis, two > events could trigger 'simultaneously' raising two back-to-back > NMIs. If the first NMI handles both, the latter will be empty and daze > the CPU. > > The solution to avoid an 'unknown nmi' massage in this case was simply > to stop the nmi handler chain when perfctrs are runnning by stating > the nmi was handled. This has the drawback that a) we can not detect > unknown nmis anymore, and b) subsequent nmi handlers are not called. > > This patch addresses this. Now, we check this unknown NMI if it could > be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel > handle the unknown nmi. > > This is a debug log: > > cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430 > cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616 > cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320 > cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139 > cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100 > cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607 > cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416 > cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032 > cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830 > cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743 > cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552 > cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224 > cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677 > cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772 > cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818 > cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591 > Uhhuh. NMI received for unknown reason 00 on CPU 6. > Do you have a strange power saving mode enabled? > Dazed and confused, but trying to continue > > Deltas: > > nmi #32334 340186 > nmi #32335 1327704 > nmi #32336 1819 <<<< back-to-back nmi [1] > nmi #32337 85961 > nmi #32338 284507 > nmi #32339 1578809 > nmi #32340 217616 > nmi #32341 1798 <<<< back-to-back nmi [2] > nmi #32342 240913 > nmi #32343 1512809 > nmi #32344 116672 > nmi #32345 412453 > nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters > nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter > nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3] > > For back-to-back nmi detection there are the following rules: > > The perfctr nmi handler was handling more than one counter and no > counter was handled in the subsequent nmi (see [1] and [2] above). > > There is another case if there are two subsequent back-to-back nmis > [3]. In this case we measure the time between the first and the > 2nd. The 2nd is detected as back-to-back because the first handled > more than one counter. The time between the 1st and the 2nd is used to > calculate a range for which we assume a back-to-back nmi. Now, the 3rd > nmi triggers, we measure again the time delta and compare it with the > first delta from which we know it was a back-to-back nmi. If the 3rd > nmi is within the range, it is also a back-to-back nmi and we drop it. > > Signed-off-by: Robert Richter <robert.richter(a)amd.com> > --- That time based thing looks a bit complicated. I'm still not sure why you don't want to use a simple flag: After handled a perf NMI: if (handled more than one counter) __get_cpu_var(skip_unknown) = 1; While handling an unknown NMI: if (__get_cpu_var(skip_unknown)) { __get_cpu_var(skip_unknow) = 0; return NOTIFY_STOP; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on 13 Aug 2010 00:40
On Wed, Aug 11, 2010 at 01:10:46PM +0200, Robert Richter wrote: > On 10.08.10 22:44:55, Frederic Weisbecker wrote: > > On Tue, Aug 10, 2010 at 04:48:56PM -0400, Don Zickus wrote: > > > @@ -1200,7 +1200,7 @@ void perf_events_lapic_init(void) > > > apic_write(APIC_LVTPC, APIC_DM_NMI); > > > } > > > > > > -static DEFINE_PER_CPU(unsigned int, perfctr_handled); > > > +static DEFINE_PER_CPU(unsigned int, perfctr_skip); > > Yes, using perfctr_skip is better to understand ... > > > > @@ -1229,14 +1228,11 @@ perf_event_nmi_handler(struct notifier_block *self, > > > * was handling a perfctr. Otherwise we pass it and > > > * let the kernel handle the unknown nmi. > > > * > > > - * Note: this could be improved if we drop unknown > > > - * NMIs only if we handled more than one perfctr in > > > - * the previous NMI. > > > */ > > > - this_nmi = percpu_read(irq_stat.__nmi_count); > > > - prev_nmi = __get_cpu_var(perfctr_handled); > > > - if (this_nmi == prev_nmi + 1) > > > + if (__get_cpu_var(perfctr_skip)){ > > > + __get_cpu_var(perfctr_skip) -=1; > > > return NOTIFY_STOP; > > > + } > > > return NOTIFY_DONE; > > > default: > > > return NOTIFY_DONE; > > > @@ -1246,11 +1242,21 @@ perf_event_nmi_handler(struct notifier_block *self, > > > > > > apic_write(APIC_LVTPC, APIC_DM_NMI); > > > > > > - if (!x86_pmu.handle_irq(regs)) > > > + handled = x86_pmu.handle_irq(regs); > > > + if (!handled) > > > + /* not our NMI */ > > > return NOTIFY_DONE; > > > - > > > - /* handled */ > > > - __get_cpu_var(perfctr_handled) = percpu_read(irq_stat.__nmi_count); > > > + else if (handled > 1) > > > + /* > > > + * More than one perfctr triggered. This could have > > > + * caused a second NMI that we must now skip because > > > + * we have already handled it. Remember it. > > > + * > > > + * NOTE: We have no way of knowing if a second NMI was > > > + * actually triggered, so we may accidentally skip a valid > > > + * unknown nmi later. > > > + */ > > > + __get_cpu_var(perfctr_skip) +=1; > > ... but this will not work. You have to mark the *absolute* nmi number > here. If you only raise a flag, the next unknown nmi will be dropped, > every. Isn't it what we want? Only the next unknown nmi gets dropped. > Because, in between there could have been other nmis that > stopped the chain and thus the 'unknown' path is not executed. I'm not sure what you mean here. Are you thinking about a third NMI source that triggers while we are still handling the first NMI in the back to back sequence? > The trick in my patch is that you *know*, which nmi you want to skip. Well with the flag you also know which nmi you want to skip. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |