From: Don Zickus on
On Fri, Apr 09, 2010 at 06:56:50PM +0400, Cyrill Gorcunov wrote:
> On Fri, Apr 09, 2010 at 02:00:38AM +0200, Frederic Weisbecker wrote:
> > On Tue, Apr 06, 2010 at 07:31:15PM +0400, Cyrill Gorcunov wrote:
> > > > I fear the cpu clock is not going to help you detect any hard lockups.
> > > > If you're stuck in an interrupt or an irq disabled loop, your cpu clock is
> > > > not going to fire.
> > > >
> > >
> > > I guess it's not supposed to. For such cases only nmi irqs may help, which is
> > > what the perf events are there for (/me needs to check if we program the apic
> > > timer for anything like that). But it should help for other deadlocks. Or am I
> > > missing something?
> >
> >
> > Actually not. What the hardlockup detector does is check the progression
> > of irqs.
> >
>
> yup, i know what nmi-watchdog is doing. I guess you've misunderstood me. I meant
> that sw-driven detector is not supposed to guard against the cases you're
> referring to. I don't remember the details but someone proposed to make a
> fallback to sw-watchdog if there is no ability to use nmi from perf-events
> (for any reason), which was eventually implemented in Don's patch. And
> there will be a message that watchdog has been switched to sw-driven
> scaffold. So user will (or should) see this message and mark it I believe.
> This sw-watchdog is like "ok, we've been trying our best but there is a
> problem and the only solution we could offer -- is to use sw-watchdog".
> That is how I understand the reason for sw-watchdog there.

Correct.
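
A rough model of the fallback described above, as a userspace sketch only.
This is not the code from the patch: the function name, the enum and the
wording of the message are all made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names; a userspace model, not kernel code. */
enum watchdog_mode { WATCHDOG_NMI, WATCHDOG_SW };

/*
 * Use the NMI (perf event) source when it is available. Otherwise fall
 * back to the sw-driven watchdog and tell the user about it.
 */
static enum watchdog_mode watchdog_pick_mode(bool nmi_available)
{
	if (nmi_available)
		return WATCHDOG_NMI;

	/* The real message text is not given in this thread; this one is assumed. */
	fprintf(stderr, "watchdog: no NMI source, switched to sw-driven mode "
			"(hard lockups will not be detected)\n");
	return WATCHDOG_SW;
}
```

The point of the message is that the user can see the downgrade and knows
hard lockups are no longer covered.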

>
> >
> > So it detects true hardlockups: stuck in an irq disabled section.
> > If you don't have NMI to detect that (here this is done by a hardware clock
> > based on cpu cycle overflows), then you're screwed. The hardlockup detector is
> > useless with a maskable irq based clock.
> >
> -- Cyrill
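
For reference, the "progression of irqs" check Frederic describes can be
modelled in userspace roughly as below. It is only a sketch under assumptions,
not the code from the patch: NR_CPUS, struct cpu_state, timer_irq() and
nmi_check() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; a userspace model, not kernel code. */
#define NR_CPUS 4

struct cpu_state {
	unsigned long irq_count;       /* bumped by the maskable timer irq */
	unsigned long irq_count_saved; /* value seen by the previous NMI check */
};

static struct cpu_state cpus[NR_CPUS];

/* Maskable timer interrupt: it cannot run while irqs are disabled. */
static void timer_irq(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].irq_count++;
}

/*
 * Called from NMI context (e.g. on a cpu cycle counter overflow).
 * Returns true when no timer irq has been handled since the previous
 * check, i.e. the cpu looks stuck with irqs disabled.
 */
static bool nmi_check(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	struct cpu_state *s = &cpus[cpu];

	if (s->irq_count == s->irq_count_saved)
		return true;

	s->irq_count_saved = s->irq_count;
	return false;
}
```

If nmi_check() were itself driven by a maskable interrupt, it would stop
running in exactly the situation it is meant to catch, which is why the
check needs an NMI source.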

Cheers,
Don
