From: Andrew Morton on 26 Jan 2010 15:20 On Tue, 26 Jan 2010 14:09:45 GMT tip-bot for Thomas Gleixner <tglx(a)linutronix.de> wrote: > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -343,7 +343,19 @@ static void clocksource_resume_watchdog(void) > { > unsigned long flags; > > - spin_lock_irqsave(&watchdog_lock, flags); > + /* > + * We use trylock here to avoid a potential dead lock when > + * kgdb calls this code after the kernel has been stopped with > + * watchdog_lock held. When watchdog_lock is held we just > + * return and accept, that the watchdog might trigger and mark > + * the monitored clock source (usually TSC) unstable. > + * > + * This does not affect the other caller clocksource_resume() > + * because at this point the kernel is UP, interrupts are > + * disabled and nothing can hold watchdog_lock. > + */ > + if (!spin_trylock_irqsave(&watchdog_lock, flags)) > + return; > clocksource_reset_watchdog(); > spin_unlock_irqrestore(&watchdog_lock, flags); > } > > @@ -458,8 +470,8 @@ void clocksource_resume(void) > * clocksource_touch_watchdog - Update watchdog > * > * Update the watchdog after exception contexts such as kgdb so as not > - * to incorrectly trip the watchdog. > - * > + * to incorrectly trip the watchdog. This might fail when the kernel > + * was stopped in code which holds watchdog_lock. > */ > void clocksource_touch_watchdog(void) > { It would be neater and a shade more reliable to add a separate clocksource_try_touch_watchdog() for kgdb's use. Which could be inside #ifdef CONFIG_KGDB. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Jason Wessel on 26 Jan 2010 15:50 Andrew Morton wrote: > On Tue, 26 Jan 2010 14:09:45 GMT tip-bot for Thomas Gleixner <tglx(a)linutronix.de> wrote: > > >> --- a/kernel/time/clocksource.c >> +++ b/kernel/time/clocksource.c >> @@ -343,7 +343,19 @@ static void clocksource_resume_watchdog(void) >> { >> unsigned long flags; >> >> - spin_lock_irqsave(&watchdog_lock, flags); >> + /* >> + * We use trylock here to avoid a potential dead lock when >> + * kgdb calls this code after the kernel has been stopped with >> + * watchdog_lock held. When watchdog_lock is held we just >> + * return and accept, that the watchdog might trigger and mark >> + * the monitored clock source (usually TSC) unstable. >> + * >> + * This does not affect the other caller clocksource_resume() >> + * because at this point the kernel is UP, interrupts are >> + * disabled and nothing can hold watchdog_lock. >> + */ >> + if (!spin_trylock_irqsave(&watchdog_lock, flags)) >> + return; >> clocksource_reset_watchdog(); >> spin_unlock_irqrestore(&watchdog_lock, flags); >> } >> >> @@ -458,8 +470,8 @@ void clocksource_resume(void) >> * clocksource_touch_watchdog - Update watchdog >> * >> * Update the watchdog after exception contexts such as kgdb so as not >> - * to incorrectly trip the watchdog. >> - * >> + * to incorrectly trip the watchdog. This might fail when the kernel >> + * was stopped in code which holds watchdog_lock. >> */ >> void clocksource_touch_watchdog(void) >> { >> > > It would be neater and a shade more reliable to add a separate > clocksource_try_touch_watchdog() for kgdb's use. Which could be inside > #ifdef CONFIG_KGDB. > > The kernel debugger is the only user of this API function so realistically you do not need a new function, and could simply rename this one, if the name does not fit. -- more notes on the problem -- I further diagnosed the problem, and it is reasonably rare that it happens in the first place. I had an instrumented version of the kernel debugger which showed the problem happening quite frequently, while trying to fix hw breakpoints. The problem is not actually an interprocessor deadlock, but that of re-entering the spin_lock() when it is already held by the interrupted task on the same cpu. There is no obvious way that I could see to check for this specific condition and to abort. For now the patch Thomas created is sufficient, and given some time we'll decide how to best live with the side effects or to consider any further changes. Jason. Tested-by: Jason Wessel <jason.wessel(a)windriver.com> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
|
Pages: 1 Prev: tty: fix race in tty_fasync Next: How to write driver that needs to call another driver? |