From: Cyrill Gorcunov on 16 Apr 2010 11:00

On Fri, Apr 16, 2010 at 04:46:17PM +0200, Frederic Weisbecker wrote:
....
> > > > +    if (hardlockup_panic)
> > > > +        panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> > > > +    else
> > > > +        WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> > > > +
> > > > +    cpumask_set_cpu(this_cpu, to_cpumask(hardlockup_mask));
> > >
> > >
> > > Maybe have an arch spin lock there to update your cpu mask safely.
> > >
> >
> > Hmm, this is NMI handler path so from what we protect this per-cpu data?
> > Do I miss something? /me confused
>
>
> The cpu mask is not per cpu here, this is a shared bitmap, so you
> can race against other cpus NMIs.
>
> That said, as I suggested, having a per cpu var that we set when we
> warned would be much better than a spinlock here.
>

Yeah, saw DECLARE_BITMAP but read it as DEFINE_PER_CPU for some reason.
Having any spinlock in an irq handler is really suspicious.

    -- Cyrill
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Don Zickus on 16 Apr 2010 11:10

On Fri, Apr 16, 2010 at 04:43:04PM +0200, Frederic Weisbecker wrote:
> On Fri, Apr 16, 2010 at 10:12:13AM -0400, Don Zickus wrote:
> > On Fri, Apr 16, 2010 at 03:47:14AM +0200, Frederic Weisbecker wrote:
> > > >  config PERF_EVENTS_NMI
> > > >      bool
> > > > +    depends on PERF_EVENTS
> > > >      help
> > > >        Arch has support for nmi_watchdog
> > >
> > >
> > > That looks too general. It's more about the fact the arch supports
> > > cpu cycle events and generates NMIs on overflow.
> >
> > I was trying to figure out a way to add the PERF_EVENTS dependency as I
> > didn't want to impose it on the CONFIG_NMI_WATCHDOG if that config
> > supported softlockup (which doesn't need the PERF_EVENTS).
>
>
> Yeah and this is fine. I was talking about the help description.

Oh. heh. ok, will expand that.

> > > I'm confused, do we have two versions of the softlockup
> > > detector now? You should drop the older one.
> >
> > Originally Ingo talked about a migration path, so I was going to support
> > the older one in case the new one was having issues, sort of like what he
> > suggested about moving the nmi code from arch/x86/kernel/apic/nmi.c to
> > kernel/watchdog.c. But I can probably drop the softlockup case as the
> > migration isn't as tricky as the nmi case.
>
>
> Ok.
>
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    cpumask_clear_cpu(this_cpu, to_cpumask(hardlockup_mask));
> > >
> > >
> > > Hmm...this is probably not necessary.
> >
> > I was just thinking of the case where despite the WARN above, the cpu
> > actually recovered and then failed again separately. But I probably won't
> > spend any more time defending it. :-)
>
>
> This is really just a corner case, I guess you don't need to
> bother with that. It is actually racy against other cpus and adding
> a spinlock here (in the everything is fine path) would be overkill.
>
> In fact, having two per cpu vars named hardlockup_warned and
> softlockup_warned would be better than cpumasks. I'm sorry I
> suggested the cpumask to you, but such per cpu vars will save
> you from dealing with these synchronization issues. And one of the primary
> rules is usually to never take a lock from NMIs if we can :)

Yeah, I guess per cpu is better. I agree that locks in NMI are frowned
upon but I wasn't sure if it was dealt with.

I'll try to implement this. Any objections if I combined hardlockup and
softlockup with per cpu watchdog_warn and have bit masks for HARDLOCKUP
and SOFTLOCKUP? I hate to just waste per cpu space for this.

> > > You probably want a backtrace cpu mask here as well
> > > (but better not to use the same one as the hardlockup thing)
> >
> > yup.
>
>
> So actually, per_cpu softlockup_warned would be better :)
>
> > > Also you should half-drop the DETECT_SOFTLOCKUP thing:
> > > keep its definition but drop the ability to choose it from
> > > the prompt:
> > >
> > > config DETECT_SOFTLOCKUP
> > >     bool
> > >     depends on DEBUG_KERNEL && !S390
> > >     default y
> > >
> > > This way we keep it for compatibility with def_configs, it will
> > > enable the WATCHDOG by default if it is "y", we can schedule
> > > its removal later.
> >
> > I understand the general idea but not quite the implementation idea. I will work
> > on it and see what I come up with.
>
>
> We currently have:
>
> config DETECT_SOFTLOCKUP
>     bool "Blah"
>     depends on DEBUG_KERNEL && !S390
>     default y
>     help
>       .......
>
> The idea is to remove the "Blah" so that the user can't select it
> anymore from make menuconfig, and to remove the help too as it then
> becomes useless.
>
> So that config WATCHDOG can be default y if DETECT_SOFTLOCKUP.
> Then if someone comes with a config that has DETECT_SOFTLOCKUP,
> its new implementation (WATCHDOG) will be enabled by default.

Ah, I missed the bool part. I got it. Thanks for the clarification.

Cheers,
Don

From: Frederic Weisbecker on 16 Apr 2010 11:40

On Fri, Apr 16, 2010 at 11:04:07AM -0400, Don Zickus wrote:
> > This is really just a corner case, I guess you don't need to
> > bother with that. It is actually racy against other cpus and adding
> > a spinlock here (in the everything is fine path) would be overkill.
> >
> > In fact, having two per cpu vars named hardlockup_warned and
> > softlockup_warned would be better than cpumasks. I'm sorry I
> > suggested the cpumask to you, but such per cpu vars will save
> > you from dealing with these synchronization issues. And one of the primary
> > rules is usually to never take a lock from NMIs if we can :)
>
> Yeah, I guess per cpu is better. I agree that locks in NMI are frowned
> upon but I wasn't sure if it was dealt with.

They work in fact. They are just not checked by lockdep.
And mostly they are very dangerous: if something else can take the lock
(from interrupt, from context) then this is a deadlock. And even when
we ensure it is only taken from NMI, we tend to avoid that.

> I'll try to implement this. Any objections if I combined hardlockup and
> softlockup with per cpu watchdog_warn and have bit masks for HARDLOCKUP
> and SOFTLOCKUP? I hate to just waste per cpu space for this.

Hmm, a hardlockup can come in after a softlockup.

Don't worry too much about memory: usually the more cpus you have,
the more memory you have :)
Plus this is debugging code, not something supposed to be enabled
in production.
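
The per cpu "warned" flag suggested above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not code from the patch: the array hardlockup_warned[] stands in for a DEFINE_PER_CPU variable, and NR_CPUS and hardlockup_should_warn() are hypothetical names. The point it shows is that each cpu only ever touches its own slot, so the NMI path needs no lock and no shared bitmap.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical cpu count for this simulation */

/* Stand-in for DEFINE_PER_CPU(bool, hardlockup_warned): one slot per cpu. */
static bool hardlockup_warned[NR_CPUS];

/*
 * Simulated NMI-path check for `cpu`.
 * Returns true if a warning should be emitted now, false if the lockup
 * was already reported or there is no lockup at all.
 * Only the slot belonging to `cpu` is read or written, so no lock is taken.
 */
bool hardlockup_should_warn(unsigned int cpu, bool locked_up)
{
    assert(cpu < NR_CPUS);

    if (!locked_up) {
        /* The cpu recovered: re-arm so a later lockup is reported again. */
        hardlockup_warned[cpu] = false;
        return false;
    }

    if (hardlockup_warned[cpu])
        return false; /* already warned for this lockup */

    hardlockup_warned[cpu] = true;
    return true;
}
```

A softlockup_warned[] array with the same shape would cover the softlockup side, giving the two separate per cpu variables proposed in the thread.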

From: Don Zickus on 16 Apr 2010 12:20

On Fri, Apr 16, 2010 at 05:32:12PM +0200, Frederic Weisbecker wrote:
> > I'll try to implement this. Any objections if I combined hardlockup and
> > softlockup with per cpu watchdog_warn and have bit masks for HARDLOCKUP
> > and SOFTLOCKUP? I hate to just waste per cpu space for this.
>
>
> Hmm, a hardlockup can come in after a softlockup.

Let me re-explain what I meant. It was meant to do double duty. The
softlockup code only checks the SOFTLOCKUP bit and the hardlockup only
ever checks the HARDLOCKUP bit.

i.e. if (get_cpu_var(watchdog_warn) & HARDLOCKUP) { return; }

> Don't worry too much about memory: usually the more cpus you have,
> the more memory you have :)
> Plus this is debugging code, not something supposed to be enabled
> in production.

Well, that depends on your POV. In RHEL we enable both NMI_WATCHDOG and
SOFTLOCKUP on production systems (and we have customers that are
thankful for that :-) ).

Cheers,
Don

From: Frederic Weisbecker on 16 Apr 2010 12:30

On Fri, Apr 16, 2010 at 12:14:01PM -0400, Don Zickus wrote:
> On Fri, Apr 16, 2010 at 05:32:12PM +0200, Frederic Weisbecker wrote:
> > > I'll try to implement this. Any objections if I combined hardlockup and
> > > softlockup with per cpu watchdog_warn and have bit masks for HARDLOCKUP
> > > and SOFTLOCKUP? I hate to just waste per cpu space for this.
> >
> >
> > Hmm, a hardlockup can come in after a softlockup.
>
> Let me re-explain what I meant. It was meant to do double duty. The
> softlockup code only checks the SOFTLOCKUP bit and the hardlockup only
> ever checks the HARDLOCKUP bit.
>
> i.e. if (get_cpu_var(watchdog_warn) & HARDLOCKUP) { return; }

Ah right.

> > Don't worry too much about memory: usually the more cpus you have,
> > the more memory you have :)
> > Plus this is debugging code, not something supposed to be enabled
> > in production.
>
> Well, that depends on your POV. In RHEL we enable both NMI_WATCHDOG and
> SOFTLOCKUP on production systems (and we have customers that are
> thankful for that :-) ).

Ok :)
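
The "double duty" idea, one per cpu word with separate HARDLOCKUP and SOFTLOCKUP bits, can be sketched as follows. This is a userspace illustration only: watchdog_warn[] stands in for a per cpu variable, and the WATCHDOG_* constants and both helper functions are hypothetical names, not identifiers from the patch. It also ignores one thing a real kernel version would have to consider: the hardlockup bit would be set from NMI context, which can interrupt the softlockup path in the middle of a read-modify-write on the same word, so atomic bit operations (or the two separate variables suggested earlier) may be needed there.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical cpu count for this simulation */

/* Hypothetical bit assignments within the per cpu word. */
#define WATCHDOG_HARDLOCKUP (1u << 0)
#define WATCHDOG_SOFTLOCKUP (1u << 1)

/* Stand-in for DEFINE_PER_CPU(unsigned int, watchdog_warn). */
static unsigned int watchdog_warn[NR_CPUS];

/*
 * Returns true if `bit` was already set for `cpu` (warning already issued),
 * otherwise sets it and returns false. Each detector passes only its own
 * bit, so a softlockup warning never suppresses a later hardlockup warning.
 */
bool watchdog_test_and_set_warned(unsigned int cpu, unsigned int bit)
{
    assert(cpu < NR_CPUS);

    if (watchdog_warn[cpu] & bit) /* bitwise AND, not logical && */
        return true;

    watchdog_warn[cpu] |= bit;
    return false;
}

/* Clears `bit` for `cpu` once the corresponding lockup has gone away. */
void watchdog_clear_warned(unsigned int cpu, unsigned int bit)
{
    assert(cpu < NR_CPUS);
    watchdog_warn[cpu] &= ~bit;
}
```

With this layout the hardlockup path would call watchdog_test_and_set_warned(cpu, WATCHDOG_HARDLOCKUP) and return early when it yields true, while the softlockup path does the same with WATCHDOG_SOFTLOCKUP.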