From: Aristeu Sergio Rozanski Filho on 27 Mar 2010 23:10

Hi Don,

> +/* deprecated */
> +static int __init nosoftlockup_setup(char *str)
> +{
> +	no_watchdog = 1;
> +	return 1;
> +}
> +__setup("nosoftlockup", nosoftlockup_setup);
> +static int __init nonmi_watchdog_setup(char *str)
> +{
> +	no_watchdog = 1;
> +	return 1;
> +}
> +__setup("nonmi_watchdog", nonmi_watchdog_setup);

didn't you just add the nonmi_watchdog parameter? I don't think there's a
reason to keep compatibility here.

the rest of the patch looks fine to me

--
Aristeu
From: Don Zickus on 29 Mar 2010 14:30

On Sat, Mar 27, 2010 at 10:46:50PM -0400, Aristeu Sergio Rozanski Filho wrote:
> Hi Don,
> > +/* deprecated */
> > +static int __init nosoftlockup_setup(char *str)
> > +{
> > +	no_watchdog = 1;
> > +	return 1;
> > +}
> > +__setup("nosoftlockup", nosoftlockup_setup);
> > +static int __init nonmi_watchdog_setup(char *str)
> > +{
> > +	no_watchdog = 1;
> > +	return 1;
> > +}
> > +__setup("nonmi_watchdog", nonmi_watchdog_setup);
> didn't you just add the nonmi_watchdog parameter? I don't think there's a
> reason to keep compatibility here.

Hmm, I think you are right.  I thought I added that because it existed in
the old nmi_watchdog setup but I can't find it.  So yeah, I can drop that.

Thanks,
Don
From: Aristeu Sergio Rozanski Filho on 30 Mar 2010 11:00

> On Sat, Mar 27, 2010 at 10:46:50PM -0400, Aristeu Sergio Rozanski Filho wrote:
> > Hi Don,
> > > +/* deprecated */
> > > +static int __init nosoftlockup_setup(char *str)
> > > +{
> > > +	no_watchdog = 1;
> > > +	return 1;
> > > +}
> > > +__setup("nosoftlockup", nosoftlockup_setup);
> > > +static int __init nonmi_watchdog_setup(char *str)
> > > +{
> > > +	no_watchdog = 1;
> > > +	return 1;
> > > +}
> > > +__setup("nonmi_watchdog", nonmi_watchdog_setup);
> > didn't you just add the nonmi_watchdog parameter? I don't think there's a
> > reason to keep compatibility here.
>
> Hmm, I think you are right.  I thought I added that because it existed in
> the old nmi_watchdog setup but I can't find it.  So yeah, I can drop that.

you could provide nmi_watchdog=0 backwards compatibility and warn about
values != 0

--
Aristeu
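What Aristeu proposes would look roughly like the sketch below.  This is
illustrative rather than code from the patch: the handler name and warning
text are invented, and it assumes the no_watchdog and hardlockup_panic flags
from watchdog.c above.  It would extend the patch's existing
__setup("nmi_watchdog=", hardlockup_panic_setup) handler, which currently
only understands "panic":

static int __init nmi_watchdog_compat_setup(char *str)
{
	if (!strncmp(str, "panic", 5)) {
		hardlockup_panic = 1;
	} else if (!strcmp(str, "0")) {
		/* keep the legacy nmi_watchdog=0 disable switch working */
		no_watchdog = 1;
	} else {
		/* warn about any other value instead of silently ignoring it */
		printk(KERN_WARNING "nmi_watchdog=%s is deprecated\n", str);
	}
	return 1;
}
__setup("nmi_watchdog=", nmi_watchdog_compat_setup);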
From: Don Zickus on 5 Apr 2010 10:20

On Tue, Mar 23, 2010 at 05:33:38PM -0400, Don Zickus wrote:
> The new nmi_watchdog (which uses the perf event subsystem) is very
> similar in structure to the softlockup detector.  Using Ingo's suggestion,
> I combined the two functionalities into one file, kernel/watchdog.c.
>
> Now both the nmi_watchdog (or hardlockup detector) and softlockup detector
> sit on top of the perf event subsystem, which is run every 60 seconds or so
> to see if there are any lockups.

I raised some questions privately with Ingo; he asked that I re-iterate them
with Peter Z. and Frederic W. cc'd.

> Ok thanks.  When you get a chance I had a couple of questions I was hoping
> you could answer for me.
>
> - does the hrtimer stuff look ok?
>
> - any thoughts on how to achieve an arch-independent way of calculating a
>   sample period for perf events?  otherwise I am stuck with an arch hook.
>
> - I wanted to merge the hung task detector code into watchdog.c.  The main
>   logic of the code is to walk the task list, which I thought about doing
>   in the watchdog kthread.  I assume that is the right way to go, but I was
>   a little confused about how the scheduler worked.  I thought the watchdog
>   kthread would be scheduled very frequently (being a high priority task),
>   but it seems to only schedule when the code wakes it up.  Is that right?

Cheers,
Don

> ---
>  arch/x86/kernel/apic/hw_nmi.c |    2 +-
>  include/linux/nmi.h           |    2 +-
>  kernel/Makefile               |    2 +-
>  kernel/sysctl.c               |    2 +-
>  kernel/watchdog.c             |  526 +++++++++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug             |   24 ++-
>  6 files changed, 546 insertions(+), 12 deletions(-)
>  create mode 100644 kernel/watchdog.c
>
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index e8b78a0..79425f9 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -89,7 +89,7 @@ int hw_nmi_is_cpu_stuck(struct pt_regs *regs)
>
>  u64 hw_nmi_get_sample_period(void)
>  {
> -	return cpu_khz * 1000;
> +	return (u64)(cpu_khz) * 1000 * 60;
>  }
>
>  #ifdef ARCH_HAS_NMI_WATCHDOG
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 22cc796..a501de9 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -54,7 +54,7 @@ static inline bool trigger_all_cpu_backtrace(void)
>  #ifdef CONFIG_NMI_WATCHDOG
>  int hw_nmi_is_cpu_stuck(struct pt_regs *);
>  u64 hw_nmi_get_sample_period(void);
> -extern int nmi_watchdog_enabled;
> +extern int watchdog_enabled;
>  struct ctl_table;
>  extern int proc_nmi_enabled(struct ctl_table *, int ,
>  		void __user *, size_t *, loff_t *);
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 8a5abe5..c8e3e7c 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -76,7 +76,7 @@ obj-$(CONFIG_AUDIT_TREE) += audit_tree.o
>  obj-$(CONFIG_KPROBES) += kprobes.o
>  obj-$(CONFIG_KGDB) += kgdb.o
>  obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o
> -obj-$(CONFIG_NMI_WATCHDOG) += nmi_watchdog.o
> +obj-$(CONFIG_NMI_WATCHDOG) += watchdog.o
>  obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
>  obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
>  obj-$(CONFIG_SECCOMP) += seccomp.o
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index ac72c9e..6066e3d 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -699,7 +699,7 @@ static struct ctl_table kern_table[] = {
>  #if defined(CONFIG_NMI_WATCHDOG)
>  	{
>  		.procname	= "nmi_watchdog",
> -		.data		= &nmi_watchdog_enabled,
> +		.data		= &watchdog_enabled,
>  		.maxlen		= sizeof (int),
>  		.mode		= 0644,
>  		.proc_handler	= proc_nmi_enabled,
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> new file mode 100644
> index 0000000..7334565
> --- /dev/null
> +++ b/kernel/watchdog.c
> @@ -0,0 +1,526 @@
> +/*
> + * Detect Hard/Soft Lockups using the NMI
> + *
> + * started by Don Zickus, Copyright (C) 2010 Red Hat, Inc.
> + *
> + * this code detects hard lockups: incidents where on a CPU
> + * the kernel does not respond to anything except NMI.
> + *
> + * Note: Most of this code is borrowed heavily from softlockup.c,
> + * so thanks to Ingo for the initial implementation.
> + * Some chunks also taken from arch/x86/kernel/apic/nmi.c, thanks
> + * to those contributors as well.
> + */
> +
> +#include <linux/mm.h>
> +#include <linux/cpu.h>
> +#include <linux/nmi.h>
> +#include <linux/init.h>
> +#include <linux/delay.h>
> +#include <linux/freezer.h>
> +#include <linux/kthread.h>
> +#include <linux/lockdep.h>
> +#include <linux/notifier.h>
> +#include <linux/module.h>
> +#include <linux/sysctl.h>
> +
> +#include <asm/irq_regs.h>
> +#include <linux/perf_event.h>
> +
> +int watchdog_enabled;
> +int __read_mostly softlockup_thresh = 60;
> +
> +static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
> +static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
> +static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
> +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> +static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
> +
> +static int __read_mostly did_panic;
> +static int __initdata no_watchdog;
> +
> +
> +/* boot commands */
> +/*
> + * Should we panic when a soft-lockup or hard-lockup occurs:
> + */
> +static int hardlockup_panic;
> +
> +unsigned int __read_mostly softlockup_panic =
> +			CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
> +
> +static int __init hardlockup_panic_setup(char *str)
> +{
> +	if (!strncmp(str, "panic", 5))
> +		hardlockup_panic = 1;
> +	return 1;
> +}
> +__setup("nmi_watchdog=", hardlockup_panic_setup);
> +
> +static int __init softlockup_panic_setup(char *str)
> +{
> +	softlockup_panic = simple_strtoul(str, NULL, 0);
> +
> +	return 1;
> +}
> +__setup("softlockup_panic=", softlockup_panic_setup);
> +
> +static int __init no_watchdog_setup(char *str)
> +{
> +	no_watchdog = 1;
> +	return 1;
> +}
> +__setup("no_watchdog", no_watchdog_setup);
> +
> +/* deprecated */
> +static int __init nosoftlockup_setup(char *str)
> +{
> +	no_watchdog = 1;
> +	return 1;
> +}
> +__setup("nosoftlockup", nosoftlockup_setup);
> +static int __init nonmi_watchdog_setup(char *str)
> +{
> +	no_watchdog = 1;
> +	return 1;
> +}
> +__setup("nonmi_watchdog", nonmi_watchdog_setup);
> +/*  */
> +
> +
> +/*
> + * Returns seconds, approximately.  We don't need nanosecond
> + * resolution, and we don't need to waste time with a big divide when
> + * 2^30ns == 1.074s.
> + */
> +static unsigned long get_timestamp(int this_cpu)
> +{
> +	return cpu_clock(this_cpu) >> 30LL;  /* 2^30 ~= 10^9 */
> +}
> +
> +static unsigned long get_sample_period(void)
> +{
> +	/*
> +	 * convert softlockup_thresh from seconds to ns
> +	 * the divide by 5 is to give hrtimer 5 chances to
> +	 * increment before the hardlockup detector generates
> +	 * a warning
> +	 */
> +	return softlockup_thresh / 5 * NSEC_PER_SEC;
> +}
> +
> +/* Commands for resetting the watchdog */
> +static void __touch_watchdog(void)
> +{
> +	int this_cpu = raw_smp_processor_id();
> +
> +	__raw_get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +}
> +
> +void touch_watchdog(void)
> +{
> +	__raw_get_cpu_var(watchdog_touch_ts) = 0;
> +}
> +EXPORT_SYMBOL(touch_watchdog);
> +
> +void touch_all_watchdog(void)
> +{
> +	int cpu;
> +
> +	for_each_online_cpu(cpu)
> +		per_cpu(watchdog_touch_ts, cpu) = 0;
> +}
> +
> +void touch_nmi_watchdog(void)
> +{
> +	touch_watchdog();
> +}
> +EXPORT_SYMBOL(touch_nmi_watchdog);
> +
> +void touch_all_nmi_watchdog(void)
> +{
> +	touch_all_watchdog();
> +}
> +/* end of deprecated functions */
> +
> +/* watchdog detector functions */
> +static int is_hardlockup(int cpu)
> +{
> +	unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
> +
> +	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> +		return 1;
> +
> +	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> +	return 0;
> +}
> +
> +static int is_softlockup(unsigned long touch_ts, int cpu)
> +{
> +	unsigned long now = get_timestamp(cpu);
> +
> +	/* Warn about unreasonable delays: */
> +	if (now > (touch_ts + softlockup_thresh))
> +		return now - touch_ts;
> +
> +	return 0;
> +}
> +
> +static int
> +watchdog_panic(struct notifier_block *this, unsigned long event, void *ptr)
> +{
> +	did_panic = 1;
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block panic_block = {
> +	.notifier_call = watchdog_panic,
> +};
> +
> +struct perf_event_attr wd_hw_attr = {
> +	.type		= PERF_TYPE_HARDWARE,
> +	.config		= PERF_COUNT_HW_CPU_CYCLES,
> +	.size		= sizeof(struct perf_event_attr),
> +	.pinned		= 1,
> +	.disabled	= 1,
> +};
> +
> +struct perf_event_attr wd_sw_attr = {
> +	.type		= PERF_TYPE_SOFTWARE,
> +	.config		= PERF_COUNT_SW_CPU_CLOCK,
> +	.size		= sizeof(struct perf_event_attr),
> +	.pinned		= 1,
> +	.disabled	= 1,
> +};
> +
> +/* Callback function for perf event subsystem */
> +void watchdog_overflow_callback(struct perf_event *event, int nmi,
> +		struct perf_sample_data *data,
> +		struct pt_regs *regs)
> +{
> +	int this_cpu = smp_processor_id();
> +	unsigned long touch_ts = per_cpu(watchdog_touch_ts, this_cpu);
> +	int duration;
> +
> +	if (touch_ts == 0) {
> +		__touch_watchdog();
> +		return;
> +	}
> +
> +	/* check for a hardlockup
> +	 * This is done by making sure our timer interrupt
> +	 * is incrementing.  The timer interrupt should have
> +	 * fired multiple times before we overflow'd.  If it hasn't
> +	 * then this is a good indication the cpu is stuck
> +	 */
> +	if (is_hardlockup(this_cpu)) {
> +		if (hardlockup_panic)
> +			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> +		else
> +			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> +	}
> +
> +	/* check for a softlockup
> +	 * This is done by making sure a high priority task is
> +	 * being scheduled.  The task touches the watchdog to
> +	 * indicate it is getting cpu time.  If it hasn't then
> +	 * this is a good indication some task is hogging the cpu
> +	 */
> +	duration = is_softlockup(touch_ts, this_cpu);
> +	if (duration) {
> +		printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> +			this_cpu, duration,
> +			current->comm, task_pid_nr(current));
> +		print_modules();
> +		print_irqtrace_events(current);
> +		if (regs)
> +			show_regs(regs);
> +		else
> +			dump_stack();
> +
> +		if (softlockup_panic)
> +			panic("softlockup: hung tasks");
> +	}
> +
> +	return;
> +}
> +
> +/* watchdog kicker functions */
> +static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> +{
> +	/* kick the hardlockup detector */
> +	__get_cpu_var(hrtimer_interrupts)++;
> +
> +	/* kick the softlockup detector */
> +	wake_up_process(__get_cpu_var(softlockup_watchdog));
> +
> +	/* .. and repeat */
> +	hrtimer_forward_now(hrtimer, ns_to_ktime(get_sample_period()));
> +
> +	return HRTIMER_RESTART;
> +}
> +
> +
> +/*
> + * The watchdog thread - touches the timestamp.
> + */
> +static int watchdog(void *__bind_cpu)
> +{
> +	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> +	struct hrtimer *hrtimer = &per_cpu(watchdog_hrtimer, (unsigned long)__bind_cpu);
> +
> +	sched_setscheduler(current, SCHED_FIFO, &param);
> +
> +	/* initialize timestamp */
> +	__touch_watchdog();
> +
> +	/* kick off the timer for the hardlockup detector */
> +	/* done here because hrtimer_start can only pin to smp_processor_id() */
> +	hrtimer_start(hrtimer, ns_to_ktime(get_sample_period()),
> +		      HRTIMER_MODE_REL_PINNED);
> +
> +	set_current_state(TASK_INTERRUPTIBLE);
> +	/*
> +	 * Run briefly once per sample period to reset the softlockup
> +	 * timestamp.  If this gets delayed for more than 60 seconds then
> +	 * the debug-printout triggers in the overflow callback.
> +	 */
> +	while (!kthread_should_stop()) {
> +		__touch_watchdog();
> +		schedule();
> +
> +		if (kthread_should_stop())
> +			break;
> +
> +		set_current_state(TASK_INTERRUPTIBLE);
> +	}
> +	__set_current_state(TASK_RUNNING);
> +
> +	return 0;
> +}
> +
> +
> +/* prepare/enable/disable routines */
> +static int watchdog_prepare_cpu(int cpu)
> +{
> +	struct hrtimer *hrtimer = &per_cpu(watchdog_hrtimer, cpu);
> +	struct task_struct *p;
> +
> +	BUG_ON(per_cpu(softlockup_watchdog, cpu));
> +	p = kthread_create(watchdog, (void *)(unsigned long)cpu, "watchdog/%d", cpu);
> +	if (IS_ERR(p)) {
> +		printk(KERN_ERR "softlockup watchdog for %i failed\n", cpu);
> +		return -1;
> +	}
> +	per_cpu(watchdog_touch_ts, cpu) = 0;
> +	per_cpu(softlockup_watchdog, cpu) = p;
> +	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	hrtimer->function = watchdog_timer_fn;
> +
> +	return 0;
> +}
> +
> +static int watchdog_enable(int cpu)
> +{
> +	struct perf_event_attr *wd_attr;
> +	struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +	struct task_struct *p = per_cpu(softlockup_watchdog, cpu);
> +
> +	/* is it already setup and enabled? */
> +	if (event && event->state > PERF_EVENT_STATE_OFF)
> +		goto out;
> +
> +	/* it is setup but not enabled */
> +	if (event != NULL)
> +		goto out_enable;
> +
> +	/* Try to register using hardware perf events first */
> +	wd_attr = &wd_hw_attr;
> +	wd_attr->sample_period = hw_nmi_get_sample_period();
> +	event = perf_event_create_kernel_counter(wd_attr, cpu, -1, watchdog_overflow_callback);
> +	if (!IS_ERR(event)) {
> +		printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
> +		goto out_save;
> +	}
> +
> +	/* hardware doesn't exist or not supported, fallback to software events */
> +	printk(KERN_INFO "NMI watchdog: hardware not available, trying software events\n");
> +	wd_attr = &wd_sw_attr;
> +	wd_attr->sample_period = softlockup_thresh * NSEC_PER_SEC;
> +	event = perf_event_create_kernel_counter(wd_attr, cpu, -1, watchdog_overflow_callback);
> +	if (!IS_ERR(event)) {
> +		printk(KERN_INFO "NMI watchdog enabled, takes one software counter.\n");
> +		goto out_save;
> +	}
> +
> +	printk(KERN_ERR "NMI watchdog failed to create perf event on cpu%i: %p\n", cpu, event);
> +	return -1;
> +
> +	/* success path */
> +out_save:
> +	per_cpu(watchdog_ev, cpu) = event;
> +out_enable:
> +	perf_event_enable(per_cpu(watchdog_ev, cpu));
> +out:
> +	/* kick the softlockup thread */
> +	if (p) {
> +		kthread_bind(p, cpu);
> +		wake_up_process(p);
> +	}
> +
> +	/* if any cpu succeeds, watchdog is considered enabled for the system */
> +	watchdog_enabled = 1;
> +
> +	return 0;
> +}
> +
> +static void watchdog_disable(int cpu)
> +{
> +	struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +	struct task_struct *p = per_cpu(softlockup_watchdog, cpu);
> +	struct hrtimer *hrtimer = &per_cpu(watchdog_hrtimer, cpu);
> +
> +	/*
> +	 * cancel the timer first to stop incrementing the stats
> +	 * and waking up the kthread
> +	 */
> +	hrtimer_cancel(hrtimer);
> +
> +	if (event) {
> +		perf_event_disable(event);
> +		per_cpu(watchdog_ev, cpu) = NULL;
> +
> +		/* should be in cleanup, but blocks oprofile */
> +		perf_event_release_kernel(event);
> +	}
> +
> +	if (p) {
> +		kthread_bind(p, cpumask_any(cpu_online_mask));
> +		kthread_stop(p);
> +	}
> +}
> +
> +static void watchdog_cleanup(int cpu)
> +{
> +	per_cpu(softlockup_watchdog, cpu) = NULL;
> +}
> +
> +static void watchdog_enable_all_cpus(void)
> +{
> +	int cpu;
> +	int result = 0;
> +
> +	if (watchdog_enabled)
> +		return;
> +
> +	for_each_online_cpu(cpu)
> +		result += watchdog_enable(cpu);
> +
> +	if (result)
> +		printk(KERN_ERR "watchdog: failed to be enabled on some cpus\n");
> +
> +}
> +
> +static void watchdog_disable_all_cpus(void)
> +{
> +	int cpu;
> +
> +	for_each_online_cpu(cpu)
> +		watchdog_disable(cpu);
> +
> +	/* if all watchdogs are disabled, then they are disabled for the system */
> +	watchdog_enabled = 0;
> +}
> +
> +
> +/* sysctl functions */
> +#ifdef CONFIG_SYSCTL
> +/*
> + * proc handler for /proc/sys/kernel/nmi_watchdog
> + */
> +
> +int proc_nmi_enabled(struct ctl_table *table, int write,
> +		     void __user *buffer, size_t *length, loff_t *ppos)
> +{
> +	touch_all_watchdog();
> +	proc_dointvec(table, write, buffer, length, ppos);
> +	if (watchdog_enabled)
> +		watchdog_enable_all_cpus();
> +	else
> +		watchdog_disable_all_cpus();
> +	return 0;
> +}
> +
> +int proc_dowatchdog_thresh(struct ctl_table *table, int write,
> +			   void __user *buffer,
> +			   size_t *lenp, loff_t *ppos)
> +{
> +	touch_all_watchdog();
> +	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +}
> +
> +#endif /* CONFIG_SYSCTL */
> +
> +
> +/*
> + * Create/destroy watchdog threads as CPUs come and go:
> + */
> +static int __cpuinit
> +cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> +{
> +	int hotcpu = (unsigned long)hcpu;
> +
> +	switch (action) {
> +	case CPU_UP_PREPARE:
> +	case CPU_UP_PREPARE_FROZEN:
> +		if (watchdog_prepare_cpu(hotcpu))
> +			return NOTIFY_BAD;
> +		break;
> +	case CPU_ONLINE:
> +	case CPU_ONLINE_FROZEN:
> +		if (watchdog_enable(hotcpu))
> +			return NOTIFY_BAD;
> +		break;
> +#ifdef CONFIG_HOTPLUG_CPU
> +	case CPU_UP_CANCELED:
> +	case CPU_UP_CANCELED_FROZEN:
> +		watchdog_disable(hotcpu);
> +		break;
> +	case CPU_DEAD:
> +	case CPU_DEAD_FROZEN:
> +		watchdog_disable(hotcpu);
> +		watchdog_cleanup(hotcpu);
> +		break;
> +#endif /* CONFIG_HOTPLUG_CPU */
> +	}
> +	return NOTIFY_OK;
> +}
> +
> +static struct notifier_block __cpuinitdata cpu_nfb = {
> +	.notifier_call = cpu_callback
> +};
> +
> +static int __init spawn_watchdog_task(void)
> +{
> +	void *cpu = (void *)(long)smp_processor_id();
> +	int err;
> +
> +	if (no_watchdog)
> +		return 0;
> +
> +	err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
> +	if (err == NOTIFY_BAD) {
> +		BUG();
> +		return 1;
> +	}
> +	cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
> +	register_cpu_notifier(&cpu_nfb);
> +
> +	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> +
> +	return 0;
> +}
> +early_initcall(spawn_watchdog_task);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index e2e73cc..518ec79 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -171,20 +171,28 @@ config DETECT_SOFTLOCKUP
>  	  support it.)
>
>  config NMI_WATCHDOG
> -	bool "Detect Hard Lockups with an NMI Watchdog"
> -	depends on DEBUG_KERNEL && PERF_EVENTS && PERF_EVENTS_NMI
> +	bool "Detect Hard and Soft Lockups"
> +	depends on DEBUG_KERNEL && PERF_EVENTS && PERF_EVENTS_NMI && !DETECT_SOFTLOCKUP
>  	help
>  	  Say Y here to enable the kernel to use the NMI as a watchdog
> -	  to detect hard lockups.  This is useful when a cpu hangs for no
> -	  reason but can still respond to NMIs.  A backtrace is displayed
> -	  for reviewing and reporting.
> +	  to detect hard and soft lockups.
>
> -	  The overhead should be minimal, just an extra NMI every few
> +	  Softlockups are bugs that cause the kernel to loop in kernel
> +	  mode for more than 60 seconds, without giving other tasks a
> +	  chance to run.  The current stack trace is displayed upon
> +	  detection and the system will stay locked up.
> +
> +	  Hardlockups are bugs that cause the cpu to loop in kernel mode
> +	  for more than 60 seconds, without giving interrupts a chance
> +	  to run.  The current stack trace is displayed upon detection
> +	  and the system will stay locked up.
> +
> +	  The overhead should be minimal, just an extra NMI every few
>  	  seconds.
>
>  config BOOTPARAM_SOFTLOCKUP_PANIC
>  	bool "Panic (Reboot) On Soft Lockups"
> -	depends on DETECT_SOFTLOCKUP
> +	depends on DETECT_SOFTLOCKUP || NMI_WATCHDOG
>  	help
>  	  Say Y here to enable the kernel to panic on "soft lockups",
>  	  which are bugs that cause the kernel to loop in kernel
> @@ -201,7 +209,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>
>  config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
>  	int
> -	depends on DETECT_SOFTLOCKUP
> +	depends on DETECT_SOFTLOCKUP || NMI_WATCHDOG
>  	range 0 1
>  	default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
>  	default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
> --
> 1.6.5.2
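To make the timing in the patch concrete: with the default softlockup_thresh
of 60, get_sample_period() returns 60 / 5 * NSEC_PER_SEC = 12 s, so the
per-cpu hrtimer fires (and wakes the watchdog kthread) every 12 seconds,
while the hardware event is programmed for cpu_khz * 1000 * 60 cycles,
roughly one NMI per minute of busy CPU time.  A healthy CPU therefore
accumulates about five hrtimer_interrupts increments between consecutive NMI
samples; is_hardlockup() trips only when that counter has not moved at all
across a full sample period, and is_softlockup() trips only when the kthread
has failed to refresh watchdog_touch_ts for more than 60 seconds.  The
timestamps themselves are kept in units of about one second, since
cpu_clock() >> 30 divides nanoseconds by 2^30 ~= 1.074 * 10^9.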
From: Don Zickus on 5 Apr 2010 16:20

On Tue, Apr 06, 2010 at 12:11:11AM +0400, Cyrill Gorcunov wrote:
> On Tue, Mar 30, 2010 at 10:52:38AM -0400, Aristeu Sergio Rozanski Filho wrote:
> > > On Sat, Mar 27, 2010 at 10:46:50PM -0400, Aristeu Sergio Rozanski Filho wrote:
> > > > Hi Don,
> > > > > +/* deprecated */
> > > > > +static int __init nosoftlockup_setup(char *str)
> > > > > +{
> > > > > +	no_watchdog = 1;
> > > > > +	return 1;
> > > > > +}
> > > > > +__setup("nosoftlockup", nosoftlockup_setup);
> > > > > +static int __init nonmi_watchdog_setup(char *str)
> > > > > +{
> > > > > +	no_watchdog = 1;
> > > > > +	return 1;
> > > > > +}
> > > > > +__setup("nonmi_watchdog", nonmi_watchdog_setup);
> > > > didn't you just add the nonmi_watchdog parameter? I don't think there's
> > > > a reason to keep compatibility here.
> > >
> > > Hmm, I think you are right.  I thought I added that because it existed in
> > > the old nmi_watchdog setup but I can't find it.  So yeah, I can drop that.
> > you could provide nmi_watchdog=0 backwards compatibility and warn about
> > values != 0
> >
> > --
> > Aristeu
> >
>
> Sorry for the long delay.  I think we might need to inform the user that
> "lapic" and "ioapic" are no longer used (perf-nmi is supposed to substitute
> for the former nmi code in the long term, right?), so that for some time
> period, say a whole release cycle, if "lapic", "ioapic", or numbers are
> passed to the nmi_watchdog= setup option we just print out that the
> parameters are deprecated and better not used any longer.  Hm?

Agreed.

Cheers,
Don
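Folding Cyrill's point into the hypothetical compat handler sketched earlier
in the thread, the legacy "lapic" and "ioapic" modes (and numeric values
other than 0) would get an explicit deprecation notice for a release cycle
instead of silently doing nothing.  The message text below is illustrative;
the thread does not settle on exact wording:

	} else if (!strncmp(str, "lapic", 5) || !strncmp(str, "ioapic", 6)) {
		/* old hardware-mode selectors: meaningless with perf-nmi */
		printk(KERN_WARNING "nmi_watchdog=%s is deprecated and has "
		       "no effect; the perf-based watchdog is used instead\n",
		       str);
	}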