From: Don Zickus on 28 Apr 2010 16:30

On Wed, Apr 28, 2010 at 02:36:54PM +0200, Frederic Weisbecker wrote:
> On Fri, Apr 23, 2010 at 12:13:29PM -0400, Don Zickus wrote:
> > +void watchdog_overflow_callback(struct perf_event *event, int nmi,
> > +                struct perf_sample_data *data,
> > +                struct pt_regs *regs)
> > +{
> > +        int this_cpu = smp_processor_id();
> > +        unsigned long touch_ts = per_cpu(watchdog_touch_ts, this_cpu);
> > +        char warn = __get_cpu_var(watchdog_warn);
> > +
> > +        if (touch_ts == 0) {
> > +                __touch_watchdog();
> > +                return;
> > +        }
> > +
> > +        /* check for a hardlockup
> > +         * This is done by making sure our timer interrupt
> > +         * is incrementing.  The timer interrupt should have
> > +         * fired multiple times before we overflow'd.  If it hasn't
> > +         * then this is a good indication the cpu is stuck
> > +         */
> > +        if (is_hardlockup(this_cpu)) {
> > +                /* only print hardlockups once */
> > +                if (warn & HARDLOCKUP)
> > +                        return;
> > +
> > +                if (hardlockup_panic)
> > +                        panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> > +                else
> > +                        WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> > +
> > +                __get_cpu_var(watchdog_warn) = warn | HARDLOCKUP;
> > +                return;
> > +        }
> > +
> > +        __get_cpu_var(watchdog_warn) = warn & ~HARDLOCKUP;
> > +        return;
> > +}
> [...]
> > +static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > +{
> > +        int this_cpu = smp_processor_id();
> > +        unsigned long touch_ts = __get_cpu_var(watchdog_touch_ts);
> > +        char warn = __get_cpu_var(watchdog_warn);
> > +        struct pt_regs *regs = get_irq_regs();
> > +        int duration;
> > +
> > +        /* kick the hardlockup detector */
> > +        watchdog_interrupt_count();
> > +
> > +        /* kick the softlockup detector */
> > +        wake_up_process(__get_cpu_var(softlockup_watchdog));
> > +
> > +        /* .. and repeat */
> > +        hrtimer_forward_now(hrtimer, ns_to_ktime(get_sample_period()));
> > +
> > +        if (touch_ts == 0) {
> > +                __touch_watchdog();
> > +                return HRTIMER_RESTART;
> > +        }
> > +
> > +        /* check for a softlockup
> > +         * This is done by making sure a high priority task is
> > +         * being scheduled.  The task touches the watchdog to
> > +         * indicate it is getting cpu time.  If it hasn't then
> > +         * this is a good indication some task is hogging the cpu
> > +         */
> > +        duration = is_softlockup(touch_ts, this_cpu);
> > +        if (unlikely(duration)) {
> > +                /* only warn once */
> > +                if (warn & SOFTLOCKUP)
> > +                        return HRTIMER_RESTART;
> > +
> > +                printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> > +                        this_cpu, duration,
> > +                        current->comm, task_pid_nr(current));
> > +                print_modules();
> > +                print_irqtrace_events(current);
> > +                if (regs)
> > +                        show_regs(regs);
> > +                else
> > +                        dump_stack();
> > +
> > +                if (softlockup_panic)
> > +                        panic("softlockup: hung tasks");
> > +                __get_cpu_var(watchdog_warn) = warn | SOFTLOCKUP;
> > +        } else
> > +                __get_cpu_var(watchdog_warn) = warn & ~SOFTLOCKUP;
>
> Note these watchdog_warn modifications are racy against the same thing
> that happens with HARDLOCKUP. You might clear what the nmi did.
>
> The race is harmless enough that we don't care much I think, but that's
> why it would have made sense to separate the watchdog_warn tracking space
> between both.

Heh. Good point. I'll respin.

Cheers,
Don
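The race Frederic points out is a lost update on a shared read-modify-write:
the hrtimer callback loads `warn`, the NMI callback sets HARDLOCKUP, and the
hrtimer's later store of `warn & ~SOFTLOCKUP` (computed from the stale load)
wipes the bit the NMI set. A minimal sketch of the two layouts, using the
names the V5 respin below settles on (DEFINE_PER_CPU is the kernel's; the
comments are illustrative):

        /* racy layout: one flag word shared by the NMI and hrtimer paths,
         * each doing a non-atomic load/modify/store on it
         */
        static DEFINE_PER_CPU(char, watchdog_warn);

        /* race-free layout: separate tracking space per detector, so the
         * NMI path and the hrtimer path each write only their own variable
         */
        static DEFINE_PER_CPU(bool, hard_watchdog_warn);  /* NMI path only */
        static DEFINE_PER_CPU(bool, soft_watchdog_warn);  /* hrtimer path only */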
From: Frederic Weisbecker on 12 May 2010 16:00
On Fri, May 07, 2010 at 05:11:44PM -0400, Don Zickus wrote:
> The new nmi_watchdog (which uses the perf event subsystem) is very
> similar in structure to the softlockup detector.  Using Ingo's suggestion,
> I combined the two functionalities into one file, kernel/watchdog.c.
>
> Now both the nmi_watchdog (or hardlockup detector) and softlockup detector
> sit on top of the perf event subsystem, which is run every 60 seconds or so
> to see if there are any lockups.
>
> To detect hardlockups, cpus not responding to interrupts, I implemented an
> hrtimer that runs 5 times for every perf event overflow event.  If that stops
> counting on a cpu, then the cpu is most likely in trouble.
>
> To detect softlockups, tasks not yielding to the scheduler, I used the
> previous kthread idea that now gets kicked every time the hrtimer fires.
> If the kthread isn't being scheduled, neither is anyone else, and the
> warning is printed to the console.
>
> I tested this on x86_64 and both the softlockup and hardlockup paths work.
>
> V2:
> - cleaned up the Kconfig and softlockup combination
> - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
> - separated out the softlockup case from the perf event subsystem
> - re-arranged the enabling/disabling of the nmi watchdog from proc space
> - added cpumasks for hardlockup failure cases
> - removed fallback to soft events if no PMU exists for hard events
>
> V3:
> - comment cleanups
> - drop support for older softlockup code
> - per_cpu cleanups
> - completely remove the software clock based hardlockup detector
> - use per_cpu masking on hard/soft lockup detection
> - #ifdef cleanups
> - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
> - documentation additions
>
> V4:
> - documentation fixes
> - convert per_cpu to __get_cpu_var
> - powerpc compile fixes
>
> V5:
> - split apart warn flags for hard and soft lockups
>
> TODO:
> - figure out how to make an arch-agnostic clock2cycles call (if possible)
>   to feed into perf events as a sample period
>
> Signed-off-by: Don Zickus <dzickus(a)redhat.com>
> ---
>  Documentation/kernel-parameters.txt |    2 +
>  arch/x86/include/asm/nmi.h          |    2 +-
>  arch/x86/kernel/apic/Makefile       |    4 +-
>  arch/x86/kernel/apic/hw_nmi.c       |    2 +-
>  arch/x86/kernel/traps.c             |    4 +-
>  include/linux/nmi.h                 |    8 +-
>  include/linux/sched.h               |    6 +
>  init/Kconfig                        |    5 +-
>  kernel/Makefile                     |    3 +-
>  kernel/sysctl.c                     |   21 +-
>  kernel/watchdog.c                   |  577 +++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug                   |   30 ++-
>  12 files changed, 635 insertions(+), 29 deletions(-)
>  create mode 100644 kernel/watchdog.c
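Before the patch body, a quick sanity check of the timing the changelog
describes, as a standalone userspace program (illustration only, not kernel
code; the 2 GHz cpu_khz value is made up for the example):

        #include <stdio.h>
        #include <stdint.h>

        #define NSEC_PER_SEC 1000000000ULL

        int main(void)
        {
                uint64_t softlockup_thresh = 60;   /* patch default, seconds */
                uint64_t cpu_khz = 2000000;        /* hypothetical 2 GHz cpu */

                /* hw_nmi_get_sample_period(): 60 seconds worth of cycles,
                 * so the perf NMI fires roughly once a minute */
                uint64_t nmi_period_cycles = cpu_khz * 1000 * 60;

                /* get_sample_period(): thresh / 5 seconds in ns, so the
                 * hrtimer fires 5 times inside each NMI window */
                uint64_t hrtimer_period_ns = softlockup_thresh / 5 * NSEC_PER_SEC;

                printf("NMI period:     %llu cycles (~60 s at %llu kHz)\n",
                       (unsigned long long)nmi_period_cycles,
                       (unsigned long long)cpu_khz);
                printf("hrtimer period: %llu ns (= %llu s, 5 ticks per NMI window)\n",
                       (unsigned long long)hrtimer_period_ns,
                       (unsigned long long)(hrtimer_period_ns / NSEC_PER_SEC));
                return 0;
        }

So each perf NMI window is about a minute of cycles at the nominal clock,
and the hrtimer ticks 5 times inside it, which is the progress that
is_hardlockup() below relies on.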
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 736d456..705f16f 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1764,6 +1764,8 @@ and is between 256 and 4096 characters. It is defined in the file
>
>          nousb           [USB] Disable the USB subsystem
>
> +        nowatchdog      [KNL] Disable the lockup detector.
> +
>          nowb            [ARM]
>
>          nox2apic        [X86-64,APIC] Do not enable x2APIC mode.
> diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
> index 5b41b0f..932f0f8 100644
> --- a/arch/x86/include/asm/nmi.h
> +++ b/arch/x86/include/asm/nmi.h
> @@ -17,7 +17,7 @@ int do_nmi_callback(struct pt_regs *regs, int cpu);
>
>  extern void die_nmi(char *str, struct pt_regs *regs, int do_panic);
>  extern int check_nmi_watchdog(void);
> -#if !defined(CONFIG_NMI_WATCHDOG)
> +#if !defined(CONFIG_LOCKUP_DETECTOR)
>  extern int nmi_watchdog_enabled;
>  #endif
>  extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
> diff --git a/arch/x86/kernel/apic/Makefile b/arch/x86/kernel/apic/Makefile
> index 1a4512e..52f32e0 100644
> --- a/arch/x86/kernel/apic/Makefile
> +++ b/arch/x86/kernel/apic/Makefile
> @@ -3,10 +3,10 @@
>  #
>
>  obj-$(CONFIG_X86_LOCAL_APIC)    += apic.o apic_noop.o probe_$(BITS).o ipi.o
> -ifneq ($(CONFIG_NMI_WATCHDOG),y)
> +ifneq ($(CONFIG_LOCKUP_DETECTOR),y)
>  obj-$(CONFIG_X86_LOCAL_APIC)    += nmi.o
>  endif
> -obj-$(CONFIG_NMI_WATCHDOG)      += hw_nmi.o
> +obj-$(CONFIG_LOCKUP_DETECTOR)   += hw_nmi.o
>
>  obj-$(CONFIG_X86_IO_APIC)       += io_apic.o
>  obj-$(CONFIG_SMP)               += ipi.o
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index e8b78a0..79425f9 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -89,7 +89,7 @@ int hw_nmi_is_cpu_stuck(struct pt_regs *regs)
>
>  u64 hw_nmi_get_sample_period(void)
>  {
> -        return cpu_khz * 1000;
> +        return (u64)(cpu_khz) * 1000 * 60;
>  }
>
>  #ifdef ARCH_HAS_NMI_WATCHDOG
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index bdc7fab..bd347c2 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -406,7 +406,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
>                                                          == NOTIFY_STOP)
>                  return;
>
> -#ifndef CONFIG_NMI_WATCHDOG
> +#ifndef CONFIG_LOCKUP_DETECTOR
>          /*
>           * Ok, so this is none of the documented NMI sources,
>           * so it must be the NMI watchdog.
> @@ -414,7 +414,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
>          if (nmi_watchdog_tick(regs, reason))
>                  return;
>          if (!do_nmi_callback(regs, cpu))
> -#endif /* !CONFIG_NMI_WATCHDOG */
> +#endif /* !CONFIG_LOCKUP_DETECTOR */
>                  unknown_nmi_error(reason, regs);
>  #else
>          unknown_nmi_error(reason, regs);
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 22cc796..abd48aa 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -20,7 +20,7 @@ extern void touch_nmi_watchdog(void);
>  extern void acpi_nmi_disable(void);
>  extern void acpi_nmi_enable(void);
>  #else
> -#ifndef CONFIG_NMI_WATCHDOG
> +#ifndef CONFIG_LOCKUP_DETECTOR
>  static inline void touch_nmi_watchdog(void)
>  {
>          touch_softlockup_watchdog();
> @@ -51,12 +51,12 @@ static inline bool trigger_all_cpu_backtrace(void)
>  }
>  #endif
>
> -#ifdef CONFIG_NMI_WATCHDOG
> +#ifdef CONFIG_LOCKUP_DETECTOR
>  int hw_nmi_is_cpu_stuck(struct pt_regs *);
>  u64 hw_nmi_get_sample_period(void);
> -extern int nmi_watchdog_enabled;
> +extern int watchdog_enabled;
>  struct ctl_table;
> -extern int proc_nmi_enabled(struct ctl_table *, int ,
> +extern int proc_dowatchdog_enabled(struct ctl_table *, int ,
>                  void __user *, size_t *, loff_t *);
>  #endif
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 6f7bba9..2455ff5 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -338,6 +338,12 @@ extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
>                                           size_t *lenp, loff_t *ppos);
>  #endif
>
> +#ifdef CONFIG_LOCKUP_DETECTOR
> +extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
> +                                  void __user *buffer,
> +                                  size_t *lenp, loff_t *ppos);
> +#endif
> +
>  /* Attach to any functions which should be ignored in wchan output. */
>  #define __sched         __attribute__((__section__(".sched.text")))
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 7331a16..c5ce8b7 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -948,8 +948,11 @@ config PERF_USE_VMALLOC
>
>  config PERF_EVENTS_NMI
>          bool
> +        depends on PERF_EVENTS
>          help
> -          Arch has support for nmi_watchdog
> +          System hardware can generate an NMI using the perf event
> +          subsystem.  Also has support for calculating CPU cycle events
> +          to determine how many clock cycles in a given period.
>
>  menu "Kernel Performance Events And Counters"
>
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 8a5abe5..cc3acb3 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -75,9 +75,8 @@ obj-$(CONFIG_GCOV_KERNEL) += gcov/
>  obj-$(CONFIG_AUDIT_TREE) += audit_tree.o
>  obj-$(CONFIG_KPROBES) += kprobes.o
>  obj-$(CONFIG_KGDB) += kgdb.o
> -obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o
> -obj-$(CONFIG_NMI_WATCHDOG) += nmi_watchdog.o
>  obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
> +obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
>  obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
>  obj-$(CONFIG_SECCOMP) += seccomp.o
>  obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index ac72c9e..1083897 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -60,7 +60,7 @@
>  #include <asm/io.h>
>  #endif
>
> -#ifdef CONFIG_NMI_WATCHDOG
> +#ifdef CONFIG_LOCKUP_DETECTOR
>  #include <linux/nmi.h>
>  #endif
>
> @@ -696,16 +696,25 @@ static struct ctl_table kern_table[] = {
>                  .mode           = 0444,
>                  .proc_handler   = proc_dointvec,
>          },
> -#if defined(CONFIG_NMI_WATCHDOG)
> +#if defined(CONFIG_LOCKUP_DETECTOR)
>          {
> -                .procname       = "nmi_watchdog",
> -                .data           = &nmi_watchdog_enabled,
> +                .procname       = "watchdog",
> +                .data           = &watchdog_enabled,

I suspect this could break some userspace apps that rely on this sysctl
option. Maybe you should keep nmi_watchdog around and schedule its removal
for later in the feature-removal-schedule.txt file.

>                  .maxlen         = sizeof (int),
>                  .mode           = 0644,
> -                .proc_handler   = proc_nmi_enabled,
> +                .proc_handler   = proc_dowatchdog_enabled,
> +        },
> +        {
> +                .procname       = "watchdog_thresh",
> +                .data           = &softlockup_thresh,
> +                .maxlen         = sizeof(int),
> +                .mode           = 0644,
> +                .proc_handler   = proc_dowatchdog_thresh,
> +                .extra1         = &neg_one,
> +                .extra2         = &sixty,
>          },
>  #endif
> -#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) && !defined(CONFIG_NMI_WATCHDOG)
> +#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) && !defined(CONFIG_LOCKUP_DETECTOR)
>          {
>                  .procname       = "unknown_nmi_panic",
>                  .data           = &unknown_nmi_panic,
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> new file mode 100644
> index 0000000..2684e95
> --- /dev/null
> +++ b/kernel/watchdog.c
> @@ -0,0 +1,577 @@
> +/*
> + * Detect hard and soft lockups on a system
> + *
> + * started by Don Zickus, Copyright (C) 2010 Red Hat, Inc.
> + *
> + * this code detects hard lockups: incidents in where on a CPU
> + * the kernel does not respond to anything except NMI.
> + *
> + * Note: Most of this code is borrowed heavily from softlockup.c,
> + * so thanks to Ingo for the initial implementation.
> + * Some chunks also taken from arch/x86/kernel/apic/nmi.c, thanks
> + * to those contributors as well.
> + */
> +
> +#include <linux/mm.h>
> +#include <linux/cpu.h>
> +#include <linux/nmi.h>
> +#include <linux/init.h>
> +#include <linux/delay.h>
> +#include <linux/freezer.h>
> +#include <linux/kthread.h>
> +#include <linux/lockdep.h>
> +#include <linux/notifier.h>
> +#include <linux/module.h>
> +#include <linux/sysctl.h>
> +
> +#include <asm/irq_regs.h>
> +#include <linux/perf_event.h>
> +
> +int watchdog_enabled;
> +int __read_mostly softlockup_thresh = 60;
> +
> +static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
> +static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
> +static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
> +static DEFINE_PER_CPU(bool, hard_watchdog_warn);

This one should be under CONFIG_PERF_EVENTS_NMI.

> +static DEFINE_PER_CPU(bool, soft_watchdog_warn);
> +#ifdef CONFIG_PERF_EVENTS_NMI
> +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> +static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
> +#endif
> +
> +static int __read_mostly did_panic;
> +static int __initdata no_watchdog;
> +
> +
> +/* boot commands */
> +/*
> + * Should we panic when a soft-lockup or hard-lockup occurs:
> + */
> +#ifdef CONFIG_PERF_EVENTS_NMI
> +static int hardlockup_panic;
> +
> +static int __init hardlockup_panic_setup(char *str)
> +{
> +        if (!strncmp(str, "panic", 5))
> +                hardlockup_panic = 1;
> +        return 1;
> +}
> +__setup("nmi_watchdog=", hardlockup_panic_setup);

If nmi_watchdog=0, this won't deactivate the hardlockup detector anymore.

> +#endif
> +
> +unsigned int __read_mostly softlockup_panic =
> +                        CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
> +
> +static int __init softlockup_panic_setup(char *str)
> +{
> +        softlockup_panic = simple_strtoul(str, NULL, 0);
> +
> +        return 1;
> +}
> +__setup("softlockup_panic=", softlockup_panic_setup);
> +
> +static int __init nowatchdog_setup(char *str)
> +{
> +        no_watchdog = 1;
> +        return 1;
> +}
> +__setup("nowatchdog", nowatchdog_setup);
> +
> +/* deprecated */
> +static int __init nosoftlockup_setup(char *str)
> +{
> +        no_watchdog = 1;
> +        return 1;
> +}
> +__setup("nosoftlockup", nosoftlockup_setup);
> +/*  */
> +
> +
> +/*
> + * Returns seconds, approximately.  We don't need nanosecond
> + * resolution, and we don't need to waste time with a big divide when
> + * 2^30ns == 1.074s.
> + */
> +static unsigned long get_timestamp(int this_cpu)
> +{
> +        return cpu_clock(this_cpu) >> 30LL;  /* 2^30 ~= 10^9 */
> +}
> +
> +static unsigned long get_sample_period(void)
> +{
> +        /*
> +         * convert softlockup_thresh from seconds to ns
> +         * the divide by 5 is to give hrtimer 5 chances to
> +         * increment before the hardlockup detector generates
> +         * a warning
> +         */
> +        return softlockup_thresh / 5 * NSEC_PER_SEC;
> +}
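Two details of the helpers just quoted are easy to verify in plain userspace
C (illustration only, not part of the patch). The >> 30 makes each timestamp
unit 2^30 ns, about 1.074 s, instead of a true second, which slightly
stretches the effective threshold but is harmless at this granularity; and
because get_sample_period() divides before it multiplies, the period
truncates to a whole number of seconds:

        #include <stdio.h>
        #include <stdint.h>

        #define NSEC_PER_SEC 1000000000ULL

        int main(void)
        {
                /* get_timestamp(): 300 real seconds of cpu_clock() output... */
                uint64_t ns = 300 * NSEC_PER_SEC;
                /* ...reads as 279 units, since each unit is 2^30 ns ~= 1.074 s */
                printf("300 s -> %llu timestamp units\n",
                       (unsigned long long)(ns >> 30));

                /* get_sample_period(): the division happens first, so the
                 * period only changes in whole-second steps of thresh / 5 */
                for (unsigned long thresh = 58; thresh <= 60; thresh++)
                        printf("thresh=%lu -> period %llu ns\n", thresh,
                               (unsigned long long)(thresh / 5 * NSEC_PER_SEC));
                return 0;
        }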
> +
> +/* Commands for resetting the watchdog */
> +static void __touch_watchdog(void)
> +{
> +        int this_cpu = raw_smp_processor_id();

This must use smp_processor_id() for the preemption disabled checks.

> +
> +        __get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +}
> +
> +void touch_watchdog(void)
> +{
> +        __get_cpu_var(watchdog_touch_ts) = 0;
> +}
> +EXPORT_SYMBOL(touch_watchdog);
> +
> +void touch_all_watchdog(void)
> +{
> +        int cpu;
> +
> +        /*
> +         * this is done lockless
> +         * do we care if a 0 races with a timestamp?
> +         * all it means is the softlock check starts one cycle later
> +         */
> +        for_each_online_cpu(cpu)
> +                per_cpu(watchdog_touch_ts, cpu) = 0;
> +}
> +
> +void touch_nmi_watchdog(void)
> +{
> +        touch_watchdog();
> +}
> +EXPORT_SYMBOL(touch_nmi_watchdog);
> +
> +void touch_all_nmi_watchdog(void)
> +{
> +        touch_all_watchdog();
> +}
> +
> +void touch_softlockup_watchdog(void)
> +{
> +        touch_watchdog();
> +}
> +
> +void touch_all_softlockup_watchdogs(void)
> +{
> +        touch_all_watchdog();
> +}
> +
> +void softlockup_tick(void)
> +{
> +}
> +
> +#ifdef CONFIG_PERF_EVENTS_NMI
> +/* watchdog detector functions */
> +static int is_hardlockup(int cpu)
> +{
> +        unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
> +
> +        if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> +                return 1;
> +
> +        per_cpu(hrtimer_interrupts_saved, cpu) = hrint;

All these per_cpu() should be __this_cpu_var() for readability, for the
preemption disabled safety check, and maybe even for optimization reasons:
if an arch defines its own __my_cpu_offset, it may get it faster.
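For readers who haven't run into the distinction Frederic is drawing: a
sketch of the change, written with __get_cpu_var(), which the quoted code
already uses elsewhere for local access (his "__this_cpu_var()" reads like
shorthand for the same family of local-cpu accessors; treat the details here
as an assumption about that intent):

        /* before: any cpu's copy, looked up via per_cpu_offset(cpu),
         * with no preemption-safety checking */
        unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);

        /* after: the local cpu's copy; this documents that is_hardlockup()
         * only ever inspects the cpu it runs on, lets DEBUG_PREEMPT builds
         * catch preemptible callers, and can use the arch's own fast
         * __my_cpu_offset when one is defined */
        unsigned long hrint = __get_cpu_var(hrtimer_interrupts);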
> +static int is_softlockup(unsigned long touch_ts, int cpu)
> +{
> +        unsigned long now = get_timestamp(cpu);
> +
> +        /* Warn about unreasonable delays: */
> +        if (now > (touch_ts + softlockup_thresh))
> +                return now - touch_ts;
> +
> +        return 0;
> +}
> +
> +static int
> +watchdog_panic(struct notifier_block *this, unsigned long event, void *ptr)
> +{
> +        did_panic = 1;
> +
> +        return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block panic_block = {
> +        .notifier_call = watchdog_panic,
> +};
> +
> +#ifdef CONFIG_PERF_EVENTS_NMI
> +static struct perf_event_attr wd_hw_attr = {
> +        .type           = PERF_TYPE_HARDWARE,
> +        .config         = PERF_COUNT_HW_CPU_CYCLES,
> +        .size           = sizeof(struct perf_event_attr),
> +        .pinned         = 1,
> +        .disabled       = 1,
> +};
> +
> +/* Callback function for perf event subsystem */
> +void watchdog_overflow_callback(struct perf_event *event, int nmi,
> +                struct perf_sample_data *data,
> +                struct pt_regs *regs)
> +{
> +        int this_cpu = smp_processor_id();
> +        unsigned long touch_ts = per_cpu(watchdog_touch_ts, this_cpu);

Same here.

> +
> +        if (touch_ts == 0) {
> +                __touch_watchdog();
> +                return;
> +        }
> +
> +        /* check for a hardlockup
> +         * This is done by making sure our timer interrupt
> +         * is incrementing.  The timer interrupt should have
> +         * fired multiple times before we overflow'd.  If it hasn't
> +         * then this is a good indication the cpu is stuck
> +         */
> +        if (is_hardlockup(this_cpu)) {
> +                /* only print hardlockups once */
> +                if (__get_cpu_var(hard_watchdog_warn) == true)
> +                        return;
> +
> +                if (hardlockup_panic)
> +                        panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> +                else
> +                        WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> +
> +                __get_cpu_var(hard_watchdog_warn) = true;
> +                return;
> +        }
> +
> +        __get_cpu_var(hard_watchdog_warn) = false;
> +        return;
> +}
> +static void watchdog_interrupt_count(void)
> +{
> +        __get_cpu_var(hrtimer_interrupts)++;
> +}
> +#else
> +static inline void watchdog_interrupt_count(void) { return; }
> +#endif /* CONFIG_PERF_EVENTS_NMI */
> +
> +/* watchdog kicker functions */
> +static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> +{
> +        int this_cpu = smp_processor_id();
> +        unsigned long touch_ts = __get_cpu_var(watchdog_touch_ts);
> +        struct pt_regs *regs = get_irq_regs();
> +        int duration;
> +
> +        /* kick the hardlockup detector */
> +        watchdog_interrupt_count();
> +
> +        /* kick the softlockup detector */
> +        wake_up_process(__get_cpu_var(softlockup_watchdog));
> +
> +        /* .. and repeat */
> +        hrtimer_forward_now(hrtimer, ns_to_ktime(get_sample_period()));
> +
> +        if (touch_ts == 0) {
> +                __touch_watchdog();
> +                return HRTIMER_RESTART;
> +        }
> +
> +        /* check for a softlockup
> +         * This is done by making sure a high priority task is
> +         * being scheduled.  The task touches the watchdog to
> +         * indicate it is getting cpu time.  If it hasn't then
> +         * this is a good indication some task is hogging the cpu
> +         */
> +        duration = is_softlockup(touch_ts, this_cpu);
> +        if (unlikely(duration)) {
> +                /* only warn once */
> +                if (__get_cpu_var(soft_watchdog_warn) == true)
> +                        return HRTIMER_RESTART;
> +
> +                printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> +                        this_cpu, duration,
> +                        current->comm, task_pid_nr(current));
> +                print_modules();
> +                print_irqtrace_events(current);
> +                if (regs)
> +                        show_regs(regs);
> +                else
> +                        dump_stack();
> +
> +                if (softlockup_panic)
> +                        panic("softlockup: hung tasks");
> +                __get_cpu_var(soft_watchdog_warn) = true;
> +        } else
> +                __get_cpu_var(soft_watchdog_warn) = false;
> +
> +        return HRTIMER_RESTART;
> +}
> +
> +
> +/*
> + * The watchdog thread - touches the timestamp.
> + */
> +static int watchdog(void *__bind_cpu)
> +{
> +        struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> +        struct hrtimer *hrtimer = &per_cpu(watchdog_hrtimer, (unsigned long)__bind_cpu);

This is bound to a single cpu already: __raw_get_cpu_var() (because we
don't need the preempt disabled check here).

Thanks.
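A sketch of that last suggestion, assuming the V5 context above (illustrative,
not the posted patch): the per-cpu watchdog kthread is bound to __bind_cpu
before it runs, so the hrtimer it fetches is always the local cpu's copy, and
__raw_get_cpu_var() expresses that while skipping the preempt-disabled sanity
check, which is unnecessary for a thread pinned to one cpu:

        static int watchdog(void *__bind_cpu)
        {
                struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
                /* the thread is bound to __bind_cpu, so the local copy is
                 * the right one; no smp_processor_id() debug check needed */
                struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
                /* ... rest of the thread loop as in the patch ... */
        }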