From: Thomas Gleixner on 8 Feb 2010 04:20
On Mon, 8 Feb 2010, Andreas Mohr wrote:
> > And then a cat current_clocksource managed to hang again.

Well, that's not surprising at all. If one task is stuck on
clocksource_mutex, then the next one will be stuck as well.

> (NOTE that the - now complete! - SysRq-T list does NOT show any backtraces
> of kwatchdog any more, only many other processes)
> Could it be that the (rather disruptive) NMI watchdog confuses the current state at
> change_clocksource and causes that stuff to get left with
> clocksource_mutex remaining taken?

Nope, the NMI watchdog is not involved. It merely tells us that the
task is stuck.

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Thomas Gleixner on 8 Feb 2010 05:10
On Mon, 8 Feb 2010, Andreas Mohr wrote:
> > Nope, the NMI watchdog is not involved. It merely tells us that the
> > task is stuck.
>
> OK.
> And after that message debug_locks is zeroed and kwatchdog is gone
> from the process list (probably during debug_locks change).

Oh, no. kwatchdog is a run-once thread. It always exits after its work
is done, but I'm pretty confused about the NMI watchdog output.

EIP: 0060:[<c1045170>] EFLAGS: 00000082 CPU: 0
EIP is at timekeeping_forward_now+0x116/0x139

I don't see what might make the machine stuck here. Can you decode the
source line with "addr2line -e vmlinux c1045170" please?

> I'll explain what I think might be happening:
> bootup switches to acpi_pm, timekeeping gets borked, NMI watchdog complains
> due to timekeeping issues, brutally yanks the waiting acpi_pm switchover
> (thereby NOT releasing clocksource_mutex),

No, the NMI watchdog does not yank anything. It just reports.

Thanks,

	tglx
From: Thomas Gleixner on 8 Feb 2010 15:30
On Mon, 8 Feb 2010, Andreas Mohr wrote:
> Hi,
>
> On Mon, Feb 08, 2010 at 11:06:58AM +0100, Thomas Gleixner wrote:
> > EIP: 0060:[<c1045170>] EFLAGS: 00000082 CPU: 0
> > EIP is at timekeeping_forward_now+0x116/0x139
> >
> > I don't see what might make the machine stuck here. Can you decode the
> > source line with "addr2line -e vmlinux c1045170" please ?
>
> And the winner is:
> /usr/src/linux-2.6.33-rc7/include/linux/math64.h:91
>
> static __always_inline u32
> __iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
> {
> 	u32 ret = 0;
>
> 	while (dividend >= divisor) {
> 		/* The following asm() prevents the compiler from
> 		   optimising this loop into a modulo operation. */
> 		asm("" : "+rm"(dividend));
>
> 		dividend -= divisor;
> 		ret++;
> 	}
>
> 	*remainder = dividend;
>
> 	return ret;
> }
>
>
> while ......
>
> Do I see a divisor == 0 here?? ;)

The only function which is calling __iter_div_u64_rem() from
timekeeping_forward_now() is timespec_add_ns() which calls it with a
constant divisor:

static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
{
	a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
	a->tv_nsec = ns;
}

There goes the theory :)

Which compiler version are you using ?

Can you please provide the disassembly of kernel/time/timekeeping.o ?

Thanks,

	tglx
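
For readers who want to poke at the loop quoted above outside the kernel,
here is a minimal userspace sketch. It is not the kernel code itself:
iter_div_u64_rem(), struct ts_sketch and ts_add_ns() are hypothetical
stand-ins for __iter_div_u64_rem(), struct timespec and timespec_add_ns(),
the assert() is an addition for illustration, and the empty asm statement
assumes GCC or Clang. With the constant NSEC_PER_SEC divisor the loop runs
once per whole second contained in tv_nsec + ns, so it always terminates,
but the number of iterations grows with the size of ns.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000U

/* Hypothetical userspace stand-in for the kernel's struct timespec. */
struct ts_sketch {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Userspace copy of the subtraction loop quoted in the thread. */
static inline uint32_t iter_div_u64_rem(uint64_t dividend, uint32_t divisor,
					uint64_t *remainder)
{
	uint32_t ret = 0;

	/* Added for illustration: a zero divisor would never terminate. */
	assert(divisor != 0);

	while (dividend >= divisor) {
		/* Empty asm keeps the compiler from turning the loop
		   into a division (GCC/Clang syntax assumed). */
		__asm__("" : "+rm"(dividend));

		dividend -= divisor;
		ret++;	/* one iteration per whole divisor subtracted */
	}

	*remainder = dividend;

	return ret;
}

/* Userspace stand-in for timespec_add_ns(): constant divisor. */
static inline void ts_add_ns(struct ts_sketch *a, uint64_t ns)
{
	a->tv_sec += iter_div_u64_rem((uint64_t)a->tv_nsec + ns,
				      NSEC_PER_SEC, &ns);
	a->tv_nsec = (int64_t)ns;
}
```

Adding 300000000 ns to a value of 1 s + 900000000 ns, for example, takes a
single loop iteration and leaves 2 s + 200000000 ns.
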
From: Thomas Gleixner on 8 Feb 2010 15:50
On Mon, 8 Feb 2010, Thomas Gleixner wrote:
> On Mon, 8 Feb 2010, Andreas Mohr wrote:
> Which compiler version are you using ?
>
> Can you please provide the disassembly of kernel/time/timekeeping.o ?

Is that NMI watchdog hit fully reproducible ?

Thanks,

	tglx
From: Thomas Gleixner on 8 Feb 2010 16:20
On Mon, 8 Feb 2010, Andreas Mohr wrote:
> On Mon, Feb 08, 2010 at 09:51:05PM +0100, Andreas Mohr wrote:
> > Looks like it:
> > - another bootup also had lockup message
> > - all /var/log/dmesg* have lockup message, oldest is:
> > 2010-02-07 20:00 dmesg.4.gz
> >
> > Linux version 2.6.33-rc6 (root(a)note) (gcc version 4.3.4 (Debian 4.3.4-6)) #3 Sun Jan 31 23:47:51 CET 2010
>
> -rc4 and 2.6.32.3 don't show lockup message, instant bootup without any
> visible delay.
>
> Don't tell me I'm now supposed to try -rc5 and also rebuild -rc6 ;)
> If so, do tell early so that I have lots of time to get it built...

Well, we'd better find out after which point that problem started to
manifest itself. A bisect would be optimal.

Thanks,

	tglx