From: Dave Jones on
Not sure what exactly happened here. Was running fsx on ext3 over 2 disk raid0,
and running a yum update. Box locked up solid after a few minutes..
http://www.codemonkey.org.uk/junk/DSC00747.JPG

The unwinder getting stuck meant I lost the top of the trace though.
(I have backporting the .19 fixes to .18 on my todo unless someone
beats me to it and they end up in -stable).

Will try to reproduce with a serial console hooked up.

Dave

--
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dave Jones on
<Cc'd Eric as he's been looking into this>

On Mon, Oct 02, 2006 at 03:47:11PM -0400, Dave Jones wrote:
> Not sure what exactly happened here. Was running fsx on ext3 over 2 disk raid0,
> and running a yum update. Box locked up solid after a few minutes..
> http://www.codemonkey.org.uk/junk/DSC00747.JPG
>
> The unwinder getting stuck meant I lost the top of the trace though.
> (I have backporting the .19 fixes to .18 on my todo unless someone
> beats me to it and they end up in -stable).
>
> Will try to reproduce with a serial console hooked up.

So I managed to reproduce it with an 'fsx foo' and a
'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from
a vanilla 2.6.18 with none of the Fedora patches..

I'll give 2.6.18-git a try next.

Dave

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/buffer.c:2791
invalid opcode: 0000 [1] SMP
CPU 1
Modules linked in: hidp l2cap bluetooth nfs lockd nfs_acl sunrpc ipv6 dm_mirror dm_mod video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sr_mod snd_seq_device cdrom intel_rng snd_pcm_oss sg snd_mixer_oss snd_pcm shpchp floppy serio_raw pcspkr i2c_i801 ohci1394 ieee1394 snd_timer snd e1000 i2c_core soundcore snd_page_alloc sata_sil ahci libata sd_mod scsi_mod raid0 ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 408, comm: kjournald Not tainted 2.6.18 #1
RIP: 0010:[<ffffffff8021b425>] [<ffffffff8021b425>] submit_bh+0x29/0x124
RSP: 0018:ffff81007ebcbd40 EFLAGS: 00010246
RAX: 0000000000000005 RBX: ffff810063dd0ec8 RCX: 8000000000000000
RDX: ffff81007f1f5430 RSI: ffff810063dd0ec8 RDI: 0000000000000001
RBP: ffff81007ebcbd60 R08: 0000000000800000 R09: 0000000000000003
R10: ffff810068621510 R11: 0000000000000400 R12: ffff81007f7f48c8
R13: 0000000000000001 R14: 0000000000000033 R15: 0000000000000080
FS: 0000000000000000(0000) GS:ffff810037ff1cc0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b76d5365000 CR3: 00000000658e3000 CR4: 00000000000006e0
Process kjournald (pid: 408, threadinfo ffff81007ebca000, task ffff810037f52040)
Stack: 0000000000000003 ffff810063dd0ec8 ffff81007f7f48c8 0000000000000003
ffff81007ebcbda0 ffffffff80217ca1 ffff81007aa572a0 ffff810063f8d400
ffff810064145478 ffff81007f1ea208 ffff81007aa572a0 0000000000000080
Call Trace:
[<ffffffff80217ca1>] ll_rw_block+0xa6/0xcd
[<ffffffff88035991>] :jbd:journal_commit_transaction+0x40b/0x10ce
[<ffffffff8803a033>] :jbd:kjournald+0xc7/0x222
[<ffffffff80236089>] kthread+0x100/0x136
[<ffffffff802624a0>] child_rip+0xa/0x12
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
[<ffffffff80268c22>] _spin_unlock_irq+0x2b/0x31
[<ffffffff80261adc>] restore_args+0x0/0x30
[<ffffffff80250ec3>] run_workqueue+0x19/0xfa
[<ffffffff80250ec3>] run_workqueue+0x19/0xfa
[<ffffffff80235f89>] kthread+0x0/0x136
[<ffffffff80262496>] child_rip+0x0/0x12


Code: 0f 0b 68 c8 86 49 80 c2 e7 0a 48 83 7b 38 00 75 0a 0f 0b 68
RIP [<ffffffff8021b425>] submit_bh+0x29/0x124
RSP <ffff81007ebcbd40>
<0>general protection fault: 0000 [2] SMP
CPU 1
Modules linked in: hidp l2cap bluetooth nfs lockd nfs_acl sunrpc ipv6 dm_mirror dm_mod video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sr_mod snd_seq_device cdrom intel_rng snd_pcm_oss sg snd_mixer_oss snd_pcm shpchp floppy serio_raw pcspkr i2c_i801 ohci1394 ieee1394 snd_timer snd e1000 i2c_core soundcore snd_page_alloc sata_sil ahci libata sd_mod scsi_mod raid0 ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18 #1
RIP: 0010:[<ffffffff8028e1a1>] [<ffffffff8028e1a1>] task_rq_lock+0x2b/0x74
RSP: 0018:ffff810037e17df0 EFLAGS: 00010006
RAX: 6b6b6b6b6b6b6b6b RBX: ffffffff80aae480 RCX: ffff81007f1ea5a8
RDX: ffff81007ee71080 RSI: ffff810037e17e78 RDI: ffff810037f52040
RBP: ffff810037e17e10 R08: ffff810037e17eb0 R09: 0000000000000001
R10: 0000000000000001 R11: ffffffff8029995d R12: ffffffff80aae480
R13: ffff810037e17e78 R14: ffff810037f52040 R15: 0000000000000100
FS: 0000000000000000(0000) GS:ffff810037ff1cc0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b76d535d000 CR3: 000000007e290000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff810037e10000, task ffff81007ee71080)
Stack: 000000000000000f ffff810037e08000 ffff810037f52040 ffffffff880398d9
ffff810037e17eb0 ffffffff80249e98 ffff810037e08000 000000007ee71080
ffffffff80268c22 0000000200000001 0000000000000000 0000000100000000
Call Trace:
[<ffffffff80249e98>] try_to_wake_up+0x27/0x421
[<ffffffff8028e3ce>] wake_up_process+0x10/0x12
[<ffffffff880398e2>] :jbd:commit_timeout+0x9/0xb
[<ffffffff80299a67>] run_timer_softirq+0x14c/0x1d5
[<ffffffff80212724>] __do_softirq+0x68/0xf5
[<ffffffff802627f8>] call_softirq+0x1c/0x28
DWARF2 unwinder stuck at call_softirq+0x1c/0x28
Leftover inexact backtrace:
<IRQ> [<ffffffff80270c65>] do_softirq+0x39/0x9f
[<ffffffff802962a3>] irq_exit+0x57/0x59
[<ffffffff8027b063>] smp_apic_timer_interrupt+0x5d/0x62
[<ffffffff8025b784>] mwait_idle+0x0/0x54
[<ffffffff8026216f>] apic_timer_interrupt+0x6b/0x70
<EOI> [<ffffffff80266026>] __sched_text_start+0xaa6/0xadd
[<ffffffff8025b7c3>] mwait_idle+0x3f/0x54
[<ffffffff8025b78d>] mwait_idle+0x9/0x54
[<ffffffff8024c916>] cpu_idle+0xa2/0xc5
[<ffffffff8027a674>] start_secondary+0x468/0x477


Code: 8b 40 18 48 8b 04 c5 c0 59 a6 80 4c 03 60 08 4c 89 e7 e8 5e
RIP [<ffffffff8028e1a1>] task_rq_lock+0x2b/0x74
RSP <ffff810037e17df0>
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():1, irqs_disabled():1

Call Trace:
[<ffffffff8026f956>] show_trace+0xae/0x336
[<ffffffff8026fbf3>] dump_stack+0x15/0x17
[<ffffffff8020bb01>] __might_sleep+0xb2/0xb4
[<ffffffff802a5160>] down_read+0x1d/0x4a
[<ffffffff8029cf62>] blocking_notifier_call_chain+0x1b/0x41
[<ffffffff80293e10>] profile_task_exit+0x15/0x17
[<ffffffff80215a74>] do_exit+0x25/0x96a
[<ffffffff8026fe21>] kernel_math_error+0x0/0x96
[<ffff810037e17d48>]
DWARF2 unwinder stuck at 0xffff810037e17d48
Leftover inexact backtrace:
<IRQ> [<ffffffff8026993f>] do_general_protection+0x10a/0x115
[<ffffffff8808bab2>] :scsi_mod:scsi_run_queue+0x1ab/0x1ba
[<ffffffff80
From: Eric Sandeen on
Dave Jones wrote:

> So I managed to reproduce it with an 'fsx foo' and a
> 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from
> a vanilla 2.6.18 with none of the Fedora patches..
>
> I'll give 2.6.18-git a try next.
>
> Dave
>
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at fs/buffer.c:2791

I had thought/hoped that this was fixed by Jan's patch at
http://lkml.org/lkml/2006/9/7/236 from the thread started at
http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug
first by going through that new codepath....

-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrew Morton on
On Tue, 03 Oct 2006 00:43:01 -0500
Eric Sandeen <sandeen(a)sandeen.net> wrote:

> Dave Jones wrote:
>
> > So I managed to reproduce it with an 'fsx foo' and a
> > 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from
> > a vanilla 2.6.18 with none of the Fedora patches..
> >
> > I'll give 2.6.18-git a try next.
> >
> > Dave
> >
> > ----------- [cut here ] --------- [please bite here ] ---------
> > Kernel BUG at fs/buffer.c:2791
>
> I had thought/hoped that this was fixed by Jan's patch at
> http://lkml.org/lkml/2006/9/7/236 from the thread started at
> http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug
> first by going through that new codepath....

Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc,
Badari was hitting that BUG and was able to confirm that Jan's patch
(3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed
it.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dave Jones on
On Mon, Oct 02, 2006 at 11:19:45PM -0700, Andrew Morton wrote:
> On Tue, 03 Oct 2006 00:43:01 -0500
> Eric Sandeen <sandeen(a)sandeen.net> wrote:
>
> > Dave Jones wrote:
> >
> > > So I managed to reproduce it with an 'fsx foo' and a
> > > 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from
> > > a vanilla 2.6.18 with none of the Fedora patches..
> > >
> > > I'll give 2.6.18-git a try next.
> > >
> > > Dave
> > >
> > > ----------- [cut here ] --------- [please bite here ] ---------
> > > Kernel BUG at fs/buffer.c:2791
> >
> > I had thought/hoped that this was fixed by Jan's patch at
> > http://lkml.org/lkml/2006/9/7/236 from the thread started at
> > http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug
> > first by going through that new codepath....
>
> Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc,
> Badari was hitting that BUG and was able to confirm that Jan's patch
> (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed
> it.

Ok, this afternoon I was definitly running a kernel with that patch in it,
and managed to get a trace (It was the one from the top of this thread
that unfortunatly got truncated).

Now, I can't reproduce it on a plain 2.6.18+that patch.
I'll leave the stress test running overnight, and see if anything
falls out in the morning.

Dave

--
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/