From: Dave Jones on 3 Oct 2006 12:50

On Tue, Oct 03, 2006 at 02:40:30AM -0400, Dave Jones wrote:

> > > > ----------- [cut here ] --------- [please bite here ] ---------
> > > > Kernel BUG at fs/buffer.c:2791
> > >
> > > I had thought/hoped that this was fixed by Jan's patch at
> > > http://lkml.org/lkml/2006/9/7/236 from the thread started at
> > > http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug
> > > first by going through that new codepath....
> >
> > Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc,
> > Badari was hitting that BUG and was able to confirm that Jan's patch
> > (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed
> > it.
>
> Ok, this afternoon I was definitely running a kernel with that patch in it,
> and managed to get a trace (It was the one from the top of this thread
> that unfortunately got truncated).
>
> Now, I can't reproduce it on a plain 2.6.18+that patch.
> I'll leave the stress test running overnight, and see if anything
> falls out in the morning.

Been chugging away for 10 hrs now without repeating that incident. Hmm.

That patch looks like good -stable material. I'll keep digging to see if
I can somehow reproduce the problem I saw with the patch applied, but in
the absence of something better, I think we should go with it.

One thing that did happen in the 10 hrs was that fsx-over-NFS spewed a
nasty looking trace. I'll post that separately next.

Dave

--
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Sandeen on 9 Oct 2006 15:50

Andrew Morton wrote:
> On Tue, 03 Oct 2006 00:43:01 -0500
> Eric Sandeen <sandeen(a)sandeen.net> wrote:
>
>> Dave Jones wrote:
>>
>>> So I managed to reproduce it with an 'fsx foo' and a
>>> 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from
>>> a vanilla 2.6.18 with none of the Fedora patches..
>>>
>>> I'll give 2.6.18-git a try next.
>>>
>>> Dave
>>>
>>> ----------- [cut here ] --------- [please bite here ] ---------
>>> Kernel BUG at fs/buffer.c:2791
>> I had thought/hoped that this was fixed by Jan's patch at
>> http://lkml.org/lkml/2006/9/7/236 from the thread started at
>> http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug
>> first by going through that new codepath....
>
> Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc,
> Badari was hitting that BUG and was able to confirm that Jan's patch
> (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed
> it.

Looking at some BH traces*, it appears that what Dave hit is a truncate
racing with a sync...

truncate ...
    ext3_invalidate_page
    journal_invalidatepage
    journal_unmap_buffer

going off at the same time as

sync ...
    journal_dirty_data
    sync_dirty_buffer
    submit_bh <-- finds unmapped buffer, boom.

I'm not sure what should be coordinating this, and I'm not sure why
we've not yet seen it on a stock kernel, but only FC6... I haven't found
anything in FC6 that looks like it may affect this.

-Eric

*http://people.redhat.com/esandeen/traces/davej_ext3_oops1.txt
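[Editorial sketch, not part of the original thread.] The race Eric describes is a check-then-act window: one path decides a buffer is worth writing while another path unmaps it before the write is submitted. The toy Python model below only illustrates that ordering problem. The names `BufferHead`, `KernelBug`, `sync_side`, and `run` are hypothetical, and the assumption that the sync side tests a dirty flag before submitting is ours, not something the thread confirms. It is not kernel code and makes no claim about where the real locking gap is.

```python
class KernelBug(Exception):
    """Stands in for the BUG_ON(!buffer_mapped(bh)) assertion."""


class BufferHead:
    """Toy stand-in for a buffer_head; only the two flags we need."""

    def __init__(self):
        self.mapped = True
        self.dirty = True


def journal_unmap_buffer(bh):
    # Truncate side: the buffer is on no transaction, so it is zapped.
    bh.mapped = False
    bh.dirty = False


def submit_bh(bh):
    # The assertion that fired in the reported oops.
    if not bh.mapped:
        raise KernelBug("submit_bh: buffer not mapped")
    bh.dirty = False
    return "submitted"


def sync_side(bh, between_check_and_submit=None):
    # Assumed check-then-act shape: test the flag, then submit.
    if not bh.dirty:
        return "skipped"
    if between_check_and_submit is not None:
        # Models the other CPU running inside the window.
        between_check_and_submit(bh)
    return submit_bh(bh)


def run(interleaved):
    """Return the outcome for a serialized or an interleaved ordering."""
    bh = BufferHead()
    try:
        if interleaved:
            return sync_side(bh, between_check_and_submit=journal_unmap_buffer)
        journal_unmap_buffer(bh)
        return sync_side(bh)
    except KernelBug as exc:
        return f"BUG: {exc}"


if __name__ == "__main__":
    print(run(False))  # truncate finishes first, sync skips the buffer
    print(run(True))   # truncate lands inside the window
```

When the truncate completes before the sync side looks at the buffer, nothing is submitted. When it lands between the check and the submit, the model trips the same kind of assertion the oops shows. Whatever serializes the two paths in the real code has to close that window.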
From: Eric Sandeen on 9 Oct 2006 16:10

Eric Sandeen wrote:

>>> I had thought/hoped that this was fixed by Jan's patch at
>>> http://lkml.org/lkml/2006/9/7/236 from the thread started at
>>> http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug
>>> first by going through that new codepath....
>> Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc,
>> Badari was hitting that BUG and was able to confirm that Jan's patch
>> (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed
>> it.
>
> Looking at some BH traces*, it appears that what Dave hit is a truncate
> racing with a sync...

(oh btw this is -with- the above patch from Jan in place...)

-Eric

> truncate ...
>     ext3_invalidate_page
>     journal_invalidatepage
>     journal_unmap_buffer
>
> going off at the same time as
>
> sync ...
>     journal_dirty_data
>     sync_dirty_buffer
>     submit_bh <-- finds unmapped buffer, boom.
>
> I'm not sure what should be coordinating this, and I'm not sure why
> we've not yet seen it on a stock kernel, but only FC6... I haven't found
> anything in FC6 that looks like it may affect this.
>
> -Eric
>
> *http://people.redhat.com/esandeen/traces/davej_ext3_oops1.txt
From: Badari Pulavarty on 9 Oct 2006 18:10

On Mon, 2006-10-09 at 14:46 -0500, Eric Sandeen wrote:
> Andrew Morton wrote:
> > On Tue, 03 Oct 2006 00:43:01 -0500
> > Eric Sandeen <sandeen(a)sandeen.net> wrote:
> >
> >> Dave Jones wrote:
> >>
> >>> So I managed to reproduce it with an 'fsx foo' and a
> >>> 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from
> >>> a vanilla 2.6.18 with none of the Fedora patches..
> >>>
> >>> I'll give 2.6.18-git a try next.
> >>>
> >>> Dave
> >>>
> >>> ----------- [cut here ] --------- [please bite here ] ---------
> >>> Kernel BUG at fs/buffer.c:2791
> >> I had thought/hoped that this was fixed by Jan's patch at
> >> http://lkml.org/lkml/2006/9/7/236 from the thread started at
> >> http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug
> >> first by going through that new codepath....
> >
> > Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc,
> > Badari was hitting that BUG and was able to confirm that Jan's patch
> > (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed
> > it.
>
> Looking at some BH traces*, it appears that what Dave hit is a truncate
> racing with a sync...
>
> truncate ...
>     ext3_invalidate_page
>     journal_invalidatepage
>     journal_unmap_buffer
>
> going off at the same time as
>
> sync ...
>     journal_dirty_data
>     sync_dirty_buffer
>     submit_bh <-- finds unmapped buffer, boom.
>

I don't understand how this can happen ..

journal_unmap_buffer() is zapping the buffer since it's not attached to
any transaction:

journal_unmap_buffer():[fs/jbd/transaction.c:1789] not on any transaction: zap
b_state:0x10402f b_jlist:BJ_None cpu:0 b_count:3 b_blocknr:52735707
b_jbd:1 b_frozen_data:0000000000000000 b_committed_data:0000000000000000
b_transaction:0 b_next_transaction:0 b_cp_transaction:0 b_trans_is_running:0
b_trans_is_comitting:0 b_jcount:2 pg_dirty:1

journal_dirty_data() would do submit_bh() ONLY if it's part of the older
transaction.

I need to take a closer look to understand the race.

BTW, is this a 1k or 2k filesystem? How easy is it to reproduce the problem?

Thanks,
Badari
From: Jan-Benedict Glaw on 9 Oct 2006 18:50

On Mon, 2006-10-09 14:46:30 -0500, Eric Sandeen <sandeen(a)sandeen.net> wrote:
> Andrew Morton wrote:
> > On Tue, 03 Oct 2006 00:43:01 -0500
> > Eric Sandeen <sandeen(a)sandeen.net> wrote:
> > > Dave Jones wrote:
> > > > So I managed to reproduce it with an 'fsx foo' and a
> > > > 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from
> > > > a vanilla 2.6.18 with none of the Fedora patches..
> > > >
> > > > I'll give 2.6.18-git a try next.
> > > >
> > > > ----------- [cut here ] --------- [please bite here ] ---------
> > > > Kernel BUG at fs/buffer.c:2791
> > > I had thought/hoped that this was fixed by Jan's patch at
> > > http://lkml.org/lkml/2006/9/7/236 from the thread started at
> > > http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug
> > > first by going through that new codepath....
> >
> > Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc,
> > Badari was hitting that BUG and was able to confirm that Jan's patch
> > (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed
> > it.
>
> Looking at some BH traces*, it appears that what Dave hit is a truncate
> racing with a sync...
>
> truncate ...
>     ext3_invalidate_page
>     journal_invalidatepage
>     journal_unmap_buffer
>
> going off at the same time as
>
> sync ...
>     journal_dirty_data
>     sync_dirty_buffer
>     submit_bh <-- finds unmapped buffer, boom.

Is this possibly related to the issues that are discussed in another
thread? We're seeing problems while unlinking large files (we usually hit
it within some hours with 200MB files, but couldn't yet reproduce it
with 20MB.)

Regards, JBG

--
Jan-Benedict Glaw    jbglaw(a)lug-owl.de    +49-172-7608481
Signature of: Everything will be fine! ...and today it's already getting a little better.