From: Jörn Engel on 22 Apr 2010 02:00

On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
>
> Thanks, we definitely should have put a debug statement to catch this in
> from day 1, good debugging should be an important part of any new
> infrastructure.

Woke up early and had another look at this. Looks like a much more
widespread problem. Based on a quick grep and an uncaffeinated brain:

9p          no s_bdi
afs         no s_bdi
ceph        creates its own s_bdi
cifs        no s_bdi
coda        no s_bdi
ecryptfs    no s_bdi
exofs       no s_bdi
fuse        creates its own s_bdi?
gfs2        creates its own s_bdi?
jffs2       patch exists
logfs       fixed now
ncpfs       no s_bdi
nfs         creates its own s_bdi
ocfs2       no s_bdi
smbfs       no s_bdi
ubifs       creates its own s_bdi

I excluded all filesystems that appear to be read-only, block device
based or lack any sort of backing store. So there is a chance I have
missed some as well.

Jörn

--
Simplicity is prerequisite for reliability.
-- Edsger W. Dijkstra
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
From: Jörn Engel on 22 Apr 2010 02:30

Linus,

I think this is bad enough that you should be involved. Commit 32a88aa1
broke a number of filesystems in a way that sync() would return 0
without doing any work. Even politicians are better at keeping their
promises.

This is caused by the two-liner in __sync_filesystem():

	if (!sb->s_bdi)
		return 0;

s_bdi is set implicitly for all filesystems using set_bdev_super(), so
most block device based filesystems are safe. There are, however, a
number of oddballs around:

On Thu, 22 April 2010 07:54:48 +0200, Jörn Engel wrote:
>
> 9p          no s_bdi
> afs         no s_bdi
> ceph        creates its own s_bdi
> cifs        no s_bdi
> coda        no s_bdi
> ecryptfs    no s_bdi
> exofs       no s_bdi
> fuse        creates its own s_bdi?
> gfs2        creates its own s_bdi?
> jffs2       patch exists
> logfs       fixed now
> ncpfs       no s_bdi
> nfs         creates its own s_bdi
> ocfs2       no s_bdi
> smbfs       no s_bdi
> ubifs       creates its own s_bdi

Obviously this list should get checked and all affected filesystems
repaired. Additionally, we should add an assertion and BUG(), refuse to
mount, or something similar. My original patch to that end was this:

diff --git a/fs/super.c b/fs/super.c
index f35ac60..e8af253 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -954,6 +954,8 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	if (error < 0)
 		goto out_free_secdata;
 	BUG_ON(!mnt->mnt_sb);
+	BUG_ON(!mnt->mnt_sb->s_bdi &&
+	       (mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd));
 	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
 	if (error)
 		goto out_sb;

The real problem is finding a condition that has neither false
positives nor false negatives. The
"(mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd)" part takes care of false
positives like tmpfs, but it would catch none of the network
filesystems. Should we instead annotate tmpfs and friends with
something like sb->s_dont_need_bdi? It is the only way I can think of
not to miss something.

Jörn

--
People will accept your ideas much more readily if you tell them that
Benjamin Franklin said it first.
-- unknown
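The following minimal userspace sketch in C shows the failure mode and the two candidate checks. It is not kernel code. The `struct super_block` and `struct backing_dev_info` definitions are stripped-down stand-ins. `model_sync_filesystem()`, `device_check_fires()` and `optout_check_fires()` are hypothetical names. `s_dont_need_bdi` is only the annotation floated in the message and is not an existing field. A superblock without `s_bdi` "syncs" successfully while its dirty data stays untouched. The device-based `BUG_ON()` condition stays silent for a network filesystem, but the opt-out check would catch it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for kernel structures; only the
 * fields discussed in the thread are modeled. */
struct backing_dev_info {
	const char *name;
};

struct super_block {
	struct backing_dev_info *s_bdi;
	void *s_bdev;           /* non-NULL for block device filesystems */
	void *s_mtd;            /* non-NULL for MTD filesystems */
	bool s_dont_need_bdi;   /* hypothetical opt-out annotation */
	int dirty_pages;        /* hypothetical count of unwritten pages */
};

/* Models the early return in __sync_filesystem(): without an s_bdi the
 * call reports success but never starts writeback. */
static int model_sync_filesystem(struct super_block *sb)
{
	if (!sb->s_bdi)
		return 0;           /* "success", yet nothing was written */
	sb->dirty_pages = 0;    /* stand-in for real writeback */
	return 0;
}

/* Condition from the BUG_ON() in the patch: fires only for superblocks
 * that have a block or MTD device but no bdi. */
static bool device_check_fires(const struct super_block *sb)
{
	return !sb->s_bdi && (sb->s_bdev || sb->s_mtd);
}

/* Opt-out variant: fires for every superblock without a bdi unless the
 * filesystem has declared that it needs none. */
static bool optout_check_fires(const struct super_block *sb)
{
	return !sb->s_bdi && !sb->s_dont_need_bdi;
}
```

The trade-off matches the one described in the thread. The device-based check never fires on tmpfs-like filesystems, but it also never fires on network filesystems. The opt-out check catches both classes of missing bdi, at the cost of annotating every filesystem that legitimately has no backing store.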
From: Jens Axboe on 22 Apr 2010 05:10

On Thu, Apr 22 2010, Jörn Engel wrote:
> On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
> >
> > Thanks, we definitely should have put a debug statement to catch this in
> > from day 1, good debugging should be an important part of any new
> > infrastructure.
>
> Woke up early and had another look at this. Looks like a much more
> widespread problem.
> [...]
> I excluded all filesystems that appear to be read-only, block device
> based or lack any sort of backing store. So there is a chance I have
> missed some as well.

It's funky, I was pretty sure there was/is code to set a default bdi for
non-bdev file systems. It appears to be missing, that's not good. So
options include:

- Add the appropriate per-sb bdi for these file systems (right fix), or
- Pre-fill default_backing_dev_info as a fallback ->s_bdi to at least
  ensure that data gets flushed (quick fix)

I'll slap together a set of fixes for this.

--
Jens Axboe
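The two options can be modeled the same way. This is again a userspace sketch, not kernel code. `default_bdi_model` stands in for `default_backing_dev_info`. `struct examplefs_info`, `prefill_default_bdi()` and `examplefs_fill_super()` are hypothetical names chosen for illustration. The generic mount path pre-fills the shared default. A filesystem that embeds its own bdi then overrides it in its fill_super step.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; names mirror the thread, but the
 * layout is invented for illustration. */
struct backing_dev_info {
	const char *name;
};

struct super_block {
	struct backing_dev_info *s_bdi;
};

/* Stand-in for the kernel's default_backing_dev_info. */
static struct backing_dev_info default_bdi_model = { "default" };

/* Quick fix: the generic mount code pre-fills the shared default so
 * that sync() always has a bdi to flush through. */
static void prefill_default_bdi(struct super_block *sb)
{
	sb->s_bdi = &default_bdi_model;
}

/* Right fix: each mount embeds its own bdi in a hypothetical per-mount
 * private structure and points s_bdi at it, overriding the default. */
struct examplefs_info {
	struct backing_dev_info bdi;
};

static void examplefs_fill_super(struct super_block *sb,
				 struct examplefs_info *info)
{
	info->bdi.name = "examplefs";
	sb->s_bdi = &info->bdi;
}
```

With only the quick fix, every affected filesystem shares one default bdi. That is enough for sync() to reach its data. The per-sb fix gives each mount its own bdi, which ceph, nfs and ubifs already have according to Jörn's list.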
From: Jens Axboe on 22 Apr 2010 06:50

On Thu, Apr 22 2010, Jens Axboe wrote:
> It's funky, I was pretty sure there was/is code to set a default bdi for
> non-bdev file systems. It appears to be missing, that's not good.
> [...]
> I'll slap together a set of fixes for this.

Here's a series for fixing these. At this point they are totally
untested except that I did compile them. Note that your analysis
appeared correct for all cases but ocfs2, which does use get_sb_bdev()
and hence gets ->s_bdi assigned.

You can see them here, I'll post the series soon:

http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus

The first patch is a helper addition, the rest are per-fs fixups.

--
Jens Axboe
From: David Woodhouse on 22 Apr 2010 07:00

On Thu, 2010-04-22 at 12:39 +0200, Jens Axboe wrote:
>
> Here's a series for fixing these. At this point they are totally
> untested except that I did compile them. Note that your analysis
> appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> and hence gets ->s_bdi assigned.
> [...]
> The first patch is a helper addition, the rest are per-fs fixups.

Do you want to include Jörn's addition of same to get_sb_mtd_set(), with
my Acked-By: David Woodhouse <David.Woodhouse(a)intel.com> ?

--
David Woodhouse                            Open Source Technology Centre
David.Woodhouse(a)intel.com                              Intel Corporation