From: Neil Brown on 9 Aug 2010 19:00

On Sun, 8 Aug 2010 11:52:31 -0400
Valerie Aurora <vaurora(a)redhat.com> wrote:

> +A union mount layers one read-write file system over one or more
> +read-only file systems, with all writes going to the writable file
> +system. The namespace of both file systems appears as a combined
> +whole to userland, with files and directories on the writable file
> +system covering up any files or directories with matching pathnames on
> +the read-only file system. The read-write file system is the
> +"topmost" or "upper" file system and the read-only file systems are
> +the "lower" file systems. A few use cases:
> +
> +- Root file system on CD with writes saved to hard drive (LiveCD)
> +- Multiple virtual machines with the same starting root file system
> +- Cluster with NFS mounted root on clients
> +
> +Most if not all of these problems could be solved with a COW block
> +device or a clustered file system (including NFS mounts). However, for
> +some use cases, sharing is more efficient and better performing if
> +done at the file system namespace level. COW block devices only
> +increase their divergence as time goes on, and a fully coherent
> +writable file system is unnecessary synchronization overhead if no
> +other client needs to see the writes.

Thanks for including lots of documentation!

Given how intrusive this patch set is, I would really like to see the
justification above fleshed out a bit more. What would be particularly
valuable would be real-life use cases where someone has put this to work
and found that it genuinely meets a need. I realise there can be a bit of
a chicken/egg issue there, but if you do have anything it would be good
to include it.

This is particularly needed because a number of standard features are
not going to be supported, and it would be good to be sure that there
are real cases that don't need those.

....
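The layering rule described above (the upper layer covers matching pathnames, and all writes go to the upper layer) can be pictured with a toy model. This is a hypothetical Python sketch, not the kernel implementation: each layer is just a dict from pathname to contents, and the lower layer is wrapped in a read-only view to stand in for a read-only file system.

```python
from types import MappingProxyType


def union_lookup(path, upper, lowers):
    """Resolve path by searching the upper layer first, then each lower layer."""
    if path in upper:
        return ("upper", upper[path])
    for i, layer in enumerate(lowers):
        if path in layer:
            return (f"lower{i}", layer[path])
    raise FileNotFoundError(path)


def union_write(path, data, upper):
    """All writes go to the read-write upper layer; lower layers are never modified."""
    upper[path] = data


# The lower layer is a read-only view, e.g. a root file system on CD.
lower = MappingProxyType({"/etc/hostname": "livecd\n", "/bin/sh": "<elf>"})
upper = {}

union_write("/etc/hostname", "node7\n", upper)
print(union_lookup("/etc/hostname", upper, [lower]))  # ('upper', 'node7\n')
print(union_lookup("/bin/sh", upper, [lower]))        # ('lower0', '<elf>')
```

The written file now covers the lower copy, while untouched files are still served from the lower layer.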
> +Non-features
> +------------
> +
> +Features we do not currently plan to support in union mounts:
> +
> +Online upgrade: E.g., installing software on a file system NFS
> +exported to clients while the clients are still up and running.
> +Allowing the read-only bottom layer of a union mount to change
> +invalidates our locking strategy.

I wonder if the restriction is not more serious than this. Given the
prevalence of "copy-up", particularly of directories, I would think that
even off-line upgrade would not be supported. If the upgrade adds a file
in a directory that has already been read (and hence copied-up), or
changes a file that has been chmodded, then the upgrade will not be
completely visible, which sounds dangerous.

Don't you have to require (or strongly recommend) that the underlying
filesystem remain unchanged while the on-top filesystem exists, not just
while it is mounted?

As a counter-position for you or others to write cogent arguments
against, and then to include those arguments in the justification
section, I would like to present my preferred approach, which is
essentially that the problem is better solved at the block layer or the
distro layer.

A distro-layer solution would be appropriate when you want a common root
filesystem with per-host configuration, whether in an NFS cluster or a
virtual-machine cluster. This involves making every file that might need
configuration a symlink into e.g. /local, and having every instance mount
some local directory on /local, e.g.

	mount --bind /local-`hostname` /local

This is obviously less transparent, but it is also more predictable (you
know exactly what can and cannot be changed by an upgrade of the shared
filesystem). A convincing use case that required NFS sharing and
required significantly more customisation than just some config files
would be a good counter-argument to this.

I see two block-layer solutions. The obvious one is a COW block device,
as you have mentioned.
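Returning to the off-line upgrade concern, a toy model may help make it concrete. This is a hypothetical Python sketch that assumes, as described above, that reading a directory copies it up and that chmod copies a file up; it does not reflect how the union mount patches actually store copied-up entries.

```python
class UpgradeUnion:
    """Toy union (hypothetical) showing how copy-up can mask changes made
    to the lower layer while the union is offline."""

    def __init__(self, lower_dirs, lower_modes):
        self.lower_dirs = lower_dirs    # directory path -> set of entry names
        self.lower_modes = lower_modes  # file path -> permission bits
        self.upper_dirs = {}            # directories already copied up
        self.upper_modes = {}           # files already copied up

    def readdir(self, d):
        # Assumption: the first readdir copies the entry list up; from then
        # on only the upper copy is consulted.
        if d not in self.upper_dirs:
            self.upper_dirs[d] = set(self.lower_dirs.get(d, ()))
        return sorted(self.upper_dirs[d])

    def chmod(self, path, mode):
        # A metadata change copies the file up; the lower copy is left alone.
        self.upper_modes[path] = mode

    def mode(self, path):
        if path in self.upper_modes:
            return self.upper_modes[path]
        return self.lower_modes[path]


u = UpgradeUnion({"/usr/bin": {"ls"}}, {"/usr/bin/ls": 0o755})
u.readdir("/usr/bin")           # directory is copied up
u.chmod("/usr/bin/ls", 0o700)   # file is copied up

# "Off-line upgrade": only the lower layer is modified.
u.lower_dirs["/usr/bin"].add("newtool")
u.lower_modes["/usr/bin/ls"] = 0o4755

print(u.readdir("/usr/bin"))       # ['ls']  (newtool is not visible)
print(oct(u.mode("/usr/bin/ls")))  # 0o700  (the upgrade's mode is masked)
```

Had the same upgrade been applied before the directory was read, newtool would have been listed; the outcome depends on what happened to be copied up earlier, which is what makes the upgrade's visibility unpredictable.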
I am not convinced that it is as bad as you think. Particularly if the
COW device could advertise that it handles small 'discard' requests
efficiently, and if filesystems could then send small discard requests
whenever appropriate, the wastage due to divergence need not be too
great. In any case, some hard numbers like "Performing a kernel compile
on a COW device requires N meg of space while using a union-mounted
filesystem it requires M (<< N) meg of space" would help a lot. (Of
course that is a silly test, as we would use "make O=/somewhere/else",
not COW or union, for that task.)

The second solution would be filesystem-specific, and hence a good
selling point for a new up-and-coming filesystem. If a filesystem were
comfortable with data on multiple devices, and were able to copy-on-write
files, then it should be relatively easy to give it a read-only device
and a clean read-write device, and tell it to write all changes only to
the second device (and never update even the filesystem metadata on the
first device). The filesystem could then make effective use of any space
available in the second device, without wastage.

> +Thank you for reading!

Thank you for writing!

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
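The suggestion that discard support could limit COW divergence can be illustrated with a toy chunk-based COW device. This is a hypothetical Python sketch, not dm-snapshot's exception store: reads fall through to the shared origin unless a chunk has a private copy, writes create private copies, and a discard drops the private copy so its space can be reused.

```python
class ToyCowDevice:
    """Hypothetical chunk-based COW block device (not dm-snapshot's format)."""

    def __init__(self, origin, chunk_size):
        self.origin = origin          # bytes: shared read-only image
        self.chunk_size = chunk_size
        self.exceptions = {}          # chunk index -> private copy (bytes)

    def read_chunk(self, idx):
        if idx in self.exceptions:
            return self.exceptions[idx]
        start = idx * self.chunk_size
        return self.origin[start:start + self.chunk_size]

    def write_chunk(self, idx, data):
        # Whole-chunk writes only, to keep the sketch short; a real device
        # would copy the origin chunk first for a partial write.
        if len(data) != self.chunk_size:
            raise ValueError("sketch supports whole-chunk writes only")
        self.exceptions[idx] = bytes(data)

    def discard_chunk(self, idx):
        # Assumption: a discard lets the device drop the private copy and
        # reclaim its space. Discarded contents are undefined to the
        # filesystem; in this model they read back as origin data.
        self.exceptions.pop(idx, None)

    def cow_bytes_used(self):
        return len(self.exceptions) * self.chunk_size


dev = ToyCowDevice(bytes(16), chunk_size=4)
dev.write_chunk(1, b"abcd")
dev.write_chunk(2, b"efgh")
print(dev.cow_bytes_used())  # 8
dev.discard_chunk(2)         # e.g. the filesystem freed the blocks in chunk 2
print(dev.cow_bytes_used())  # 4
```

Without the discard, chunk 2 would keep occupying COW space even after the filesystem stopped using it, which is the divergence the patch documentation worries about.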
From: J. R. Okajima on 10 Aug 2010 22:10
Neil Brown:
> I wonder if the restriction is not more serious than this.
> Given the prevalence of "copy-up", particularly of directories, I would think
> that even off-line upgrade would not be supported.
> If the upgrade adds a file in a directory that has already been read (and
> hence copied-up), or changes a file that has been chmodded, then the upgrade
> will not be completely visible, which sounds dangerous.
	:::
> I see two block-layer solutions. The obvious is a COW block device as you
> have mentioned. I am not convinced that it is as bad as you think.
	:::

DM snapshot provides the COW block feature, and it will match your idea
since the COW device is generally much smaller. But it doesn't support
off-line upgrade either: modifying the origin device off-line is
equivalent to corrupting the filesystem seen through the DM snapshot
device.

Here are the pros and cons of DM snapshot compared with a union:
- the number of bytes to be copied between devices is much smaller.
- only a single filesystem type can be used.
- the fs must be writable, no read-only fs, even for the lower original
  device, so a compressed fs will not be usable. But if we use a loopback
  mount, we may address this issue. For instance,
	mount -o loop /cdrom/squashfs.img /sq
	losetup /dev/loop0 /sq/ext2.img
	losetup /dev/loop1 /somewhere/cow
	dmsetup "snapshot /dev/loop0 /dev/loop1 ..."
- it will be difficult (or need more operations) to extract the
  difference between the original device and the COW device.
- DM snapshot-merge may help a lot when users try merging. With an
  fs-layer union, users will use rsync(1).
- in an fs-based union, users can add/remove members (layers) dynamically
  without unmounting. Of course, no file on the layer being removed may
  be busy.

Also, here are my concerns about UnionMount. All these issues have been
reported before.
- for users, the inode number may change silently, e.g. on copy-up.
- link(2) may break because of copy-up.
- read(2) may get obsolete file data (fstat(2) too).
- fcntl(F_SETLK) may be broken by copy-up.
- unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after open(O_RDWR).

J. R. Okajima
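Two of the issues listed above (the inode number changing on copy-up, and read(2) returning obsolete data) can be reproduced in a small toy model. This is a hypothetical Python sketch, assuming that copy-up allocates a new inode on the upper layer and that a descriptor opened before copy-up keeps referring to the lower file; it is not how UnionMount is implemented.

```python
import itertools

_next_ino = itertools.count(100)


class Inode:
    def __init__(self, data):
        self.ino = next(_next_ino)   # each new inode gets a fresh number
        self.data = data


class CopyUpUnion:
    """Hypothetical toy union showing two copy-up side effects."""

    def __init__(self, lower):
        self.lower = lower   # path -> Inode on the read-only layer
        self.upper = {}      # path -> Inode created by copy-up

    def lookup(self, path):
        return self.upper[path] if path in self.upper else self.lower[path]

    def open(self, path):
        # An open "descriptor" keeps pointing at whichever inode it found.
        return self.lookup(path)

    def write(self, path, data):
        if path not in self.upper:
            # Copy-up: a new inode is allocated on the upper layer.
            self.upper[path] = Inode(self.lower[path].data)
        self.upper[path].data = data


u = CopyUpUnion({"/etc/motd": Inode(b"old")})
fd = u.open("/etc/motd")              # opened before copy-up
ino_before = u.lookup("/etc/motd").ino
u.write("/etc/motd", b"new")          # triggers copy-up
ino_after = u.lookup("/etc/motd").ino

print(ino_before != ino_after)  # True: the inode number changed silently
print(fd.data)                  # b'old': the earlier descriptor reads stale data
```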