From: Bill Todd on 24 Sep 2006 14:42

Terje Mathisen wrote:

> So what are the features that a good file system should have, besides
> never silently dropping updates,

Unless, of course, the user chooses this possibility for performance
reasons (e.g., by allowing it to employ write-back caching).

> and never allowing an inconsistent state?

To be precise, never allowing a *visibly* inconsistent state:
journal-protected file systems enter inconsistent states all the time,
and may even be caught in one by a crash - they just repair them before
anyone can notice, as fsck & friends could do if they could run fast
enough.

> a) Extent-based block addressing, ideally with a background task which
> cleans up fragmented files.

Anton is right: that's an implementation detail, not a feature per se.

> b) inode-like separation between filenames and storage allocation.

That is also an implementation detail, even if hard links must be
supported. If the underlying storage is sufficiently robust, its impact
on corruption-survivability lessens a lot, and (as NTFS found out)
keeping at least *some* per-file metadata directory-resident can be a
performance win.

> c) Room in the directory structure to skip the inode completely for
> single-linked files with a small (1-3?) number of extents.

Another implementation detail, which can become dangerous if the
underlying storage is *not* sufficiently robust to protect directory
access paths.

> Any others?

A good file system should be reliable (uncompromising in its data
integrity: what comes out should always be precisely what went in),
available (robust in the face of hardware failure, even in single-disk
environments when possible), securely sharable (across both processes
and network nodes), fast, efficient in its use of resources (across the
full range of file sizes from zero bytes on up), incrementally scalable
(or shrinkable) in size (and performance) from MB to EB, inexpensive to
purchase and use, common (and interoperable) across all operating
systems of interest, and simple to use and manage (this last including
management of whatever trade-offs may be necessary among these
features).

Some might suggest including additional features that may be difficult
to incorporate at higher (application) levels with comparable
efficiency and/or standardization, such as audit trails (from snapshots
to 'continuous data protection'), transactional semantics across
multiple application operations, and record-oriented extensions -
though not to the point where the file system starts looking like a
full-fledged database (since that tends to compromise speed,
efficiency, and simplicity of use).

That's a start, anyway: I'd be interested to hear what others think
I've missed.

- bill
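A toy illustration of the repair step described above, in C: the record
layout and commit-mark convention are invented for this sketch (no real
file system lays its journal out this way), but the essential property
holds - records from a transaction that never reached its commit mark
are simply not applied at recovery, so the crash-time inconsistency is
never visible:

    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE  16
    #define DISK_BLOCKS 8

    struct journal_rec {
        int  txn_id;            /* transaction this record belongs to */
        int  block;             /* destination block, or -1 = commit mark */
        char data[BLOCK_SIZE];  /* new block image (redo record) */
    };

    static char disk[DISK_BLOCKS][BLOCK_SIZE];  /* the "disk", zeroed */

    /* Replay: apply only records whose transaction committed. */
    static void replay(const struct journal_rec *j, int n)
    {
        for (int i = 0; i < n; i++) {
            if (j[i].block != -1)
                continue;                    /* scan for commit marks */
            for (int k = 0; k < n; k++)      /* redo that txn's writes */
                if (j[k].txn_id == j[i].txn_id && j[k].block != -1)
                    memcpy(disk[j[k].block], j[k].data, BLOCK_SIZE);
        }
    }

    int main(void)
    {
        struct journal_rec j[] = {
            { 1,  2, "committed data" },  /* txn 1 wrote block 2 ... */
            { 1, -1, "" },                /* ... and committed */
            { 2,  5, "torn write" },      /* txn 2 crashed uncommitted */
        };
        replay(j, 3);
        printf("block 2: '%s'\n", disk[2]);  /* applied */
        printf("block 5: '%s'\n", disk[5]);  /* discarded: stays empty */
        return 0;
    }

Real journals add checksums, ordering constraints, and checkpointing,
but the visible-consistency argument is the same.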
From: Anton Ertl on 24 Sep 2006 15:26

Terje Mathisen <terje.mathisen(a)hda.hydro.com> writes:
>So what are the features that a good file system should have

You might be interested in the 2006 Linux File Systems Workshop.
There's a summary of the workshop at <http://lwn.net/Articles/190222/>.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
From: Tarjei T. Jensen on 24 Sep 2006 16:18

Terje Mathisen wrote:
> So what are the features that a good file system should have, besides
> never silently dropping updates, and never allowing an inconsistent
> state?
>
> a) Extent-based block addressing, ideally with a background task which
> cleans up fragmented files.
>
> b) inode-like separation between filenames and storage allocation.
>
> c) Room in the directory structure to skip the inode completely for
> single-linked files with a small (1-3?) number of extents.
>
> Any others?

I have some wishes:

Arbitrary length file names and files. File names should not be case
sensitive.

Support for access control lists, etc. Space for metadata.

An ability to tell the underlying hardware the name of the file system,
if such exists. That eases administration of the system.

It should have a management API which allows software to monitor it.

Ideally the file system should be able to defragment itself.

The file system should be able to migrate from one device to another.
Or that may be the job of the volume manager. I don't care; I want the
feature.

The file system should allow for transparent compression of individual
files. There should be an API for reading compressed files without
uncompressing. E.g. for backup. The file system should support copying
of files and compressing the content when in transit. This is
particularly useful for copying files over a network. Personally I
don't care about encrypted files.

The file system or volume manager should support mirroring. This should
be based on blocks. Technology should be able to work across an IP
network.

greetings,
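The "read compressed files without uncompressing" wish above is
essentially a raw-extent API for backup tools. A minimal sketch of the
idea, with purely hypothetical types (no real file system exposes this
exact interface): the backup path copies the stored bytes verbatim,
along with the flag the restore side needs to reproduce them.

    #include <stdio.h>
    #include <string.h>

    /* A stored file: the on-"disk" bytes may already be compressed. */
    struct stored_file {
        const char   *name;
        unsigned char blob[64];   /* stored representation */
        size_t        blob_len;
        int           compressed; /* nonzero if blob is compressed */
    };

    /* "Read compressed": hand out the stored bytes verbatim, plus the
     * flag a restore tool needs to reproduce the file faithfully. */
    static size_t read_raw(const struct stored_file *f,
                           unsigned char *out, int *compressed)
    {
        memcpy(out, f->blob, f->blob_len);
        *compressed = f->compressed;
        return f->blob_len;
    }

    int main(void)
    {
        struct stored_file src = { "report.txt", "RLE:17xA", 8, 1 };
        struct stored_file dst = { "backup/report.txt", "", 0, 0 };

        /* The backup path never decompresses or recompresses. */
        dst.blob_len = read_raw(&src, dst.blob, &dst.compressed);
        printf("backed up %zu raw bytes (compressed=%d)\n",
               dst.blob_len, dst.compressed);
        return 0;
    }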
From: Bill Todd on 24 Sep 2006 22:47

Tarjei T. Jensen wrote:

....

> Arbitrary length file names

A length of much more than, say, 64 KB could start to become an
implementation challenge in any reasonable directory approach that I
can think of: would such a limit satisfy you, and if not, why not?

> and files. File names should not be case
> sensitive.

Out of curiosity, does anyone know of a good reason why file names
should *ever* be case-sensitive (aside from the fact that Unix users
and applications have become used to this)?

> Support for access control lists, etc. Space for metadata.

Perhaps I should have included the latter feature in my list: flexible
standardized and/or ad hoc annotation capability is something that I
consider important, but I was trying to keep the list items at a fairly
high level.

....

> The file system should be able to migrate from one device to another.
> Or that may be the job of the volume manager. I don't care; I want
> the feature.

What user need are you attempting to satisfy with that feature? It
sounds like a work-around for some assumed underlying deficiency in the
storage system.

> The file system should allow for transparent compression of
> individual files. There should be an API for reading compressed files
> without uncompressing. E.g. for backup.

And without decrypting as well: good point.

> The file system should support copying of files

I suspect you mean support an explicit 'copy file' operation (along the
lines of NT's) which will handle any ancillary information that may not
be present in the single main data stream: this is desirable as a user
aid and performance enhancement even for simple files, and especially
for 'decorated' files (whether such decoration is ad hoc or
standardized) and for files with an internal organization that does not
allow efficient application-level copying at all (e.g., B+ trees where
the interior nodes are managed by the system rather than allocated from
the virtual space of a simple byte-stream file) - plus it facilitates
copy-on-write sharing by multiple file 'copies' of a single instance if
the system supports that (which I likely should have included in my
list of possible extensions).

> and compressing the content when in transit. This is particularly
> useful for copying files over a network.

If you are copying within the same file system, any presence of the
network should be transparent. If you are copying the files somewhere
else, you could use the 'read compressed' API that you described above
if the file was already compressed and the remote end understood that
form of compression (e.g., was another instance of the same kind of
file system); otherwise, I'd suggest that compressing for network
transmission is not the job of the local file system but rather should
be a feature of the underlying network mechanisms that the CopyFile()
file system operation uses.

But it is indeed a gray area as soon as one introduces the idea of a
CopyFile() operation (which clearly needs to include network copying to
be of general use). The recent introduction of 'bundles' ('files' that
are actually more like directories in terms of containing a
hierarchical multitude of parts - considerably richer, IIRC, than IBM's
old 'partitioned data sets') as a means of handling multi-'fork' and/or
attribute-enriched files in a manner that simple file systems can at
least store (though applications then need to understand that form of
storage to handle it effectively) may be applicable here.

> Personally I don't care about encrypted files.
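A minimal model of the explicit 'copy file' operation discussed above,
with invented types (NT's CopyFile() is the loose inspiration here, not
the actual API): the point is that every named stream travels with the
copy, where a plain read()/write() loop over the main data stream would
silently drop the rest.

    #include <stdio.h>

    #define MAX_STREAMS 4

    /* One named stream ("fork"); stream 0 is the main data stream. */
    struct stream { char name[16]; char data[32]; };

    struct file {
        char          path[64];
        struct stream streams[MAX_STREAMS];
        int           nstreams;
    };

    /* Copy every stream, so 'decorations' survive that a plain
     * open/read/write/close loop over the main stream would drop. */
    static void copy_file(const struct file *src, struct file *dst,
                          const char *newpath)
    {
        *dst = *src;  /* struct copy carries all streams along */
        snprintf(dst->path, sizeof dst->path, "%s", newpath);
    }

    int main(void)
    {
        struct file src = { "a.doc",
                            { { "",          "main contents" },
                              { "thumbnail", "..."           } },
                            2 };
        struct file dst;

        copy_file(&src, &dst, "b.doc");
        printf("%s: %d streams copied, main = '%s'\n",
               dst.path, dst.nstreams, dst.streams[0].data);
        return 0;
    }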
> The file system or volume manager should support mirroring. This
> should be based on blocks.

Again, that last sounds like an implementation detail aimed at
satisfying some perceived user-level need: what is that need?

> Technology should be able to work across an IP network.

What technology? Mirroring? The latter is internal to the file system,
so perhaps you are stating that distributed aspects of the file system
in general should function over an IP network - possibly implying
something about the degree of such distribution (e.g., WAN vs. LAN)?

Conventional mirroring is inherently synchronous, which causes
significant update-performance impacts as line latencies increase with
distance. Asynchronous replication (e.g., at disaster-tolerant
separations sufficient to satisfy even the truly paranoid - I probably
should have included at least limited disaster-tolerance in my list of
extensions) requires ordering guarantees for the material applied to
the remote site that can become complex when the main site comprises
multiple file system nodes executing concurrently.

- bill
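One common way to provide the ordering guarantee mentioned above,
sketched with an invented record format rather than any real
replication protocol: the primary stamps every update with a global
sequence number, and the replica applies only contiguous prefixes,
buffering anything that arrives out of order, so its state is always a
prefix of the primary's history.

    #include <stdio.h>

    #define WINDOW 8   /* must cover the worst-case reorder distance */

    struct update { long seq; int block; int value; };

    static struct update pending[WINDOW];
    static int  have[WINDOW];
    static long next_to_apply = 1;

    static void apply(const struct update *u)
    {
        printf("apply seq %ld: block %d <- %d\n",
               u->seq, u->block, u->value);
    }

    /* Called as updates arrive over the (reordering) network. */
    static void receive(struct update u)
    {
        pending[u.seq % WINDOW] = u;
        have[u.seq % WINDOW] = 1;
        while (have[next_to_apply % WINDOW] &&
               pending[next_to_apply % WINDOW].seq == next_to_apply) {
            apply(&pending[next_to_apply % WINDOW]);
            have[next_to_apply % WINDOW] = 0;
            next_to_apply++;
        }
    }

    int main(void)
    {
        /* seq 2 arrives first; nothing applies until seq 1 fills the
         * gap, so the replica never exposes an out-of-order state */
        receive((struct update){ 2, 7, 42 });
        receive((struct update){ 1, 3, 17 });
        receive((struct update){ 3, 7, 43 });
        return 0;
    }

With multiple primary nodes updating concurrently, producing that
single global sequence is itself the hard part, which is the complexity
Bill alludes to.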
From: Terje Mathisen on 24 Sep 2006 14:24
Anton Ertl wrote:
> Terje Mathisen <terje.mathisen(a)hda.hydro.com> writes:
>> So what are the features that a good file system should have, besides
>> never silently dropping updates, and never allowing an inconsistent
>> state?
>
> Well, there are different kinds of consistency.
>
> Many file systems people only care for meta-data consistency; as long
> as the fsck passes, everything is fine. Who needs data, anyway?

Ouch!

> On the other extreme there is fully synchronous operation of the file
> system (so you don't even lose a second of work in case of a crash),
> but this usually results in too-slow implementations.
>
> I like the one that I call in-order semantics
> <http://www.complang.tuwien.ac.at/papers/czezatke&ertl00/#sect-in-order>:
>
> |The state of the file system after recovery represents all write()s
> |(or other changes) that occurred before a specific point in time, and
> |no write() (or other change) that occurred afterwards. I.e., at most
> |you lose a minute or so of work.
>
> Unfortunately, AFAIK all widely-used file systems provide this
> guarantee only in fully-synchronous mode, if at all.

Isn't this why you use a log? A sequential log file can be updated
quickly, storing enough info to follow your guidelines above.

>> a) Extent-based block addressing, ideally with a background task
>> which cleans up fragmented files.
>>
>> b) inode-like separation between filenames and storage allocation.
>>
>> c) Room in the directory structure to skip the inode completely for
>> single-linked files with a small (1-3?) number of extents.
>
> Features a and c seem to be low-level implementation details to me,
> rather than what I would call features. Feature b is an architectural

(a) is based on the presumption that you want the filesystem to be
fast, i.e. any mechanism which allows equally fast access to any given
part of the file is OK with me. :-)

> implementation choice, but apart from the availability of hard links
> (which is not so useful in my experience) it is not a user-visible
> feature, either.

OK, soft links are OK for me, you just need to make sure that they can
be totally transparent to OS users. I.e. Win* 'link' files really don't
count. :-(

> Fup-To: comp.arch

Oops, I forgot to check this myself. Sorry!

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
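A sketch of the log-based approach Terje suggests, with an invented
record format: each change is appended to a sequential log with a
per-record checksum, and recovery replays the log until the first
record that fails its check. The recovered state is then all changes
before one point in time and none after - Anton's in-order semantics,
without running fully synchronously.

    #include <stdio.h>

    struct logrec { int block; int value; unsigned sum; };

    /* Any per-record checksum works; this one is just illustrative. */
    static unsigned checksum(int block, int value)
    {
        return (unsigned)block * 2654435761u ^ (unsigned)value;
    }

    /* Replay a prefix of the log: stop at the first torn record, so
     * the recovered state reflects no change made after that point. */
    static void recover(const struct logrec *log, int n)
    {
        for (int i = 0; i < n; i++) {
            if (log[i].sum != checksum(log[i].block, log[i].value)) {
                printf("torn record at %d: stop replay here\n", i);
                return;   /* everything after this is discarded */
            }
            printf("redo: block %d <- %d\n", log[i].block, log[i].value);
        }
    }

    int main(void)
    {
        struct logrec log[3];
        log[0] = (struct logrec){ 4, 10, checksum(4, 10) };
        log[1] = (struct logrec){ 9, 20, checksum(9, 20) };
        log[2] = (struct logrec){ 1, 30, 0 };  /* crash mid-append */
        recover(log, 3);
        return 0;
    }

Because the log is written sequentially, the appends are cheap; how
much work you can lose is then bounded by how often the log itself is
forced to stable storage.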