From: Bill Todd on 25 Sep 2006 19:22

Eric P. wrote:

....

> I have never built a file system, but it seems to me that the problem
> with file compression is that a write in the middle of the file
> will be recompressed and can cause changes to the files' physical
> block mappings and meta data structures. This in turn updates file
> system block allocation tables and meta transaction logs.

But since most large files aren't updated in the middle (they're
typically only appended to if modified at all after creation), that's
not necessarily a significant problem regardless of the implementation
(in fact, one could simply incrementally uncompress any such awkward
files as they were updated).

> With normal non-compressed files this only happens when the file is
> extended. With compressed files every write operation can do this
> and could bog down the whole system by hammering these critical
> common data structures.

That's part of what caching and journaling are for: if any such
structures are truly being hammered they'll just remain in cache while
multiple compact (logical) updates accumulate in the log which will
eventually be written back to disk in a single bulk structure update.
True, that still requires a synchronous log write if the update itself
is synchronous (though if the relevant data had to be written to the
log anyway, any metadata changes just piggyback on the same log write),
but if the system even starts to get bogged down by this then
additional operations accumulate while waiting for the previous log
write to complete and get batched together in the next log write.
I.e., the eventual limit is the sequential bandwidth of the log disks,
which at, say, 40 MB/sec works out to tens of thousands of updates per
second before any serious 'bogging down' occurs (and if that's not
enough, you can stripe the log across multiple disks to increase the
bandwidth even more).

> It also serializes all file operations while the meta data is being
> diddled.

Not at all.

> However until you read the current data, decompress, update
> new data and recompress, you cannot tell whether the compressed
> buffer will expand or contract, and what mapping changes are needed.
> If the file meta structure is affected it forces all operations
> to serialize unless you want to go for a concurrent b+tree update
> mechanism which is probably an order of magnitude more complicated.

Hey, if you're going to support really large files you'll almost
*always* need *something* like a b+ tree to map them, and you'll need
to allow concurrent operations on it. So there's nothing to be saved
here: just bite the bullet and go for it.

- bill
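A back-of-the-envelope check of the figure above, as a small C sketch:
the 40 MB/sec log bandwidth is Bill's number, while the bytes logged
per update is purely an assumed value for illustration.

    #include <stdio.h>

    /*
     * Rough estimate of how many logged metadata updates per second a
     * single log disk can absorb, assuming updates are batched into
     * large sequential log writes (group commit).  The per-update size
     * is an assumption for illustration only.
     */
    int main(void)
    {
        double log_bandwidth = 40.0 * 1024 * 1024;  /* 40 MB/sec, from the post */
        double bytes_per_update = 2048;             /* assumed: ~2 KB logged per update */

        double updates_per_sec = log_bandwidth / bytes_per_update;
        printf("~%.0f updates/sec per log disk\n", updates_per_sec);
        return 0;
    }

At roughly 2 KB per logged update this gives about 20,000 updates/sec
from a single log disk, consistent with the "tens of thousands" claim,
and striping the log across k disks raises the ceiling by roughly k.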
From: Andrew Reilly on 25 Sep 2006 19:57

On Mon, 25 Sep 2006 18:14:39 +0200, Terje Mathisen wrote:

> Andrew Reilly wrote:
>> On Mon, 25 Sep 2006 12:52:48 +0200, Terje Mathisen wrote:
>>
>>> For every N MB of contiguous disk space, use an extra MB to store
>>> ECC info for the current block. The block size needs to be large
>>> enough that a local soft spot which straddles two sectors cannot
>>> overwhelm the ECC coding.
>>
>> Isn't that just the same as having the drive manufacturer use longer
>> Reed-Solomon (forward error correcting) codes? Errors at that level
>> are
>
> No, because you need _huge_ lengths to avoid the problem where an
> area of the disk is going bad. This really interferes with random
> access benchmark numbers. :-(

Are you sure? Seems to work OK for CDs. Of course the sizes are vastly
different, and I admit to not having done the analysis to say how well
it scales. ECC seems to be in the same redundancy space as RS codes to
me.

>> something that can be dialed in or out by the manufacturer. If it's
>> too high for comfort, they'll start to lose sales, won't they?
>>
>> Alternative approach to ECC sectors: store files in a fountain-code
>> pattern?
>
> Pointers?
>
> OK, I found some papers, but they really didn't tell me how/why they
> would be suited to disk sector recovery. :-(

Sorry, I don't know of any papers; it's just a concept that I heard
about around the water cooler. It's used (or at least has been
suggested for use) in communication systems to solve the same sort of
forward error correction problem that Reed-Solomon codes address, but
at lower space cost, and specifically for the situation of packetized
signals. Given the duality between comms systems and storage systems,
it ought to help with the latter, too, but it certainly would get in
the way of random access, which I'd forgotten about. Big
sequential-only files (or at least ones where you would reasonably
expect to want to read the whole thing) might be able to benefit,
though.

--
Andrew
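As a concrete illustration of the per-region ECC idea upthread, here is
a minimal C sketch reduced to the simplest possible case: a single XOR
parity block that can rebuild any one lost block in a group. A real
implementation would use a Reed-Solomon or fountain code to survive
multiple bad sectors; the block and group sizes here are arbitrary
assumptions.

    #include <string.h>
    #include <assert.h>

    #define BLOCK_SIZE 4096   /* assumed block size */
    #define GROUP_SIZE 8      /* assumed data blocks per parity block */

    /* Compute the XOR parity of a group of data blocks. */
    static void compute_parity(unsigned char data[GROUP_SIZE][BLOCK_SIZE],
                               unsigned char parity[BLOCK_SIZE])
    {
        memset(parity, 0, BLOCK_SIZE);
        for (int b = 0; b < GROUP_SIZE; b++)
            for (int i = 0; i < BLOCK_SIZE; i++)
                parity[i] ^= data[b][i];
    }

    /* Rebuild a single unreadable block by XORing the parity with the
       surviving blocks (the one-erasure special case only). */
    static void rebuild_block(unsigned char data[GROUP_SIZE][BLOCK_SIZE],
                              const unsigned char parity[BLOCK_SIZE],
                              int lost)
    {
        memcpy(data[lost], parity, BLOCK_SIZE);
        for (int b = 0; b < GROUP_SIZE; b++)
            if (b != lost)
                for (int i = 0; i < BLOCK_SIZE; i++)
                    data[lost][i] ^= data[b][i];
    }

    int main(void)
    {
        static unsigned char data[GROUP_SIZE][BLOCK_SIZE];
        static unsigned char saved[BLOCK_SIZE];
        unsigned char parity[BLOCK_SIZE];

        /* Fill the group with some arbitrary contents. */
        for (int b = 0; b < GROUP_SIZE; b++)
            memset(data[b], b + 1, BLOCK_SIZE);

        compute_parity(data, parity);

        /* Simulate losing block 3, then recover it from the parity. */
        memcpy(saved, data[3], BLOCK_SIZE);
        memset(data[3], 0, BLOCK_SIZE);
        rebuild_block(data, parity, 3);
        assert(memcmp(saved, data[3], BLOCK_SIZE) == 0);
        return 0;
    }

The parity-per-group layout is also what makes the random-access
objection visible: any small write inside the group forces the parity
block to be recomputed and rewritten as well.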
From: prep on 25 Sep 2006 20:14

Bill Todd <billtodd(a)metrocast.net> writes:

> Out of curiosity, does anyone know of a good reason why file names
> should *ever* be case-sensitive (aside from the fact that Unix users
> and applications have become used to this)?

None of any worth IMO. But case smashing to provide a case blind name
space takes code, and would not fit into a PDP7/11 address space.

--
Paul Repacholi                     1 Crescent Rd.,
+61 (08) 9257-1001                 Kalamunda.
                                   West Australia 6076
comp.os.vms,- The Older, Grumpier Slashdot
Raw, Cooked or Well-done, it's all half baked.
EPIC, The Architecture of the future, always has been, always will be.
From: Dennis Ritchie on 25 Sep 2006 23:25

<prep(a)prep.synonet.com> wrote in message
news:87mz8ncwlj.fsf(a)k9.prep.synonet.com...

> None of any worth IMO. But case smashing to provide a case blind name
> space takes code, and would not fit into a PDP7/11 address space.

Nonsense. Keeping the case the user specified was a choice.
Case-squashing would be a very few instructions.

Dennis
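For what it's worth, a minimal sketch of the kind of case-squashing
comparison being described, in C; ASCII-only and purely illustrative,
not taken from any actual file system.

    #include <ctype.h>
    #include <stdio.h>

    /* Compare two file-name strings, ignoring ASCII case -- the kind
       of "case smashing" under discussion.  Returns 0 if the names
       match in a case-blind namespace. */
    static int name_cmp_fold(const char *a, const char *b)
    {
        while (*a && *b) {
            int ca = tolower((unsigned char)*a++);
            int cb = tolower((unsigned char)*b++);
            if (ca != cb)
                return ca - cb;
        }
        return (unsigned char)*a - (unsigned char)*b;
    }

    int main(void)
    {
        printf("%d\n", name_cmp_fold("ReadMe.TXT", "readme.txt"));  /* prints 0 */
        return 0;
    }

The fold is just a compare and an add per character, which is the
point: the cost is a handful of instructions, not address space.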
From: Stephen Fuld on 26 Sep 2006 00:04
"Terje Mathisen" <terje.mathisen(a)hda.hydro.com> wrote in message news:sngiu3-iee.ln1(a)osl016lin.hda.hydro.com... > Bill Todd wrote: >> Something else that you can't do with a linked-list allocation map like >> FAT's (unless you build another virtual address map on top of it). >> Compression (which you mentioned later) is similarly difficult with a >> file of any real size. > > So what are the features that good file system should have, besides never > silently dropping updates, and never allowing an inconsistent state? I'll chime in here. Several people have taken a passing shot at metadata, but I would like to discuss this further. I think a file system needs a consistant, easily accessable, extensible mechanism for setting, retrieving and modifying metadata/attributes. Currently file systems use at least four different methods, frequently within the same file system! 1. Overloading part of the file name (the extension) to indicate what program is the default to process this file and perhaps implicitly something about the file format. 2. Various bits of the directory entry (loosly defined) for such things as read only status, ownership, time of last update, etc. 3. Extra streams. 4. An entry in another file altogether in who knows what format. This is used by, for example some backup systems. etc. for telling where the backup copy is, etc. These are each accessed by a program using a different mechanism for each other and have different characteristics in terms of ease of getting/changing the data, etc. There should be a single mechanism for creating and reading all such data. There must be a way for users to be able to define their own new attributes that are accessed in the same way as the other ones. The metadata should be backed up with the data so it can be restored in the event of an error. The mechanisms should be easy enough to use that no one will want to use any other one. There should be utilities for listing the attributes/metadata for a file as well as changing it (with appropriate permission). Once you have the mechanism, we can have a profitable discussion of what those attributes should be. Note that many of the "wish list" items mentioned already are perfect things to be stored in this manner. -- - Stephen Fuld e-mail address disguised to prevent spam |