From: Terje Mathisen on 26 Sep 2006 03:45

Eric P. wrote:
> Terje Mathisen wrote:
>> Tarjei T. Jensen wrote:
>>> Niels Jørgen Kruse wrote:
>>>> File formats are usually compressed already, and you need to know the
>>>> kind of content to get the best compression.
>>> Sorry, they are not yet compressed. It does not mean that we should not
>>> prepare the file system to handle that.
>>>
>>> We'll have to see whether future word processing and spreadsheet formats
>>> are compressed enough natively.
>> I'd be quite happy if just one single app would stop storing all its
>> data pessimally:
>>
>> Microsoft PowerPoint.
>>
>> Any jpeg images you include in a PPT presentation will be decompressed
>> into a 32-bit RGB bitmap, and stored that way in the file.
>>
>> This holds even if you resize the source image to a small thumbnail in
>> your presentation.
>>
>> This one app is responsible for 10-20% of _all_ file space on most of
>> our file server volumes. :-(
>
> Ok, well maybe compression still has a place. Of course you realize
> that some people's ability to do dumbass things far exceeds others'
> ability to compensate by adding compression. Would the compression
> algorithm the file system uses work well on 32 bit RGB bitmaps?

No, not at all:

You get at best something like 2:1 compression using zlib/zip or a
similar lossless approach, while the jpeg->BMP decompression gave a 10:1
expansion.

I.e. this particular problem can _only_ be fixed inside the application.

I'm guessing that at one point in time, lost in obscurity by now, the
PowerPoint team decided "let's add the capability to import BMP
images!", then a little bit later someone else said:

"Now that we have BMP import, why don't we write/borrow/steal a set of
file format conversion routines, so that we can also import other
non-vector image formats?"

By doing all conversions at the import stage, it didn't matter if a
specific image format required a relatively costly decompression stage,
it would still be displayed just as quickly as a regular BMP file!

A few years later we got to the stage where a JPEG actually loads a
_lot_ faster from disk than a BMP, simply because it is much faster to
decompress the jpeg than to read a 10X larger file. :-(

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

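
The arithmetic behind "can _only_ be fixed inside the application" can be written out as a small worked example. This is only a sketch using the two round figures quoted in the post (10:1 expansion, 2:1 best-case lossless compression); the function name `net_growth` is made up for illustration.

```python
def net_growth(expansion: float, fs_compression: float) -> float:
    """Stored size relative to the original JPEG after the file
    system has compressed the expanded bitmap."""
    return expansion / fs_compression

JPEG_TO_BMP_EXPANSION = 10.0  # round figure quoted in the post
BEST_CASE_LOSSLESS = 2.0      # round figure quoted in the post

growth = net_growth(JPEG_TO_BMP_EXPANSION, BEST_CASE_LOSSLESS)
print(f"still {growth:.0f}x the size of the original JPEG")
```

Even in the best case the file system only wins back a factor of two, so the stored image remains several times larger than the JPEG it came from.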
From: Terje Mathisen on 26 Sep 2006 03:55

Andrew Reilly wrote:
> Quite a lot of meta-data is stored within files, in application-specific
> formats, now. ID3 title/artist tags or sample rates in MP3 files, "meta"
> attributes in HTML files, author information in office documents.
> Alternate language soundtracks in DVD movies, perhaps (not meta-data, but
> "extra stream" information).
>
> How could this reasonably be subsumed by a file system, when the
> information must travel with the file, by the definition of the file
> format? Perhaps it is reasonable for a "file system" to expose abstract
> meta-data methods that operate on different file types through
> type-specific plug-ins that access (and modify?) the information in
> format-specific ways. Is that really a win? Is it what you are thinking
> about, or would such meta-information be duplicated from the file into
> file-system meta-data forks? How much effort would you go to to ensure
> consistency in that case?

The examples you're using here are all more or less of the 'file system
within a single file' order.

Until the least common denominator of file systems includes all this
stuff, we'll still see the need for file formats that effectively work
as a limited/application specific file system: tar, zip, doc and
probably a bunch of others.

Java jar files are afaik just zip files with a modified extension and
one or two added conventions for naming/content.

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

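
The jar remark is easy to illustrate with nothing but a generic zip library. The sketch below builds a jar-shaped archive in memory and reads it back with Python's `zipfile` module; the entry `com/example/Hello.class` and its four placeholder bytes are invented for the example and are not a real class file. `META-INF/MANIFEST.MF` is the naming convention a jar adds on top of plain zip.

```python
import io
import zipfile

def build_jar_like_archive() -> bytes:
    """Write a zip archive laid out the way a jar is, entirely in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The manifest is a jar naming convention, stored as an ordinary entry.
        zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n")
        # Placeholder payload (hypothetical path, not a valid class file).
        zf.writestr("com/example/Hello.class", b"\xca\xfe\xba\xbe")
    return buf.getvalue()

data = build_jar_like_archive()

# A plain zip reader lists the "directory tree" held inside the single file.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    names = zf.namelist()

print(names)
```

The archive carries its own names and hierarchy, which is the "limited file system within a single file" the post describes.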
From: Jean-Marc Bourguet on 26 Sep 2006 04:02

Jan Vorbrüggen <jvorbrueggen(a)not-mediasec.de> writes:

>>>> Which language do you want to be case-insensitive in? What if two
>>>> users of the same file system disagree on the choice?
>>> That is not a matter of language. Or is there a character encoding that
>>> says for language A, "X" and "x" are a pair while for language B, "X" and
>>> "y" are a pair?
>> Yes, afaik:
>> The German 'double-s' is two letters in uppercase and a single letter in
>> lowercase.
>
> No, that's not what I meant. I asked whether there are languages that use the
> same letters, but for which the mapping between upper- and lower-case is
> incompatible.

Turkish has two I, one with a dot and one without. If in a Turkish
locale you ask for the lowercase of I, you get the dotless i.

Yours,

--
Jean-Marc

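
A small sketch of why this matters for file names. Python's built-in `str.lower()` is not locale-aware, so the Turkish rule is spelled out by hand here; the table `TURKISH_LOWER` and the helper `lower_tr` are illustrative assumptions that cover only the two I letters, not a full Turkish case mapping.

```python
# Turkish pairs: I <-> ı (dotless) and İ <-> i (dotted).
TURKISH_LOWER = {"I": "ı", "İ": "i"}

def lower_tr(s: str) -> str:
    """Lowercase using the Turkish rule for the two I letters (sketch only)."""
    return "".join(TURKISH_LOWER.get(ch, ch.lower()) for ch in s)

name = "FILE.TXT"
default_fold = name.lower()   # locale-independent mapping
turkish_fold = lower_tr(name)

print(default_fold, turkish_fold, default_fold == turkish_fold)
```

Under the default mapping "FILE.TXT" and "file.txt" are the same name; under the Turkish mapping they are not, so two users of one case-insensitive file system could legitimately disagree about whether the names collide.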
From: Benny Amorsen on 26 Sep 2006 04:49

>>>>> "JV" == Jan Vorbrüggen <jvorbrueggen(a)not-mediasec.de> writes:

BT> Out of curiosity, does anyone know of a good reason why file names
BT> should *ever* be case-sensitive (aside from the fact that Unix
BT> users and applications have become used to this)?

>> Which language do you want to be case-insensitive in? What if two
>> users of the same file system disagree on the choice?

JV> That is not a matter of language. Or is there a character encoding
JV> that says for language A, "X" and "x" are a pair while for
JV> language B, "X" and "y" are a pair?

There are certainly languages which say that two letters are
considered the same apart from case, where another language considers
them different letters. So you risk having names which conflict in one
language, but do not conflict in another.

One special case is Å which can be alternatively spelled Aa in Danish.
A case-insensitive file system really ought to forbid having both
Aalborg and Ålborg as file names in the same directory. As far as I
know, no system has gone that far. (I wonder if any system even gets
the sorting right: a is before b is before aa is the same as å).

I have the collation problem with http://generals.dk, a site I run for
a friend of mine. There is no good universal collation, so collations
in several languages are wrong on that site. I suppose I could fix it
so that the list for each country is correct at least, but there is no
good way to sort a list of names from different countries.

JV> Case-blind case-preserving is the only variant which is acceptable
JV> from the point of view of ergonomics, IMNSHO.

Put it in user space, not the file system.

/Benny

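
The ordering described above ("a is before b is before aa is the same as å") can be sketched as a sort key. This is a deliberately naive illustration: the names `DANISH_ALPHABET` and `danish_key` are made up, and blindly replacing every "aa" with "å" is an assumption that a real collation would have to refine, since not every "aa" in a word stands for å.

```python
import unicodedata

# Danish alphabet order: the three extra letters come after z.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}

def danish_key(name: str) -> tuple:
    """Naive Danish sort key: fold case, treat 'aa' as 'å' (sketch only)."""
    folded = unicodedata.normalize("NFC", name).lower().replace("aa", "å")
    # Characters outside the alphabet sort after everything else.
    return tuple(RANK.get(ch, len(DANISH_ALPHABET)) for ch in folded)

names = ["Ålborg", "Bornholm", "Aalborg", "Anders"]
ordered = sorted(names, key=danish_key)

print(ordered)
print(danish_key("Aalborg") == danish_key("Ålborg"))
```

Both spellings produce the same key, which is exactly the collision a Danish-aware case-insensitive file system would have to reject, while a file system using another language's rules would see two unrelated names.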
From: Nick Maclaren on 26 Sep 2006 05:07

In article <4ns2u4FbqceeU2(a)individual.net>,
Jan Vorbrüggen <jvorbrueggen(a)not-mediasec.de> writes:
|>
|> >>None of any worth IMO. But case smashing to provide a case blind name
|> >>space takes code, and would not fit into a PDP7/11 address space.
|>
|> > Nonsense. Keeping the case the user specified was a choice.
|> > Case-squashing would be a very few instructions.
|>
|> I'm all for keeping the user's choice of case, but making it irrelevant
|> on compare. Would that still be "a very few instructions", in your opinion?

I have used systems that did just that. It is a negligible number of
instructions, but is rather confusing - consider putting a list of
names into sort or uniq - should the default be case sensitive or
insensitive?

Regards,
Nick Maclaren.
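
A minimal sketch of "keep the user's case, ignore it on compare", followed by the sort ambiguity raised above. The class `CaseBlindDirectory` is hypothetical and uses Python's `str.casefold()` as the comparison rule; that is an assumption for illustration, not a description of how the systems mentioned in the post actually did it.

```python
class CaseBlindDirectory:
    """Case-preserving, case-blind name table (illustrative sketch)."""

    def __init__(self):
        self._entries = {}  # folded name -> (name as typed, data)

    def create(self, name: str, data: bytes) -> None:
        key = name.casefold()
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = (name, data)

    def lookup(self, name: str) -> tuple:
        return self._entries[name.casefold()]

    def listing(self) -> list:
        # Names come back exactly as the user typed them.
        return [stored for stored, _ in self._entries.values()]


names = ["b", "A", "a", "B"]
case_sensitive = sorted(names)                  # ordering by code point
case_blind = sorted(names, key=str.casefold)    # ordering ignoring case

print(case_sensitive, case_blind)
```

The lookup side really is only a fold before the compare. The confusing part is downstream: the two sorted lists differ, and a tool like uniq would report four distinct names under one rule and two under the other.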