From: Jonathan Thornburg -- remove -animal to reply on 26 Sep 2006 08:24

I wrote:
>> In mathematics and physics quantities are *always* case-sensitive.
>> That is, 'g' and 'G' are *always* distinct.

Jan Vorbrüggen <jvorbrueggen(a)not-mediasec.de> replied:
> Quite. But would you extend this to making "thisisanimportantconstant"
> and "thisIsAnImportantConstant" and "thisisanImportantConstant" distinct?
> These are the cases that cause the problems.

Yes, IMHO these should be distinct files at the OS level. Of course,
if some _application_ (or suite of applications) wants to canonicalize
all of these strings before forming a filename, that's fine. I just
don't think the {filesystem,OS} should be in the business of imposing
such a semantics on *all* applications.

I asked:
>> Eg
>> what happens if a backup from a system which allows creation of the
>> distinct files /some/where/g.h5 and /some/where/G.h5 gets restored on
>> a system which thinks those are two distinct names for the same file?
>
> IMO, that's a fundamental incompatibility that prohibits interoperability.
>
>> The fundamental problem is that different {users,applications} may
>> have different ideas of how case should be handled...
>
> But they shouldn't, because they are all human beings that are subject to
> the same (within some reasonable definition of "same") cognitive abilities
> and, more importantly, disabilities.

Strange, 30 years of professional work in math, computing science,
and physics, I've yet to meet anyone who had trouble distinguishing
between 'r' and 'R' in an equation. And in my current work (numerical
simulations in general relativity), I've yet to meet anyone who has
trouble distinguishing (say) lower-case Greek gamma ($\gamma$ in TeX)
and upper-case Greek gamma ($\Gamma$ in TeX), even though our most
common equation system contains both of these. ($\gamma$ is the spatial
3-metric; $\Gamma$ is the spatial Christoffel symbols.)

Admittedly, equations usually use reasonably short identifiers.
Things aren't quite so pretty for your example of
   "thisisanimportantconstant"
   "thisIsAnImportantConstant"
   "thisisanImportantConstant"
So where do you want to draw the threshold? By identifier length?
By Hamming distance? Weighted by ordinal-position-in-identifier?

I really, _really_ don't think any one-size-fits-all policy is going
to be suitable for everyone here. This should be left to applications;
a general-purpose filesystem should provide a clean primitive (filenames
are uninterpreted byte [or some larger alphabet if that seems appropriate;
I don't want to get into the i18n tarpit here] strings) and leave the
rest to higher-level software.

> But they shouldn't, because they are all human beings that are subject to
> the same (within some reasonable definition of "same") cognitive abilities
> and, more importantly, disabilities.

Since when are file names generated *only* by humans? Lots and lots
of software generates file names... and software doesn't have any problems
distinguishing 'r' from 'R', or even
   "thisisanimportantconstant"
   "thisIsAnImportantConstant"
   "thisisanImportantConstant"

> The A320 crash at Strasbourg occurred, in the final account, because in the
> display for the descent rate it made a difference whether it showed "3" or
> "3.". Nobody in the cockpit noticed this, and the crew likely didn't even
> know what the presence or absence of the "." meant. That's just bad design
> - as is allowing the case of letters in a filename to distinguish files.

Of course, lousy GUI design is lousy GUI design. (And having the decimal
point be small and not backlit in the LCD display didn't help, either!)
But if we follow your line of reasoning, then we should design our
{file systems, OSs, programming languages, etc} to always treat the
strings "3" and "3." as being the same critter. Ick.

We have several decades of experience with programming languages in
which 'int' and 'floating point' are the same data type (APL and Perl
come to mind), and also several decades of experience with programming
languages in which these are distinct data types (eg the entire
Algol-derived family, the entire Fortran family), and each has
advantages and disadvantages. I will observe that almost everyone doing
serious floating-point arithmetic has "voted with their feet" for
software environments where "3" and "3." do indeed have different
meanings.

The nice thing about lower-level software *not* making this sort of
decision is that it leaves the field free for higher-level software
to experiment, and do what makes sense in a particular situation.
As a rule of thumb, one size does *not* fit all!

ciao,

--
-- "Jonathan Thornburg -- remove -animal to reply" <jthorn(a)aei.mpg-zebra.de>
   Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
   Golm, Germany, "Old Europe"     http://www.aei.mpg.de/~jthorn/home.html
   "Washing one's hands of the conflict between the powerful and the
    powerless means to side with the powerful, not to be neutral."
                                      -- quote by Freire / poster by Oxfam
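[Illustration] The backup/restore question raised in the post above (g.h5
and G.h5 landing on a system that treats them as one name) is the kind of
check an application can do for itself. What follows is only a minimal
sketch: the function name casefold_collisions is made up for this
example, and Python's str.casefold() is used as a stand-in for whatever
folding rule the target filesystem really applies, which may well differ.

```python
from collections import defaultdict


def casefold_collisions(paths):
    """Group paths that differ only by case.

    Returns a dict mapping the folded form to the original names, keeping
    only the groups that contain more than one name.
    """
    groups = defaultdict(list)
    for path in paths:
        # Assumption: str.casefold() approximates the target filesystem's
        # case-insensitive comparison. Real filesystems use their own rules.
        groups[path.casefold()].append(path)
    return {key: names for key, names in groups.items() if len(names) > 1}


backup = ["/some/where/g.h5", "/some/where/G.h5", "/some/where/notes.txt"]
collisions = casefold_collisions(backup)
for folded, names in collisions.items():
    print(folded, "<-", names)
```

A restore tool could run a check like this before writing anything and
then decide for itself whether to rename, refuse, or ask. That keeps the
policy in the application rather than in the filesystem.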
From: Andrew Reilly on 26 Sep 2006 08:45

On Tue, 26 Sep 2006 04:43:13 -0500, Rob Warnock wrote:
> Andrew Reilly <andrew-newspost(a)areilly.bpc-users.org> wrote:
> +---------------
> | Aren't file extensions exclusively a file-type hint, rather than a
> | preferred application hint? What are some common file extensions that can
> | be used for different types of file? I can open .doc files with four or
> | five different applications, these days (with varying degrees of success,
> | admittedly).
> +---------------
>
> But they're *only* a "hint", since they're often ambiguous.
>
> For example, long before MS Windows existed, ".DOC" was used on
> the PDP-10 (and elsewhere) to indicate that a file was "documentation",
> that is, human-readable plaintext. Even today on Unix/Linux there are
> several software packages that use the same convention -- that ".doc"
> is human-readable plaintext, *not* MS Word format. Case in point: On my
> FreeBSD laptop, of 226 files named "*.doc", 147 are plain ASCII text,
> 30 are directories(!), and only 46 are Microsoft format files.
>
> And even when ".doc" *does* mean MS Word or Office format, which
> *version*?!? There have been several compatibility breaks over
> the years.

Maybe common practice hints that "hint" is in fact the best answer.

Who wants to have to maintain the dictionary of arbitrary distinctions
introduced by version histories and platform variations that a strict
mechanical (mathematical) "file type" indicator would require? Do you
really want to fire up Word97-pc-release.1 when that's what created a
specific .doc file?

[Mind you, there's a better chance of that happening with something like
the Unix "magic" system than some sort of manually ascribed file type
system, IMO, maintenance nightmare though that obviously is.]

Cheers,

--
Andrew
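[Illustration] The "magic" approach mentioned above looks at a file's
leading bytes instead of trusting its name. The sketch below is a toy
version, not the real file(1) database: the function name sniff_doc and
its three return labels are invented for this example. The one fact it
leans on is the 8-byte signature of the OLE2 compound file container,
which is what the older binary Word .doc format is stored in.

```python
# Signature at the start of every OLE2 compound file (the container used
# by binary Word .doc files, and also by other formats such as .xls).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_doc(data: bytes) -> str:
    """Classify the contents of a '*.doc' file by its leading bytes."""
    head = data[:512]
    if head.startswith(OLE2_MAGIC):
        return "ole2-compound"
    try:
        # An empty file also decodes cleanly and is reported as text.
        head.decode("ascii")
    except UnicodeDecodeError:
        return "unknown-binary"
    return "ascii-text"


print(sniff_doc(b"README for the foo package\n"))
print(sniff_doc(OLE2_MAGIC + b"\x00" * 16))
```

Note that the signature only identifies the container. It says nothing
about which Word version wrote the file, so the version question from the
quoted post still needs a much deeper (and harder to maintain) set of
rules.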
From: Jan Vorbrüggen on 26 Sep 2006 09:21

> Strange, 30 years of professional work in math, computing science,
> and physics, I've yet to meet anyone who had trouble distinguishing
> between 'r' and 'R' in an equation.

Strange - with similar experience, I definitely have. Oh, not after
you've pointed it out - but it is a potential source of confusion. Yes,
even in the one-letter case.

> Admittedly, equations usually use reasonably short identifiers.
> Things aren't quite so pretty for your example of
>    "thisisanimportantconstant"
>    "thisIsAnImportantConstant"
>    "thisisanImportantConstant"
> So where do you want to draw the threshold? By identifier length?
> By Hamming distance? Weighted by ordinal-position-in-identifier?

As you can't, in a canonical way, you need to do away with the
distinction for all lengths.

> Since when are file names generated *only* by humans? Lots and lots
> of software generates file names... and software doesn't have any problems
> distinguishing 'r' from 'R', or even

That's a strawman, and you know it.

> But if we follow your line of reasoning, then we should design our
> {file systems, OSs, programming languages, etc} to always treat the
> strings "3" and "3." as being the same critter.

Nope, that doesn't follow at all. What follows is that if "3" and "3."
are different things, they should be displayed in such a way that the
distinction is immediately visually apparent. This is similar to the
European preference for writing "0.1" instead of the American ".1".

> The nice thing about lower-level software *not* making this sort of
> decision is that it leaves the field free for higher-level software
> to experiment, and do what makes sense in a particular situation.
> As a rule of thumb, one size does *not* fit all!

Unfortunately, experience has shown that if blade guards are not
enforced, they are not used. And at least in Europe, it is illegal to
sell dangerous equipment without blade guards.

	Jan
From: "Peter "Firefly" Lund" on 26 Sep 2006 12:43

On Mon, 25 Sep 2006, Jan Vorbrüggen wrote:

>> Yes. There's also annoying things like ligatures and diacritics. And
>> perhaps many different codepoints that (more or less) share a glyph.
>
> How are those in any way relevant?

Change the H to an A, then.

-Peter
From: Bill Todd on 26 Sep 2006 12:46
Terje Mathisen wrote:
> Bill Todd wrote:

....

>> As for how metadata is presented to the outside world, bundles
>> (which sound similar to what you may mean by 'meta-data forks') seem
>> like one good option.
>
> If we must have this, then I would strongly prefer to have them visible
> and accessible as virtual directory structures:
>
> I.e. attribute "creator" of file "foo" could be read by
>
> 'cat foo/creator'

My very limited acquaintance with 'bundles' suggests that this is what
they do.

>
> or possibly
>
> 'cat foo.meta/creator'

And that's more like the ReiserFS V4 approach (IIRC they reserve the
single subdirectory name 'metas' for this purpose, introducing another
- syntactical - path element which does not in fact perform another
actual disk look-up and in so doing eliminating all other potential
naming collisions with *real* subdirectory names). Appending .meta to
the file name would introduce path look-up ambiguity unless that ending
was otherwise reserved.

>
> Allowing regular file/directory operations to create/read/write/modify
> these attribute streams seems like the obviously Right Thing (tm) to do.
>
> It also has the great advantage of being almost transparently portable
> to any hierarchical filesystem, modulo performance.

Exactly: as long as the normal data 'stream' appears as its own
lower-level entry rather than being syntactically associated with the
'container' parent directory, the entire structure can be represented -
and accessed - in a conventional implementation, albeit without the
performance optimizations available in an implementation which better
understands the grouping.

There are, however, remaining issues with the system-managed attributes,
which in the conventional file system instance need to continue to be
associated with the specific objects which they control. So while in
bundle-aware implementations they can be accessed syntactically just as
the other metadata elements are, in bundle-ignorant implementations they
likely don't appear as separate objects at all but are managed according
to the local idiom - and thus while they *appear* to be handled the same
as the application-level metadata in the bundle-aware implementation,
they may in fact be implemented quite differently in a way that's more
easily transported to conventional environments and back again.

Or, one could explore the approach of retaining traditional behavior
where porting that metadata back and forth between new and traditional
systems is awkward, and only treat the extended metadata that
traditional systems don't already support specially - normalizing the
application interface across both environments at the expense of
normalizing the access mechanisms for *all* metadata in the new
environments only.

Either way, the resulting application interface (when considered across
both old and new environments) is somewhat kludgier than it would have
been had all this been designed in from the start, but may be 'good
enough' to be useful.

- bill
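[Illustration] The "portable to any hierarchical filesystem" point above
can be tried out directly: represent the bundle as an ordinary directory,
with the normal data stream as one entry and each attribute as another.
This is only a sketch of that emulation on a conventional filesystem. The
entry name "data" and the helpers write_bundle and read_attr are invented
for this example and do not come from any real bundle implementation.

```python
import tempfile
from pathlib import Path

DATA_NAME = "data"  # hypothetical name for the ordinary data stream


def write_bundle(root: Path, name: str, data: bytes, attrs: dict) -> Path:
    """Create directory root/name holding the data stream plus attributes."""
    if DATA_NAME in attrs:
        # The data stream lives beside the attributes, so its name is reserved.
        raise ValueError(f"attribute name {DATA_NAME!r} is reserved")
    bundle = root / name
    bundle.mkdir()
    (bundle / DATA_NAME).write_bytes(data)
    for key, value in attrs.items():
        (bundle / key).write_text(value)
    return bundle


def read_attr(bundle: Path, key: str) -> str:
    """Equivalent of 'cat foo/creator' from the quoted post."""
    return (bundle / key).read_text()


with tempfile.TemporaryDirectory() as tmp:
    foo = write_bundle(Path(tmp), "foo", b"payload", {"creator": "alice"})
    creator = read_attr(foo, "creator")
    payload = (foo / DATA_NAME).read_bytes()
    print(creator, payload)
```

The reserved-name check shows the same kind of collision discussed for
'.meta' and 'metas': whichever name carries the special meaning has to be
kept out of the ordinary namespace. System-managed attributes (owner,
timestamps and so on) are deliberately left out here, since on a
conventional filesystem they stay attached to the directory entries
themselves.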