From: Seebs on 23 Jan 2010 22:43

On 2010-01-24, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
> You can do all that with the upthread mentioned globbing mechanisms in
> Kornshell, in bash (with extended globbing), and I think in zsh as well;
> use these constructs respectively: *(...) ?([ ]) ?(...)

I'm not sure that even ksh can do everything posix REs can.

> Your point seems to be that it's not possible in bourne shell and older
> bash'es, and it's supposedly not defined in POSIX. Granted.

Yeah. Which is to say, the standard shell glob mechanisms lack key components of what makes regexes what they are.

I now also realize that you're probably making a point to do with the existence of the term "regular expression" both as a general term for a linguistic category and as the name for the POSIX pattern-matching used by awk/sed, etcetera. In the context of shell programming, "regular expression" usually refers explicitly to the specific set of closely-related regular expression languages used by sed/awk, and the shell pattern language is not one of them.

Since one occasionally encounters people who actually think that shell globs and "regular expressions" are the same thing, I didn't pick that up.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam(a)seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
From: Kaz Kylheku on 23 Jan 2010 22:48

On 2010-01-24, Sven Mascheck <mascheck(a)email.invalid> wrote:
> PS: why are they characteristically different:
> The motivation for globbing was *intuitive* handling of file names
> - sometimes overlooked but important: globbing uses implicit anchors.

In this regard, globs are faithful to the concept of a regular expression. Mathematical regular expressions do not search; they test set membership. A regex describes a finite automaton which must accept the given set of strings from beginning to end. I.e. for any string w in the language L(R) of the regular expression R, that string is accepted by the corresponding automaton, which means that after the characters of w are fed to that automaton, it is in an accepting state.

A glob used on the command line does exactly this: it asks whether the filename, taken as a complete string, belongs to the set of filenames described by the expression.

The ability to search for a matching substring is an extended application of regular expressions. It's not what makes them regular expressions.

Moreover, glob patterns /are/ in fact employed in a ``de anchored'' searching situation. Namely, the ${VAR%pattern} expansion syntax, in all its variations. If FOO contains "xyzabc" then ${FOO%a*} will trim off the "abc" part, yielding "xyz". Clearly, the "a" is not anchored.
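The two behaviours described in this post can be tried in any POSIX sh. The following is only a minimal sketch: the helper name is_member is made up for illustration, and FOO and its value are taken from the post.

```shell
# is_member STRING GLOB (hypothetical helper): succeeds only if the
# whole STRING matches GLOB, the same way a case pattern does.
is_member() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

# Set membership, not search: the pattern has to cover the entire string.
is_member xyzabc 'xyz*' && echo "xyzabc is in the set described by xyz*"
is_member xyzabc 'abc'  || echo "abc does not match xyzabc: no implicit search"

# Suffix trimming with a glob, as in the post.
FOO=xyzabc
trimmed=${FOO%a*}   # remove the shortest suffix matching a*
echo "$trimmed"     # prints xyz
```

The case statement is used here because it applies the same pattern-matching notation as filename globbing without touching the filesystem.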
From: Seebs on 23 Jan 2010 23:35

On 2010-01-24, Kaz Kylheku <kkylheku(a)gmail.com> wrote:
> Moreover, glob patterns /are/ in fact employed in a ``de anchored''
> searching situation. Namely, the ${VAR%pattern} expansion syntax, in
> all its variations. If FOO contains "xyzabc" then ${FOO%a*} will trim
> off the "abc" part, yielding "xyz". Clearly, the "a" is not anchored.

Actually, it is -- that's why you need a * after it. %foo is anchored at the end, #foo at the beginning.

But yeah, globs can be used without anchoring, and regexes can be used anchored -- expr regexes, as I recall, are anchored on the left...

-s
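The anchoring described in this reply is easy to check. A minimal sketch, assuming a POSIX sh and the standard expr utility; the variable names other than FOO are made up for illustration.

```shell
FOO=xyzabc

# % anchors the pattern at the end of the value.
no_star=${FOO%a}      # "a" alone is not a suffix, so nothing is removed
with_star=${FOO%a*}   # a* can stretch to the end and matches "abc"

# # anchors the pattern at the beginning of the value.
front=${FOO#xyz}      # "xyz" is a prefix and is removed
front_miss=${FOO#abc} # "abc" is not a prefix, so nothing is removed

# expr's ":" operator anchors its regex on the left and prints the
# number of characters matched. expr exits non-zero when that number
# is 0, hence the "|| true".
left_miss=$(expr "$FOO" : 'abc' || true)
left_hit=$(expr "$FOO" : 'xyz' || true)
left_any=$(expr "$FOO" : '.*abc' || true)

echo "$no_star $with_star $front $front_miss"
echo "$left_miss $left_hit $left_any"
```

With FOO set to xyzabc, the first echo prints "xyzabc xyz abc xyzabc" and the second prints "0 3 6".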
From: Janis Papanagnou on 24 Jan 2010 00:11

Seebs wrote:
>
> I'm not sure that even ksh can do everything posix REs can.

I am quite sure it can. It is the reverse direction that is harder: I think the regexp library will at least have problems emulating ksh's !(...) construct. Ever tried? In general you'll get extremely bulky results here! But the class of languages (regular expressions) is the same, anyway.[*]

[*] N.B. Newer ksh's also support back-references in their expressions, so strictly speaking, with that feature, they exceed the Chomsky-3 grammar class as well (analogous to other libraries with backreference extensions).

>
> I now also realize that you're probably making a point to do with the
> existence of the term "regular expression" both as a general term for
> a linguistic category, as well as the name for the POSIX pattern-matching
> used by awk/sed, etcetera.

Right. Not only the regexp library that you mention here; the (extended) globbing as well. Both can be categorized under that term. In other words; in Unix context the regexp library "demands" (pars pro toto) being the sole real regular expression parser, but that's not justified in the context of the existing (extended) globbing.

Janis
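The extended globbing constructs mentioned in this thread can be tried as follows. This is a sketch that assumes bash with the extglob option (ksh accepts the same constructs natively, but that is not tested here); the helper name ext_match is made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: bash. extglob must be enabled before the patterns are matched.
shopt -s extglob

# ext_match STRING PATTERN (hypothetical helper): whole-string glob match.
ext_match() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

# *(a)b : zero or more "a", then "b" (the ERE a*b, matched as a whole string)
ext_match aaab   '*(a)b'      && echo "*(a)b matches aaab"

# colo?(u)r : optional "u" (the ERE colou?r)
ext_match colour 'colo?(u)r'  && echo "colo?(u)r matches colour"

# !(foo|bar) : anything except exactly "foo" or "bar"
ext_match baz    '!(foo|bar)' && echo "!(foo|bar) matches baz"
ext_match foo    '!(foo|bar)' || echo "!(foo|bar) rejects foo"
```

The first two patterns have short ERE counterparts. The negation in the last one is the construct that the post says becomes bulky when written as a regex.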
From: Seebs on 24 Jan 2010 01:01

On 2010-01-24, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
> Not only the regexp library that you mention here; the (extended) globbing
> as well. Both can be categorized under that term. In other words; in Unix
> context the regexp library "demands" (pars pro toto) being the sole real
> regular expression parser, but that's not justified in the context of the
> existing (extended) globbing.

Mostly, it's that "regexp" doesn't really mean "the formal computer science term regular expression" but "this particular set of closely related instances of that term".

-s