From: Sven Mascheck on 23 Jan 2010 21:02

Janis Papanagnou wrote:
> [...] In Kornshell, just to name one example, you
> have the @(...|...), *(...|...), +(...|...), and even the
> powerful !(...|...) regular expression meta-constructs, in addition
> to the more primitive * ? [...] [^...] regexps that can be used in
> file globbing (i.e. regexps coupled to file object search), as well.
>
> Globbing is the use of a regular expression to select the subset
> of matching files; i.e. regular expressions coupled to a concrete
> set of objects on the file system. In shell you can disable file
> globbing and stay with the regular expressions alone, for example
> in case statements or some implementation's if [[...]] constructs.

It just doesn't make sense to use the term regular expression
for globbing, because it's not called that in the documentation
(even if there is no fixed term but some variations in practice,
like globbing, wildcards, pattern matching).

Only a few applications use globbing, e.g., shell, find, Debian dpkg.

Regular expression implementations show even more variations in
practice than globbing, but they are sufficiently different from
globbing on unix, so that it doesn't make sense to mix terms.

PS: why are they characteristically different:
The motivation for globbing was *intuitive* handling of file names
- sometimes overlooked but important: globbing uses implicit anchors.

I believe globbing was simply "recycled" in the other places, that is,
the case condition and pattern matching parameter expansion.
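The implicit anchoring mentioned in this post can be illustrated with a small sketch. It uses Python's `fnmatch` module as a stand-in for shell pattern matching, which is an assumption: `fnmatch` follows the same basic `*`, `?`, `[...]` rules but is not any shell's actual implementation. The sample name `foobar.txt` is made up for illustration.

```python
import fnmatch
import re

name = "foobar.txt"  # hypothetical file name, for illustration only

# A glob has to cover the whole name, so "foo" alone does not match.
glob_whole = fnmatch.fnmatchcase(name, "foo")        # False
glob_star = fnmatch.fnmatchcase(name, "foo*")        # True

# An unanchored regex search succeeds on any substring.
re_search = re.search("foo", name) is not None       # True
# Adding explicit anchors gives the glob-like behaviour.
re_anchored = re.search("^foo$", name) is not None   # False

# fnmatch.translate returns an anchored regex equivalent to the glob;
# the exact text of that regex differs between Python versions.
translated = fnmatch.translate("foo*")
print(glob_whole, glob_star, re_search, re_anchored)
print(translated)
```

The point of the sketch is only the asymmetry: the glob behaves as if wrapped in `^...$`, while the regex needs the anchors written out.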
From: Seebs on 23 Jan 2010 21:21

On 2010-01-24, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
> Seebs wrote:
>> On 2010-01-23, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
>>> Of course it does. File globbing uses regular expressions.

>> No, it doesn't. File globbing uses shell patterns.

> Yes, and "patterns are often described using regular expressions" [Wikipedia]

But "shell patterns" (aka globs) are a special case, because the set of things
they can represent is absolutely a TINY proper subset of what POSIX regular
expressions can do.

>> They're confusingly similar in a few ways, but quite different.

> In which way different? (Beyond differences in usability that I mentioned
> upthread.) It would be helpful to elaborate that beyond a simple statement.
> Can you provide a standard grep(1) example that we cannot implement with
> pattern matching capabilities (globbing without files) of a typical shell?

Yes.

Okay, quick summary: The key is that shell patterns have no grouping and no
repetition operators. You can map RE '.' onto glob '?', and RE '.*' onto
glob '*'. There is nothing you can write in shell glob that corresponds
to 'a*', or even to '.?'. You can handle the anchoring/no-anchoring thing --
you can wrap a regex in '^$' or a pattern in '**'. But you can't make up
for the lack of grouping and repetition.

    re: foo(bar)?
    glob: ... you can't do that.

Note that case statements are closer, because you could do:

    foo | foobar )

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam(a)seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
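The mapping and the gap described in this post can be sketched as follows. Python's `fnmatch` again stands in for basic shell patterns (an assumption, not a shell implementation), and `glob_any` is a hypothetical helper that mimics the `foo | foobar )` alternation of a case statement.

```python
import fnmatch
import re

def glob_any(patterns, text):
    """True if any of the globs matches, like `pat1 | pat2 )` in a case statement."""
    return any(fnmatch.fnmatchcase(text, p) for p in patterns)

samples = ["foo", "foobar", "foobarbar", "fooX"]

# Regex with an optional group; fullmatch mirrors the glob's implicit anchors.
by_regex = [s for s in samples if re.fullmatch(r"foo(bar)?", s)]

# Case-statement style workaround: spell out every alternative.
by_globs = [s for s in samples if glob_any(["foo", "foobar"], s)]

print(by_regex)   # ['foo', 'foobar']
print(by_globs)   # ['foo', 'foobar']

# RE '.' corresponds to glob '?'.
dot_vs_qmark = (re.fullmatch(r"fo.", "foo") is not None,
                fnmatch.fnmatchcase("foo", "fo?"))

# Repetition of one specific character: the regex a* accepts only runs of "a",
# while the nearest basic glob, "*", accepts anything.
candidates = ["", "aaa", "aab"]
rep_regex = [s for s in candidates if re.fullmatch(r"a*", s)]
rep_glob = [s for s in candidates if fnmatch.fnmatchcase(s, "*")]
print(rep_regex)  # ['', 'aaa']
print(rep_glob)   # ['', 'aaa', 'aab']
```

Enumerating alternatives works for `foo(bar)?` because the set of matches is finite; it cannot stand in for `a*`, whose set of matches is unbounded.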
From: Janis Papanagnou on 23 Jan 2010 22:02

Sven Mascheck wrote:
> [...]
>
> Regular expression implementations show even more variations in
> practice than globbing, but they are sufficiently different from
> globbing on unix, so that it doesn't make sense to mix terms.

The terminology is already foobar'ed. People now seem to associate
the concepts of regular expressions (as defined by formal language
theory) with the regexp library on Unix (BRE and ERE). And they
still call every extension of those functions regular expressions,
whether they are or not. Usability extensions (like the \d and many
others) are okay, but even backreferences are introduced in some
tools, and the libs and expressions are still called regular, even
if they are not. OTOH, a language that conforms to a Chomsky-3
grammar seems not to be recognized as such any more.

It's probably worth adapting to that fuzzy terminology from a
practical point of view, but if you're viewing that from a formal
language theory point of view it's not that clear any more what's
the preferable way.

>
> PS: why are they characteristically different:
> The motivation for globbing was *intuitive* handling of file names

Certainly, because before System 7 (AFAIK) there was no built-in
globbing in bourne shell, rather there was an external program for
that expansion.

> - sometimes overlooked but important: globbing uses implicit anchors.

This detail is important (and well known to me) but doesn't change
anything; you can convert unanchored regexp's to anchored
"globbing-pattern" and vice versa.

> I believe globbing was simply "recycled" in the other places, that is,
> the case condition and pattern matching parameter expansion.

Frankly, I've never used an old UNIX edition 6 bourne shell and don't
know how the case statement worked at that time, or whether the case
statement existed at all. Sven, wasn't that you who had access to old
bourne shells? You may want to inspect whether the case statement was
there, and if so, what patterns and regular expression metacharacters
it supported.

Janis
From: Ben Finney on 23 Jan 2010 22:01

Seebs <usenet-nospam(a)seebs.net> writes:

> On 2010-01-24, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
> > Seebs wrote:
> >> On 2010-01-23, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
> >>> Of course it does. File globbing uses regular expressions.
> >> No, it doesn't. File globbing uses shell patterns.
> > Yes, and "patterns are often described using regular expressions"
> > [Wikipedia]
>
> But "shell patterns" (aka globs) are a special case, because the set
> of things they can represent is absolutely a TINY proper subset of
> what POSIX regular expressions can do.

As the Posix specification says:

    Historically, pattern matching notation is related to, but
    slightly different from, the regular expression notation described
    in XBD Regular Expressions.

    <URL:http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13>

That “slightly different” is rather an understatement, when one looks at
the severely minimal (compared to Posix regular expressions) set of
pattern-matching operations that can be done with pathname globs.

> Okay, quick summary: The key is that shell patterns have no grouping
> and no repetition operators. You can map RE '.' onto glob '?', and RE
> '.*' onto glob '*'. There is nothing you can write in shell glob that
> corresponds to 'a*', or even to '.?'. You can handle the
> anchoring/no-anchoring thing -- you can wrap a regex in '^$' or a
> pattern in '**'. But you can't make up for the lack of grouping and
> repetition.

Some shells, of course, go outside the Posix standard and do provide
such facilities for globs.

> re: foo(bar)?
> glob: ... you can't do that.

For example, in Bash, the above could be written as the pathname glob
'foo{,bar}'. There's no such capability in Posix AFAIK, though.

--
 \     “Natural catastrophes are rare, but they come often enough. We |
  `\    need not force the hand of nature.” (Carl Sagan, _Cosmos_, 1980) |
_o__)                                                                  |
Ben Finney
From: Janis Papanagnou on 23 Jan 2010 22:22
Seebs wrote:
> On 2010-01-24, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
>> Seebs wrote:
>>> On 2010-01-23, Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
>>>> Of course it does. File globbing uses regular expressions.
>>> No, it doesn't. File globbing uses shell patterns.
>
>> Yes, and "patterns are often described using regular expressions" [Wikipedia]
>
> But "shell patterns" (aka globs) are a special case, because the set of things
> they can represent is absolutely a TINY proper subset of what POSIX regular
> expressions can do.
>
>>> They're confusingly similar in a few ways, but quite different.
>
>> In which way different? (Beyond differences in usability that I mentioned
>> upthread.) It would be helpful to elaborate that beyond a simple statement.
>> Can you provide a standard grep(1) example that we cannot implement with
>> pattern matching capabilities (globbing without files) of a typical shell?
>
> Yes.
>
> Okay, quick summary: The key is that shell patterns have no grouping and no
> repetition operators. You can map RE '.' onto glob '?', and RE '.*' onto
> glob '*'. There is nothing you can write in shell glob that corresponds
> to 'a*', or even to '.?'. You can handle the anchoring/no-anchoring thing --
> you can wrap a regex in '^$' or a pattern in '**'. But you can't make up
> for the lack of grouping and repetition.
>
> re: foo(bar)?
> glob: ... you can't do that.
>
> Note that case statements are closer, because you could do:
> foo | foobar )

You can do all that with the upthread mentioned globbing mechanisms
in Kornshell, in bash (with extended globbing), and I think in zsh
as well; use these constructs respectively:

    *(...)    ?([ ])    ?(...)

Your point seems to be that it's not possible in bourne shell and
older bash'es, and it's supposedly not defined in POSIX. Granted.

Janis
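To make the correspondence between the Kornshell-style operators and regex repetition concrete, here is a minimal sketch of a translator for a small subset of extended patterns. Everything in it is an assumption for illustration: `extglob_to_regex` and `ext_match` are hypothetical helpers, not how ksh, bash, or zsh implement matching; `!(...)` and bracket expressions are deliberately left out; and the sample patterns `*(a)`, `?(?)`, and `foo?(bar)` are one reading of the equivalents for 'a*', '.?', and 'foo(bar)?' discussed in the thread.

```python
import re

# Regex suffix that each extended-pattern operator contributes after its group.
SUFFIX = {"?": "?", "*": "*", "+": "+", "@": ""}

def extglob_to_regex(pattern):
    """Translate ?(...), *(...), +(...), @(...), plain * and ? into a regex string."""
    out = []
    closers = []  # text to emit when the matching ')' of each open group is reached
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch in SUFFIX and nxt == "(":
            out.append("(?:")                  # open a non-capturing group
            closers.append(")" + SUFFIX[ch])
            i += 2
            continue
        if ch == ")" and closers:
            out.append(closers.pop())          # close group, add its repetition
        elif ch == "|" and closers:
            out.append("|")                    # alternation inside a group
        elif ch == "*":
            out.append(".*")                   # plain glob star
        elif ch == "?":
            out.append(".")                    # plain glob question mark
        else:
            out.append(re.escape(ch))          # literal character
        i += 1
    return "".join(out)

def ext_match(pattern, text):
    """Match the whole text, since shell patterns are implicitly anchored."""
    return re.fullmatch(extglob_to_regex(pattern), text, re.DOTALL) is not None

print(extglob_to_regex("foo?(bar)"))   # foo(?:bar)?
print(ext_match("foo?(bar)", "foobar"), ext_match("foo?(bar)", "foobarbar"))  # True False
print(ext_match("*(a)", "aaa"), ext_match("*(a)", "aab"))                      # True False
```

Under these assumptions each operator is just a regex group with its quantifier moved from the end to the front, which is why the extended patterns cover the grouping and repetition cases that basic globs cannot.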