From: William Ahern on 20 Dec 2009 03:07

David Combs <dkcombs(a)panix.com> wrote:
<snip>
> Trust me, friend -- regular expressions are what make unix/linux
> so useful. And no, they're not all that trivially simple,
> but man are they POWERFUL. One line can transform stuff
> that would take a complicated MULTI-line program NOT using them.

Regular expressions are very useful, but also very limited. For one thing,
they can only represent regular languages, which means among other things
that you can't express nested structures. The greediness and lookbehind
operators do help, but if you look in the appendix of Mastering Regular
Expressions you'll find a regex which can parse any valid e-mail address;
it's two pages long! (At least, that's how I remember it, but I haven't
opened that book in over 5 years.)

Perl 6 will come with something called Parsing Expression Grammars, which
are much more powerful. (Though Perl 6 didn't invent them.) I think this
will probably be the future, but it will obviously take many years for the
rest of the world to catch up. Lua currently has one of the better
implementations, in terms of language integration.

For C I use Ragel for regular expressions. In Ragel you can handle nested
structures--and many other issues--easily because it lets you jump to
different [state] machines explicitly, and allows the use of a state
stack. Using Ragel I've discovered ambiguities in several RFC ABNF
specifications which are silently papered over by most common regular
expression engines.

The big problem with regexes is that people just slop them together and
never notice the bugs. They have been, and will continue to be, one of the
major sources of bugs and security issues.

Sometimes they just get used too much. For instance, the following Perl
basename implementation reads much better to me than any regex would:

    #!/usr/bin/env perl
    print STDOUT (split "/", shift)[-1], "\n"
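
For comparison, a regex version might look something like this (a quick,
untested sketch):

    #!/usr/bin/env perl
    my $path = shift;
    # greedily strip everything up to and including the last "/"
    $path =~ s{.*/}{};
    print "$path\n";

Note that the split form above also happens to treat a trailing "/" the
way basename(1) does (e.g. "a/b/c/" gives "c"), while this substitution
leaves you with an empty string in that case.
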
From: Janis Papanagnou on 20 Dec 2009 04:12

Chris F.A. Johnson wrote:
> On 2009-12-20, David Combs wrote:
>> [...]
>
> I very rarely use anything more than very simple regular
> expressions. Complex REs are more trouble than they're worth,
> especially when they need to be modified.

In some languages (awk, for example, where you can compose them in
strings[*]) you can define them in a way that reads much like a quite
readable BNF notation. Being able to compose them that way, and to reuse
the parts in many places of the regexp definitions, removes a lot of
their complexity and cryptic feel, and makes them a pleasure to use.

Janis

[*] The usual caveats apply.
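
To illustrate the idea, here's a quick, untested sketch using Perl's
qr// operator (awk's string concatenation works along the same lines;
the names below are just made up for illustration):

    #!/usr/bin/env perl
    # Build a larger regex out of small, named, reusable pieces.
    my $digit = qr/[0-9]/;
    my $octet = qr/(?: 25[0-5] | 2[0-4]$digit | 1$digit$digit | [1-9]?$digit )/x;
    my $ipv4  = qr/ $octet \. $octet \. $octet \. $octet /x;

    print "looks like an IPv4 address\n"
        if "192.168.0.1" =~ /\A $ipv4 \z/x;
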
From: Janis Papanagnou on 20 Dec 2009 04:16

William Ahern wrote:
> David Combs <dkcombs(a)panix.com> wrote:
> <snip>
>> [...]
>
> Regular expressions are very useful, but also very limited. For one thing,
> they can only represent regular languages, which means among other things
> that you can't express nested structures.

And back-references, to name another prominent example, also fall outside
the class of regular languages, yet they have nonetheless been added to
some programming languages and libraries.

> [...]
>
> For C I use Ragel for regular expressions. In Ragel you can handle nested
> structures--and many other issues--easily because it lets you jump to
> different [state] machines explicitly, and allows the use of a state stack.

Thanks for that useful hint.

Janis

> [...]
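
A small illustration of that back-reference point, in Perl (a sketch, not
from the posts above):

    #!/usr/bin/env perl
    # \1 must repeat whatever group 1 captured, for words of any length;
    # a plain finite automaton cannot express that, so this goes beyond
    # the regular languages.
    print "doubled word\n" if "now is the the time" =~ /\b(\w+)\s+\1\b/;
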