From: Kaz Kylheku on
On 2010-03-01, Ron Garret <rNOSPAMon(a)flownet.com> wrote:
> In article
><3bb8c3f7-ba67-4944-b236-e998684adb56(a)g19g2000yqe.googlegroups.com>,
> ccc31807 <cartercc(a)gmail.com> wrote:
>
>> On Feb 27, 3:08 pm, Ron Garret <rNOSPA...(a)flownet.com> wrote:
>> > It boggles my mind that the same people who
>> > complain about the aesthetics (or lack thereof) of parens in
>> > S-expressions will accept something like this:
>> >
>> > (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\
>> > x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e
>> > -\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*
>> > [a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|
>> > 2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e
>> > -\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
> ...
>> I read through the RE example above, and it would take a bit of time
>> to figure out exactly what it did
>
> It detects valid email addresses. I got it from this page:
>
> http://regexlib.com/DisplayPatterns.aspx

The above quoted example is a gross misuse of regexes, because a single
regex is applied to a problem which is more properly treated as a
scanning and parsing problem.

Regular expressions are intended for lexical scanning: decomposing input
into tokens. They fit into (and do not substitute for)
scanner-generator languages.

In a scanner generator language like lex, you can define macros
representing regex fragments, so you can write {id} instead of repeating
[A-Za-z_][A-Za-z0-9_]* . Thus an ugly category of characters like
[a-z0-9!#$%&'*+/=?^_`{|}~-] can be given a meaningful name and invoked
by that name.
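
A rough sketch of the same idea outside lex, using Python string
composition in place of lex macros. The names ATEXT, LABEL, DOT_ATOM,
DOMAIN and SIMPLE_ADDR are made up for illustration, and the pattern
covers only the unquoted local part and the hostname form of the domain
from the quoted regex, not the quoted-string or bracketed-IP branches.

```python
import re

# Hypothetical fragment names, playing the role of lex macros.
ATEXT = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]"        # one "atom" character
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"    # one hostname label

# Larger fragments are built from the named pieces.
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"         # atoms separated by dots
DOMAIN = rf"(?:{LABEL}\.)+{LABEL}"             # at least two labels

# Simplified address: dot-atom local part, "@", hostname domain.
SIMPLE_ADDR = re.compile(rf"{DOT_ATOM}@{DOMAIN}")

if __name__ == "__main__":
    for candidate in ("john.doe@example.com", "john..doe@example.com"):
        print(candidate, bool(SIMPLE_ADDR.fullmatch(candidate)))
```

The compiled pattern is the same kind of regex as before; only the
source text becomes readable, because each character class is spelled
out once and then referred to by name.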

Moreover, in a scanner generator language, you would define a set of
tokens to be extracted, using parallel regular expressions. The scanner
determines which token is to be extracted using some heuristic, like
which of the expressions constitutes the longest match (with tie-breaker
resolution by which token rule occurs first).
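
To make the selection rule concrete, here is a minimal sketch of a
longest-match scanner in Python. It is not how lex is implemented (lex
compiles all rules into one automaton); the rule table and the token
names KEYWORD, IDENT, NUMBER and SPACE are assumptions chosen for the
example.

```python
import re

# Hypothetical token rules; order matters only for breaking ties.
RULES = [
    ("KEYWORD", re.compile(r"if|else|while")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("SPACE", re.compile(r"\s+")),
]

def scan(text):
    """Split text into (kind, lexeme) pairs using longest match."""
    pos = 0
    tokens = []
    while pos < len(text):
        best_kind, best_len = None, 0
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule listed first is kept.
            if m and m.end() - pos > best_len:
                best_kind, best_len = kind, m.end() - pos
        if best_kind is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_kind != "SPACE":          # whitespace is skipped
            tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens

if __name__ == "__main__":
    print(scan("if iffy 42"))
```

On the input "if iffy 42" both KEYWORD and IDENT match "if" with the
same length, so the earlier rule wins; for "iffy" the IDENT match is
longer, so it wins over the keyword.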

The higher level syntax is recognized over the token categories, not
at the character level.
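
As an illustration of syntax recognized over token categories, the
sketch below checks a token sequence against a deliberately simplified
address grammar: ATOM (DOT ATOM)* AT ATOM (DOT ATOM)+. The token kinds
ATOM, DOT and AT and the function is_address are hypothetical, and the
tokens are assumed to come from a separate scanner.

```python
def is_address(tokens):
    """Check ATOM (DOT ATOM)* AT ATOM (DOT ATOM)+ over token kinds."""
    kinds = [kind for kind, _text in tokens]

    def dotted_atoms(start):
        # Consume ATOM (DOT ATOM)* beginning at index start.
        # Returns (next index, number of atoms), or None on failure.
        if start >= len(kinds) or kinds[start] != "ATOM":
            return None
        i, count = start + 1, 1
        while (i + 1 < len(kinds)
               and kinds[i] == "DOT" and kinds[i + 1] == "ATOM"):
            i += 2
            count += 1
        return i, count

    local = dotted_atoms(0)
    if local is None:
        return False
    i, _count = local
    if i >= len(kinds) or kinds[i] != "AT":
        return False
    domain = dotted_atoms(i + 1)
    if domain is None:
        return False
    end, labels = domain
    # The domain must use up all remaining tokens and have two labels.
    return end == len(kinds) and labels >= 2

if __name__ == "__main__":
    toks = [("ATOM", "john"), ("DOT", "."), ("ATOM", "doe"), ("AT", "@"),
            ("ATOM", "example"), ("DOT", "."), ("ATOM", "com")]
    print(is_address(toks))
```

The parser never looks at individual characters; which characters make
up an ATOM is entirely the scanner's business.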

A misuse of regexes as a big hammer to strike any size nail does not
damn regexes.
From: Giovanni Gigante on


> do three completely different things, none of which involve computing a
> quotient. And this is just a single example of a long, long, long list
> of syntactic and semantic insanity.


True. I suspect that one may like perl in the same way that one can love
any horribly complex thing in which one has become proficient.

I am also wondering about another thing. If things like parenscript
exist, it should be possible to write a parenperl. Would it be useful?
Would it still be "perl"?
From: Giovanni Gigante on
Tim Bradshaw wrote:
>
> Like English

Btw, I've never understood the traditional "Larry is a linguist"
argument. I think that ambiguities are a source of richness and fun in
human language, but not in computer ones (for most definitions of "fun").

From: Giovanni Gigante on
Giovanni Gigante wrote:
>
> Would it still be "perl"?

Uhm.
This one sounds unexpectedly gavinesque. Please alert me if that's the case.
From: ccc31807 on
On Mar 1, 6:15 pm, Tim Bradshaw <t...(a)tfeb.org> wrote:
> > Personally, I use something like /[\w.-]+@[\w]+\.[\w]{2,4}/

> People who do this sort of thing should just be prevented from
> programming.
....
>  It is emphatically not "good enough".

I guess you didn't read the post where I explained the particular
usage. This is to break apart data read from a database, which might
contain a line like this:
"999-999-9999","email@addy.com","10/SP","ACAD"
where a telephone number contains at least 9 digits, an email address
contains 1 '@', a term contains 1 '/', and a restriction contains all
UC alpha characters.

In this task, please explain to me why the RE isn't "good enough?" In
particular, why shouldn't this RE, /@/, be "good enough" to select an
email address?
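
For concreteness, a minimal sketch of that kind of field
classification, written in Python rather than Perl. The function names
classify and classify_line are invented for the example, and the tests
only mirror the informal description above; they tell the four field
types apart and do not validate any of them.

```python
import csv
import io

def classify(field):
    """Label one field using the rough tests described in the post."""
    if "@" in field:
        return "email"
    if "/" in field:
        return "term"
    if sum(ch.isdigit() for ch in field) >= 9:
        return "phone"
    if field.isalpha() and field.isupper():
        return "restriction"
    return "unknown"

def classify_line(line):
    """Parse one quoted, comma-separated line and label each field."""
    row = next(csv.reader(io.StringIO(line)))
    return [(classify(field), field) for field in row]

if __name__ == "__main__":
    sample = '"999-999-9999","email@addy.com","10/SP","ACAD"'
    for kind, value in classify_line(sample):
        print(kind, value)
```

This only works because the data is already known to hold exactly these
four kinds of field; it says nothing about whether the address itself
is well formed.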

CC.