From: Zach Beane on
ccc31807 <cartercc(a)gmail.com> writes:

> On Mar 1, 1:46 pm, Zach Beane <x...(a)xach.com> wrote:
>> This is a terrible pattern for email addresses.
>>
>> What would you do if you found a false negative or false positive? Would
>> you patch it up to be good enough again (until the next problem), or
>> would you try to get it right?
>
> Depends on what your need is. If you are validating an HTML form
> submission, all you may care about is three groups of letters divided
> by an '@' and a '.'.

I'm wondering about the specific regular expression you offered. It
rejects a large class of valid email addresses. In what situation would
you use a regular expression like that? If you used it in a situation
where rejecting valid email addresses caused a problem, how would you
fix it?

Zach
From: ccc31807 on
On Mar 1, 4:21 pm, Zach Beane <x...(a)xach.com> wrote:
> I'm wondering about the specific regular expression you offered. It
> rejects a large class of valid email addresses. In what situation would
> you use a regular expression like that? If you used it in a situation
> where rejecting valid email addresses caused a problem, how would you
> fix it?

I apologize for the confusion. This is something I made up on the fly,
untested, and not used in code, at least by me. I just wanted to show
something 'like' I would use without finding an actual example in
code.

Probably the most common task in my job is to take apart a datafile
wherein some of the values are email addresses. I know the values
contained in the database from past experience, so I can get away with
testing for the @ symbol for an email address, and this is sufficient.

Actually, this file contains a lot of information, in a particular
order, with multiple telephone numbers, email addresses, terms, and
restrictions. I break up the values with REs like these:

=~ /@/ -- it's an email address
=~ /\// -- it's a term (terms contain the solidus)
=~ /\d{3}.?\d{3}.?\d{4}/ -- it's a telephone number
=~ /[A-Z]{3,4}/ -- it's a restriction (contains 3 or 4 UC chars)

If the token doesn't match any of these, it's an error, and this is
ALWAYS true.
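As a rough illustration of the token tests above, here is a minimal sketch in Python rather than Perl. The four patterns are copied from the post; the function name classify, the labels, and the sample tokens are made up for illustration, and trying the tests in the listed order is an assumption, since the post does not say which test wins when a token matches more than one.

```python
import re

# Patterns copied from the post. The order of this list is an assumption:
# the first pattern that matches anywhere in the token decides its kind.
RULES = [
    ("email", re.compile(r"@")),
    ("term", re.compile(r"/")),
    ("phone", re.compile(r"\d{3}.?\d{3}.?\d{4}")),
    ("restriction", re.compile(r"[A-Z]{3,4}")),
]


def classify(token):
    """Return the label of the first matching rule, or "error" if none match."""
    for label, pattern in RULES:
        # search() is unanchored, like Perl's =~ with no ^ or $ in the pattern.
        if pattern.search(token):
            return label
    return "error"


if __name__ == "__main__":
    # Hypothetical tokens, not taken from any real datafile.
    for token in ["jdoe@example.edu", "2010/SP", "555-123-4567", "FERP", "hello"]:
        print(token, "->", classify(token))
```

Because every test is unanchored, this only works when the data is already known to be clean, which is the point made in the post: a token such as "ABCD/1" would be called a term only because the solidus test happens to run before the restriction test.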

This works because I know the data values; it would not work in
another application. There are multiple ways to do a job like this,
for example, using index() would work for some values. I use REs
because they are simple, uncomplicated, and easy. I am also well aware
that people who have not seen REs before are mystified, and to be
quite honest with you, this is part of their charm. Some of the
comments in this thread about the insanity of Perl have a reverse
psychological effect, in that the more horrible they say Perl is, the
more it makes me want to rub it in their faces.

And to be totally honest, I find this to be part of the charm of Lisp
as well. I am a graduate student in Software Engineering at a large
public university, and sometimes use Lisp for projects and
assignments. I enjoy some of the comments from some of the professors
(and I mean well published and respected names) that "Lisp programmers
should be shot," or "Lisp should be made illegal." The fact that my
first exposure to a Lisp program produced in me a feeling that Lisp
was absolutely opaque also made me want to learn Lisp, just as REs did.
It also contributed to my interest in Perl (as "unintelligible line
noise").

Maybe those things that people complain about the loudest are the most
valuable. Anyway, I have found both Perl and REs understandable and
practical, and I trust that I'll have the same experience with Lisp.

CC.
From: Tim Bradshaw on
On 2010-03-01 19:17:35 +0000, Ron Garret said:

> The worst part is that the design of Perl makes it nearly impossible to
> figure out what a piece of code does unless you're already intimately
> familiar with the language. Consider the above three lines of code, and
> suppose you didn't know what they did. How would you find out? What
> would you look up?

Like English

From: Tim Bradshaw on
On 2010-03-01 18:24:28 +0000, ccc31807 said:

> Personally, I use something like /[\w.-]+@[\w]+\.[\w]{2,4}/
> which matches as follows:
> - at least one alphanumeric character, dot, or dash
> - exactly one "@"
> - at least one alphanumeric character
> - exactly one "."
> - from two to four alphanumeric characters
> and fits into the category of "good enough"

People who do this sort of thing should just be prevented from
programming. There's a standard (which is still essentially RFC 822)
for what is a valid mail address. I repeatedly run into systems which
have implemented their own, deficient, parser (such as yours) and
reject perfectly valid addresses. It is emphatically not "good enough".
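To make the objection concrete, here is a small sketch that runs the quoted pattern against a few syntactically valid addresses. It is a Python translation of the Perl pattern, and it assumes the pattern is meant to match the whole string (the "exactly one" wording in the quoted description suggests that, but the pattern as posted has no anchors). The function name accepts and the sample addresses are invented for illustration.

```python
import re

# Python translation of the quoted pattern /[\w.-]+@[\w]+\.[\w]{2,4}/
QUOTED = re.compile(r"[\w.-]+@[\w]+\.[\w]{2,4}")


def accepts(address):
    """True if the whole address matches the quoted pattern.

    fullmatch() anchors at both ends; that anchoring is an assumption,
    since the pattern as posted is unanchored.
    """
    return QUOTED.fullmatch(address) is not None


# Hypothetical addresses, all syntactically valid.
SAMPLES = [
    "first.last@example.com",   # accepted
    "user+tag@example.com",     # rejected: "+" is not in [\w.-]
    "user@mail.example.co.uk",  # rejected: more than one dot after the "@"
    "user@my-domain.com",       # rejected: hyphen in the domain
    "curator@example.museum",   # rejected: last label longer than 4 chars
]

if __name__ == "__main__":
    for address in SAMPLES:
        verdict = "accepted" if accepts(address) else "rejected"
        print(address, "->", verdict)
```

Dropping the anchoring assumption does not rescue the pattern: an unanchored search still fails on the hyphenated domain, and it starts accepting strings that merely contain something address-like.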

From: Kaz Kylheku on
On 2010-03-01, ccc31807 <cartercc(a)gmail.com> wrote:
> On Feb 27, 3:08 pm, Ron Garret <rNOSPA...(a)flownet.com> wrote:
>> It boggles my mind that the same people who
>> complain about the aesthetics (or lack thereof) of parens in
>> S-expressions will accept something like this:
>>
>> (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\
>> x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e
>> -\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*
>> [a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|
>> 2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e
>> -\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
>
> I'd like to offer some perspective to this, without insulting or
> casting aspersions.

FIRST,
[ ... ]
SECOND,
[ ... ]
SIXTH,
[ ... ]

Ronnie pushed some buttons here, tee hee.