From: Kaz Kylheku on 1 Mar 2010 20:51

On 2010-03-01, Ron Garret <rNOSPAMon(a)flownet.com> wrote:
> In article
> <3bb8c3f7-ba67-4944-b236-e998684adb56(a)g19g2000yqe.googlegroups.com>,
> ccc31807 <cartercc(a)gmail.com> wrote:
>
>> On Feb 27, 3:08 pm, Ron Garret <rNOSPA...(a)flownet.com> wrote:
>> > It boggles my mind that the same people who
>> > complain about the aesthetics (or lack thereof) of parens in
>> > S-expressions will accept something like this:
>> >
>> > (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\
>> > x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e
>> > -\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*
>> > [a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|
>> > 2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e
>> > -\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
> ...
>> I read through the RE example above, and it would take a bit of time
>> to figure out exactly what it did
>
> It detects valid email addresses. I got it from this page:
>
> http://regexlib.com/DisplayPatterns.aspx

The above quoted example is a gross misuse of regexes, because a single
regex is applied to a problem which is more properly treated as a
scanning and parsing problem.

Regular expressions are intended for lexical scanning: decomposing input
into tokens. They fit into (and do not substitute for) scanner-generator
languages.

In a scanner-generator language like lex, you can define macros
representing regex fragments, so you can write {id} instead of repeating
[A-Za-z_][A-Za-z0-9_]*. Thus an ugly category of characters like
[a-z0-9!#$%&'*+/=?^_`{|}~-] can be given a meaningful name and invoked
by that name.

Moreover, in a scanner-generator language, you would define a set of
tokens to be extracted, using parallel regular expressions. The scanner
determines which token is to be extracted using some heuristic, like
which of the expressions constitutes the longest match (with tie-breaker
resolution by which token rule occurs first). The higher-level syntax is
recognized over the token categories, not at the character level.

A misuse of regexes as a big hammer to strike any size nail does not
damn regexes.
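
To make the scanner-generator idea above concrete, here is a minimal sketch in Python rather than lex. It is not taken from the post: the fragment name `atext`, the rule names (`NUMBER`, `ATOM`, `DOT`, `AT`) and the helpers `expand` and `tokenize` are all made up for illustration. It shows named regex fragments, parallel token rules, and longest-match selection with ties going to the rule listed first.

```python
import re

# Named regex fragments, in the spirit of lex macros.
# The name "atext" is an illustrative assumption, not from the post.
FRAGMENTS = {
    "atext": r"[a-z0-9!#$%&'*+/=?^_`{|}~-]",
}

def expand(pattern):
    """Replace {name} references with the fragment they name."""
    # Only alphabetic names are expanded, so quantifiers like {3} survive.
    return re.sub(r"\{([a-z_]+)\}", lambda m: FRAGMENTS[m.group(1)], pattern)

# Parallel token rules; order matters only for breaking ties.
RULES = [
    ("NUMBER", r"[0-9]+"),
    ("ATOM", expand(r"{atext}+")),
    ("DOT", r"\."),
    ("AT", r"@"),
]

def tokenize(text):
    """Split text into (token_name, lexeme) pairs using longest match."""
    compiled = [(name, re.compile(rx)) for name, rx in RULES]
    pos = 0
    tokens = []
    while pos < len(text):
        best = None
        for name, rx in compiled:
            m = rx.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier rule is kept.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens
```

With these rules, `tokenize("123")` yields a single `NUMBER` token (a tie with `ATOM`, won by the earlier rule), while `tokenize("123abc")` yields a single `ATOM` token because that match is longer. A parser would then work on the token names instead of individual characters.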
From: Giovanni Gigante on 2 Mar 2010 09:25

> do three completely different things, none of which involve computing a
> quotient. And this is just a single example of a long, long, long list
> of syntactic and semantic insanity.

True. I suspect that one may like perl in the same way that one can love
anything horribly complex enough, once one is proficient in it.

I am also wondering about another thing. If things like parenscript
exist, it should be possible to write a parenperl. Would it be useful?
Would it still be "perl"?
From: Giovanni Gigante on 2 Mar 2010 09:34

Tim Bradshaw wrote:
>
> Like English

Btw, I've never understood the traditional "Larry is a linguist"
argument. I think that ambiguities are a source of richness and fun in
human language, but not in computer ones (for most definitions of "fun").
From: Giovanni Gigante on 2 Mar 2010 09:36

Giovanni Gigante wrote:
>
> Would it still be "perl"?

Uhm. This one sounds unexpectedly gavinesque. Please alert if that's the
case.
From: ccc31807 on 2 Mar 2010 09:41

On Mar 1, 6:15 pm, Tim Bradshaw <t...(a)tfeb.org> wrote:
> > Personally, I use something like /[\w.-]+@[\w]+\.[\w]{2,4}/
> People who do this sort of thing should just be prevented from
> pforramming.
....
> It is emphatically not "good enough".

I guess you didn't read the post where I explained the particular usage.
This is to break apart data read from a database, which might contain a
line like this:

"999-999-9999","email@addy.com","10/SP","ACAD"

where a telephone number contains at least 9 digits, an email address
contains 1 '@', a term contains 1 '/', and a restriction contains all UC
alpha characters.

In this task, please explain to me why the RE isn't "good enough"? In
particular, why shouldn't this RE, /@/, be "good enough" to select an
email address?

CC.
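
For illustration, the field-sorting task described above might be sketched roughly as follows, in Python rather than Perl. This is an assumed reading of the post, not the poster's actual code: the function names `classify` and `classify_line`, the order of the checks, and the "unknown" fallback are all hypothetical. The sample data is written with a literal '@'.

```python
import csv
import re

def classify(field):
    """Guess a field's kind using the loose rules described in the post."""
    # The order of these checks is an assumption made for this sketch.
    if field.count("@") == 1:
        return "email"          # exactly one '@'
    if len(re.findall(r"\d", field)) >= 9:
        return "phone"          # at least 9 digits
    if field.count("/") == 1:
        return "term"           # exactly one '/'
    if re.fullmatch(r"[A-Z]+", field):
        return "restriction"    # all upper-case alphabetic characters
    return "unknown"

def classify_line(line):
    """Parse one quoted, comma-separated line and label each field."""
    fields = next(csv.reader([line]))
    return [(classify(f), f) for f in fields]
```

Calling `classify_line('"999-999-9999","email@addy.com","10/SP","ACAD"')` labels the four fields as phone, email, term and restriction. The checks only tell apart fields already known to belong to one of these four kinds; they say nothing about whether a string is a valid address.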