From: Robert Bonomi on 10 Apr 2010 17:40

In article <hp4752$loi$2(a)speranza.aioe.org>,
Mike Scott <usenet.12(a)spam.stopper.scottsonline.org.uk> wrote:
>Robert Bonomi wrote:
>> In article <hp2b70$l18$1(a)speranza.aioe.org>,
>> Mike Scott <usenet.12(a)spam.stopper.scottsonline.org.uk> wrote:
>>> Oh, and using 'old-style' re's,
>>>    \(.*\)\1
>>> matches
>>>    123abcabc456
>>> but returns a null string as the match! Weird.
>>
>> If you don't understand why that is happening, then you do *NOT* understand
>> regular expressions.
>>
>> explanation:
>>   '.' means 'match any character'
>>   '*' means 'match ZERO OR MORE of the previous character'
>>
>> Thus '.*' does match a null string (zero characters, before the first '1')
>> and there is a second null string, following the first one, (still before
>> the first '1') -- hence the search criteria _is_ satisfied.
>>
>> Wildcard RE matches look for the match that starts _earliest_ in the string,
>> and has the longest length.
>>
>> The null string match occurs before the 'abcabc' match, and thus is selected
>> even though the second pattern match is longer.
>>
>
>Ok, thanks, point taken. In mitigation, I did find an almost exactly
>similar example on the net, making the exact same mistake.......

It is a _common_ mistake. I've made it myself, _more_ than once. <wry grin>

'.*' without something anchoring it on at least one side is almost *never*
what the author intended, for exactly that reason. Usually, the intent is
'.+' (or '..*', if you don't have the '+' wildcard available -- as in some
obsolete, pre-POSIX, implementations) which imposes a minimum length of 1.
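For anyone who wants to see the null match for themselves, here is a minimal sketch in Python. Two assumptions to note: Python's re module uses Perl-style syntax, so the old-style '\(.*\)\1' is written '(.*)\1', and it is a backtracking engine rather than a POSIX leftmost-longest one. For this particular input the two approaches should agree, since the only match available at offset 0 is the empty one.

```python
import re

text = "123abcabc456"

# '(.*)\1': the group is allowed to be empty, so an empty string followed
# by an empty backreference already succeeds at offset 0, and the search
# never gets as far as 'abcabc'.
unanchored = re.search(r"(.*)\1", text)

# '(.+)\1': the group must hold at least one character, so the first
# position where "something, then the same thing again" occurs wins.
doubled = re.search(r"(.+)\1", text)

print(repr(unanchored.group(0)), unanchored.span())
print(repr(doubled.group(0)), doubled.span())
```

The first search reports an empty match at the very start of the string; the second finds the repeated 'abc'.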
From: Mike Scott on 11 Apr 2010 02:51
Mike Scott wrote:
......
> Unfortunately, people being what they are, I also get things like
>    "All Musicians" <musicians(a)mydomain>
> appearing - note the capitals.
> Unfortunately, the backref /always/ seems to honour the capitalization,
> so the above re will not match, even with REG_ICASE set. The behaviour
> seems debatable and the man page unclear. I assume there's no way out of
> this using re's???
>
> (Sorry for following up my own post)

A quick check this week showed perl behaved sensibly - the case-independent
flag makes even back ref's ignore case, unlike the C library re routines.

Has anyone already hacked milter-regex to use pcre instead?? Maybe I'll ask
too on the sendmail group when time allows.

-- 
Mike Scott (unet2 <at> [deletethis] scottsonline.org.uk)
Harlow Essex England
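As a rough illustration of the Perl-like behaviour described above, the sketch below uses Python's re module, whose IGNORECASE flag also applies to backreferences. It says nothing about how the C library routines or milter-regex behave. The pattern and header string are hypothetical stand-ins (the original re is not shown in this post), and the address is written with a literal '@'.

```python
import re

# Hypothetical header and pattern, for illustration only.
header = '"All Musicians" <musicians@mydomain>'
pattern = r'"All (\w+)" <\1@'

# Without the flag, the backreference must repeat the captured text
# exactly, so 'Musicians' does not match 'musicians'.
case_sensitive = re.search(pattern, header)

# With IGNORECASE, the backreference comparison ignores case as well.
case_insensitive = re.search(pattern, header, re.IGNORECASE)

print(case_sensitive)
print(case_insensitive.group(1))
```

Only the capitalisation of the backreferenced word differs between the two runs, so the flag's effect on the backref is isolated.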