From: Bezoar on 5 Aug 2010 13:14

I've been trying to write a regexp that matches lines that begin with a particular first part but do not end in either of two specified last parts. For instance, given the following input:

    a abc dkkdkdkdChuck;
    a abc oeriererer;
    a abc oeriereXY;

my regexp must match only line 2 (a abc oeriererer;). In other words, I need a regexp that means: "line begins with a, then contains abc, then any number of characters, but not ending in Chuck or XY, followed by ; and the end of line". My first attempt is below. The only NOT operator I know of is ^, but it is only effective inside [], which allows only character sets, not alternation.

    set reg {^a abc .*[^(Chuck|XY)];$} ; # not valid

How can I get the opposite of the alternation?
From: Bezoar on 5 Aug 2010 14:03

On Aug 5, 12:14 pm, Bezoar <cwjo...(a)gmail.com> wrote:
> [...]
> set reg {^a abc .*[^(Chuck|XY)];$} ; # not valid
>
> How can I get the opposite of the alternation?

Well, after some more digging and experimentation I found the answer:

    set reg {^a abc(?!.*(Chuck|XY);$).*;$}

This uses a negative lookahead constraint, which says: match up to "a abc", then look ahead to see whether the line ends in Chuck;$ or XY;$. If it does not, continue to match any character 0 or more times, followed by ; and the end of line; otherwise the regexp does not match.

Whew, tough one.
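As a quick check of the lookahead pattern above, here is a minimal sketch that runs it over the three sample lines from the question. The variable names `lines` and `matched` are made up for this illustration; only the regexp itself comes from the post.

```tcl
# Sample lines taken from the original question.
set lines {
    {a abc dkkdkdkdChuck;}
    {a abc oeriererer;}
    {a abc oeriereXY;}
}

# Negative lookahead: right after "a abc", refuse any line whose tail
# is "Chuck;" or "XY;", then consume the rest up to the final ";".
set reg {^a abc(?!.*(Chuck|XY);$).*;$}

# Collect the lines the pattern accepts (hypothetical helper variable).
set matched {}
foreach line $lines {
    if {[regexp -- $reg $line]} {
        lappend matched $line
    }
}
puts $matched
```

Only the second sample line should end up in `matched`.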
From: Jonathan Bromley on 5 Aug 2010 14:22

On Thu, 5 Aug 2010 11:03:01 -0700 (PDT), Bezoar wrote:
[...]
>> How can I get the opposite of the alternation?
>
>Well after some more digging and experimentation I found the answer:
>
>set reg {^a abc(?!.*(Chuck|XY);$).*;$}
>
>this uses the negative lookahead constraint [...]

Beware negative lookahead constraints. They take a HUGE performance hit in the regexp engine. I don't know exactly why this is so, but I've seen 50x performance degradation with even simple constraints (a fixed string of about 6 characters, nothing clever). If you're scanning large input texts, this can make all the difference between satisfactory and unacceptable performance.

Generally it is far faster to get ALL the candidate matches with a first RE, then use some filtering (possibly another RE) to reject the unwanted ones. [regexp -all -inline] is your friend (possibly with -indices too).

-- 
Jonathan Bromley
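The two-pass approach described above might look roughly like the following sketch. It assumes the whole input sits in one string with newline-separated lines; the names `text`, `candidates`, and `wanted` are invented for the example, and no timing claim is made here.

```tcl
# Assumed input: the three sample lines joined into one text block.
set text "a abc dkkdkdkdChuck;\na abc oeriererer;\na abc oeriereXY;\n"

# Pass 1: collect every candidate line with a lookahead-free RE.
# -line makes ^ and $ match at line boundaries and keeps . from
# matching a newline; -all -inline returns the matches as a list.
set candidates [regexp -all -inline -line -- {^a abc .*;$} $text]

# Pass 2: reject the candidates that have an unwanted ending.
set wanted {}
foreach cand $candidates {
    if {![regexp -- {(Chuck|XY);$} $cand]} {
        lappend wanted $cand
    }
}
puts $wanted
```

The first pass finds all three sample lines; the second pass keeps only the one that ends in neither Chuck; nor XY;.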
From: Gerald W. Lester on 5 Aug 2010 14:31

Bezoar wrote:
> On Aug 5, 12:14 pm, Bezoar <cwjo...(a)gmail.com> wrote:
>> [...]
>> How can I get the opposite of the alternation?
>
> Well after some more digging and experimentation I found the answer:
>
> set reg {^a abc(?!.*(Chuck|XY);$).*;$}
>
> [...]
>
> Whew tough one

You could also have done (which IMHO is a lot easier to read):

    if {[string match {a*Chuck;} $line] || [string match {a*XY;} $line]} {
        ##
        ## Line does not match
        ##
    } else {
        ##
        ## Line does match
        ##
    }

-- 
Gerald W. Lester, President, KNG Consulting LLC
Email: Gerald.Lester(a)kng-consulting.net
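One thing to watch with the glob version as posted: the patterns only check for a leading "a", and the else branch accepts any line that lacks the two endings, including lines that do not start with "a abc" at all. A slightly stricter sketch, assuming the "a abc " prefix and trailing ";" from the original question are required, could be wrapped in a proc. The name `wantedLine` is hypothetical.

```tcl
# Hypothetical helper: returns 1 for lines of the form "a abc ...;"
# that end in neither "Chuck;" nor "XY;", and 0 for everything else.
proc wantedLine {line} {
    # The line must have the required prefix and trailing ";" at all.
    if {![string match {a abc *;} $line]} {
        return 0
    }
    # Reject the two unwanted endings.
    if {[string match {*Chuck;} $line] || [string match {*XY;} $line]} {
        return 0
    }
    return 1
}

foreach line {{a abc dkkdkdkdChuck;} {a abc oeriererer;} {a abc oeriereXY;}} {
    puts "[wantedLine $line] $line"
}
```

Glob matching with string match avoids the regexp engine entirely, which fits the readability argument made in the post.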
From: Uwe Klein on 5 Aug 2010 16:04

Bezoar wrote:
> On Aug 5, 12:14 pm, Bezoar <cwjo...(a)gmail.com> wrote:
>
>>I've been trying to get a regexp that will match lines that match a
>>particular first part but do not end and two specified last parts. For
>>instance if I have the following input:
>>
>>a abc dkkdkdkdChuck;
>>a abc oeriererer;
>>a abc oeriereXY;
>
> Whew tough one

    # $line holds the input line under test
    switch -regexp -- $line \
        {^a abc .*Chuck;$} - \
        {^a abc .*XY;$} {
            # nop
        } \
        {^a abc .*;$} {
            # hit
            puts "found it"
        } \
        default {
            # everything else
        }

uwe