From: Tad McClellan on 9 Mar 2010 22:20

Jason Carlton <jwcarlton(a)gmail.com> wrote:
> On Mar 9, 8:30 pm, Tad McClellan <ta...(a)seesig.invalid> wrote:
>> Jason Carlton <jwcarl...(a)gmail.com> wrote:
>> > Sorry if I made that too much to read.
>>
>> You've shown in the past that anything you write is too much to read.
>>
>> :-(
>>
>> --
>> Tad McClellan
>> email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
>> The above message is a Usenet post.
>> I don't recall having given anyone permission to use it on a Web site.

It is bad netiquette to quote .sigs.

> So, you're saying that you don't know the answer?

No, I'm saying that I am withholding the answer.

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
From: Jason Carlton on 9 Mar 2010 23:49

On Mar 9, 9:21 pm, s...(a)netherlands.com wrote:
> On Mon, 8 Mar 2010 19:03:03 -0800 (PST), Jason Carlton <jwcarl...(a)gmail.com> wrote:
> >Every once in a while, someone will copy and paste into my message
> >board from Word. After it submits through my Perl script, I'll have
> >something like this plugged in:
>
> >Normal 0 false false false EN-US X-NONE X-NONE
> >MicrosoftInternetExplorer4 /* Style Definitions */
> >table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-
> >rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-
> >style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-
> >padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-
> >margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:
> >0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt;
> >font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-
> >ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New
> >Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-
> >family:Calibri; mso-hansi-theme-font:minor-latin;}
>
> >The fonts and all that are different for each post; the only
> >consistency seems to be that it starts with "Normal 0 false false
> >false", and it ends with a "}".
>
> >Would something as simple as this be enough to consistently remove it?
>
> >$comment =~ s/Normal 0 false false false.*?}//gsi;
>
> >Or is there more to it than I'm thinking?
>
> $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;

Thanks, s.
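
One way to check the two substitutions proposed so far is to run them against a shortened copy of the pasted sample. This is only a minimal sketch: the variable names, the abbreviated style block, and the "Hello,"/"Goodbye." lines around it are made up for illustration and are not from the real message board data.

```perl
use strict;
use warnings;

# Hypothetical test input: an abbreviated copy of the pasted Word residue,
# wrapped in made-up text before and after it.
my $sample = join "\n",
    'Hello,',
    'Normal 0 false false false EN-US X-NONE X-NONE',
    'MicrosoftInternetExplorer4 /* Style Definitions */',
    'table.MsoNormalTable {mso-style-name:"Table Normal";',
    'mso-hansi-theme-font:minor-latin;}',
    'Goodbye.';

# Candidate 1: non-greedy match up to the first "}".
# The /s flag lets "." match newlines as well.
(my $non_greedy = $sample) =~ s/Normal 0 false false false.*?}//gsi;

# Candidate 2: negated character classes, which cross newlines
# on their own, so /s is not needed.
(my $negated = $sample) =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;

print "non-greedy:\n$non_greedy\n\n";
print "negated class:\n$negated\n";
```

On this made-up input both candidates leave only "Hello," and "Goodbye." separated by an empty line. If real posts behave differently, it may be worth comparing the stored text character by character against the sample as pasted here.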
From: Jason Carlton on 25 Mar 2010 13:41

On Mar 9, 11:49 pm, Jason Carlton <jwcarl...(a)gmail.com> wrote:
> On Mar 9, 9:21 pm, s...(a)netherlands.com wrote:
> > On Mon, 8 Mar 2010 19:03:03 -0800 (PST), Jason Carlton <jwcarl...(a)gmail.com> wrote:
> > >Would something as simple as this be enough to consistently remove it?
> >
> > >$comment =~ s/Normal 0 false false false.*?}//gsi;
> >
> > >Or is there more to it than I'm thinking?
> >
> > $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
>
> Thanks, s.

Unfortunately, neither of these are working the way I expected:

$comment =~ s/Normal 0 false false false.*?}//gsi;
$comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;

It's catching the "Normal 0 false false false", but not everything
else that comes after, and before the "}".

How do I make it remove everything from "Normal 0 false false false"
until it finds the first "}"?

TIA,

Jason
From: sln on 25 Mar 2010 14:52

On Thu, 25 Mar 2010 10:41:09 -0700 (PDT), Jason Carlton <jwcarlton(a)gmail.com> wrote:

>Unfortunately, neither of these are working the way I expected:
>
>$comment =~ s/Normal 0 false false false.*?}//gsi;
>$comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
>
>It's catching the "Normal 0 false false false", but not everything
>else that comes after, and before the "}".
>
>How do I make it remove everything from "Normal 0 false false false"
>until it finds the first "}"?
>
>TIA,
>
>Jason

You can generalize it more:

$comment =~ s/Normal \s* \d+ \s* false \s* false \s* false [^}]* \} //xig;

But it's probably not matching, so the format is different; maybe
there is no terminating '}' in the real text.
You don't need /s if you don't have a '.' in the pattern; that's why
I use [^}]* \}

It's not a good idea to grab everything between "Normal" and "}",
as that's not really enough info to make a pattern.

It looks like this:
Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4
is a space-delimited set of variable settings, followed by a '{' block '}'
delimited set of style definitions.

You could use alternation to flag the start of the definition if you know
the possible values (the slots look constant), so:

$comment =~ s/ (?:Normal|<something else>) \s* \d+ \s* (?:false|true) \s* (?:false|true) \s* (?:false|true) [^}]* \} //xig;

But I don't know this format, and it possibly can't be relied upon.
Also, the regex requires that there be a style block (or at least
something with a '}' as the terminator).

-sln
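
The generalized pattern can be tried in isolation with a short script. This is a sketch under assumptions, not sln's exact code: the three `(?:false|true)` slots are condensed into one `{3}` group, the `<something else>` alternative is left out because its value is unknown, and the variable names and test strings are invented for illustration.

```perl
use strict;
use warnings;

# Generalized pattern, precompiled with qr//.
# Under /x, literal whitespace in the pattern is ignored, so every
# gap in the input has to be matched explicitly with \s*.
my $word_residue = qr/
    Normal \s* \d+               # style name and a number
    (?: \s* (?:false|true) ){3}  # three boolean settings
    [^}]* \}                     # everything up to the first "}"
/xi;

# Hypothetical input with irregular spacing and mixed true and false values.
my $comment = "Before\nNormal  0 false\ttrue false EN-US {mso-style-name:x;}\nAfter";

(my $cleaned = $comment) =~ s/$word_residue//g;
print "$cleaned\n";

# Without a closing "}" the pattern cannot match, so the text is left alone.
my $no_brace = "Normal 0 false false false EN-US";
(my $unchanged = $no_brace) =~ s/$word_residue//g;
print $unchanged eq $no_brace ? "unchanged\n" : "changed\n";
```

The second case illustrates the caveat about the terminator: a paste that arrives without any "}" is not touched by this pattern at all.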
From: J. Gleixner on 25 Mar 2010 18:45

Jason Carlton wrote:
> Unfortunately, neither of these are working the way I expected:
>
> $comment =~ s/Normal 0 false false false.*?}//gsi;
> $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
>
> It's catching the "Normal 0 false false false", but not everything
> else that comes after, and before the "}".
>
> How do I make it remove everything from "Normal 0 false false false"
> until it finds the first "}"?

$comment =~ s/Normal 0 false false false[^}]*}//gsi;

my $str = 'Start Normal 0 false false false blah blah { more blah } Starting second match Normal 0 false false false blah blah { more blah } The End';

$str =~ s/Normal 0 false false false[^}]*}//gsi;
print $str;

Start  Starting second match  The End