From: Jason Carlton on 25 Mar 2010 19:23

On Mar 25, 5:45 pm, "J. Gleixner" <glex_no-s...(a)qwest-spam-no.invalid> wrote:
> Jason Carlton wrote:
> > On Mar 9, 11:49 pm, Jason Carlton <jwcarl...(a)gmail.com> wrote:
> >> On Mar 9, 9:21 pm, s...(a)netherlands.com wrote:
> >
> >>> On Mon, 8 Mar 2010 19:03:03 -0800 (PST), Jason Carlton <jwcarl...(a)gmail.com> wrote:
> >>>> Every once in awhile, someone will copy and paste into my message
> >>>> board from Word. After it submits through my Perl script, I'll have
> >>>> something like this plugged in:
> >>>> Normal 0 false false false EN-US X-NONE X-NONE
> >>>> MicrosoftInternetExplorer4 /* Style Definitions */
> >>>> table.MsoNormalTable {mso-style-name:"Table Normal";
> >>>> mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0;
> >>>> mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes;
> >>>> mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt;
> >>>> mso-para-margin-top:0in; mso-para-margin-right:0in;
> >>>> mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in;
> >>>> line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt;
> >>>> font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri;
> >>>> mso-ascii-theme-font:minor-latin;
> >>>> mso-fareast-font-family:"Times New Roman";
> >>>> mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri;
> >>>> mso-hansi-theme-font:minor-latin;}
> >>>> The fonts and all that are different for each post; the only
> >>>> consistency seems to be that it starts with "Normal 0 false false
> >>>> false", and it ends with a "}".
> >>>> Would something as simple as this be enough to consistently remove it?
> >>>> $comment =~ s/Normal 0 false false false.*?}//gsi;
> >>>> Or is there more to it than I'm thinking?
> >>> $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
> >> Thanks, s.
> >
> > Unfortunately, neither of these are working the way I expected:
> >
> > $comment =~ s/Normal 0 false false false.*?}//gsi;
> > $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
> >
> > It's catching the "Normal 0 false false false", but not everything
> > else that comes after, and before the "}".
> >
> > How do I make it remove everything from "Normal 0 false false false"
> > until it finds the first "}"?
>
> $comment =~ s/Normal 0 false false false[^}]*}//gsi;
>
> my $str = 'Start Normal 0 false false false blah blah { more blah }
> Starting second match Normal 0 false false false blah blah { more blah }
> The End';
> $str =~ s/Normal 0 false false false[^}]*}//gsi;
> print $str;
>
> Start Starting second match The End

J, should that first "}" be a "{"? Like:

$str =~ s/Normal 0 false false false[^{]*}//gsi;

From: J. Gleixner on 26 Mar 2010 11:42
Jason Carlton wrote:
[...]
>>>>>> The fonts and all that are different for each post; the only
>>>>>> consistency seems to be that it starts with "Normal 0 false false
>>>>>> false", and it ends with a "}".
>>>>>> Would something as simple as this be enough to consistently remove it?
[...]
> J, should that first "}" be a "{"? Like:
> $str =~ s/Normal 0 false false false[^{]*}//gsi;

Before asking if it's not correct, why not try it?

[^}]* - match everything until it sees '}'
}     - include '}' in the pattern.
        -- without that you'll have '}' in your results.

I gave example text and the output it generates. If that doesn't match
what you want, then please be a little more verbose. Provide a -short-
example of the text before, and what you want the text to be after doing
something to it.
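
For anyone who wants to try it, here is a minimal sketch that runs the two patterns from this exchange side by side on the example text. The variable names $text, $closing and $opening are made up for illustration, and the sample is written on one line rather than wrapped; this is only a quick check under those assumptions, not a tested fix for the real message-board data.

```perl
use strict;
use warnings;

# Example text from the post above, joined onto a single line.
my $text = 'Start Normal 0 false false false blah blah { more blah } '
         . 'Starting second match Normal 0 false false false blah blah { more blah } '
         . 'The End';

# Suggested pattern: [^}]* consumes every character that is not '}',
# then the literal '}' consumes the closing brace itself.
(my $closing = $text) =~ s/Normal 0 false false false[^}]*}//gi;

# Variant from the question: [^{]* can never cross the '{', so the
# literal '}' that must come next never matches and nothing is removed.
(my $opening = $text) =~ s/Normal 0 false false false[^{]*}//gi;

print "closing: $closing\n";
print "opening: $opening\n";
```

The /s modifier is left out here because it only changes what "." matches; a negated class such as [^}] already matches newlines, so the pattern spans multi-line pastes either way.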