From: gtb on 3 Sep 2009 15:16

    if {[regexp {^\s*/[/*]} $line]} {

The line above seems to find C style comments. I know that the caret anchors it, but I am not sure what \s means.

Thanks
From: Jonathan Bromley on 3 Sep 2009 15:18

On Thu, 3 Sep 2009 12:16:49 -0700 (PDT), gtb <goodTweetieBird(a)hotmail.com> wrote:

>if {[regexp {^\s*/[/*]} $line]} {
>
>The line above seems to find C style comments.

It will not match trailing comments:

    if (p == q) // this won't match

>I know that the caret
>anchors it but am not sure what \s means.

It matches any white-space character (tab, space and a few others). See the "re_syntax" man page for more details.

-- 
Jonathan Bromley, Consultant
DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services
Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley(a)MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.
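To make the \s answer concrete, here is a minimal sketch that tests single characters against ^\s$. The proc name isSpace is hypothetical, made up for this illustration; \s itself is the standard ARE class shorthand, equivalent to [[:space:]].

```tcl
# Hypothetical helper (not part of Tcl): returns 1 if the single
# character ch is white space according to the ARE shorthand \s.
proc isSpace {ch} {
    return [regexp {^\s$} $ch]
}

# Print one line per sample character: its name and 1 or 0.
foreach {name ch} [list space " " tab "\t" newline "\n" letter "x"] {
    puts [format "%-8s %d" $name [isSpace $ch]]
}
```

Space, tab and newline all report 1; the letter reports 0.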
From: goodTweetieBird on 3 Sep 2009 15:24

Thanks, but I found that it is white space. So the search is for the beginning of the line, optional white space, then /* or //. Right?
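That reading can be checked with a small sketch that runs the original regexp over a few sample lines. The proc name startsWithComment and the sample lines are invented for illustration.

```tcl
# Hypothetical wrapper around the regexp from the original question:
# start of string, optional white space, then "//" or "/*".
proc startsWithComment {line} {
    return [regexp {^\s*/[/*]} $line]
}

set samples [list \
    "// whole-line C++ style comment" \
    "\t  /* indented C style comment */" \
    "if (p == q) // trailing comment" \
    "x = a / b;"]

# Print 1 or 0 in front of each sample line.
foreach line $samples {
    puts "[startsWithComment $line]  $line"
}
```

The first two lines match; the trailing comment and the division do not, because something other than white space precedes the slash.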
From: Donald Arseneau on 4 Sep 2009 00:14
On Sep 3, 12:18 pm, Jonathan Bromley <jonathan.brom...(a)MYCOMPANY.com> wrote:
> <goodTweetieB...(a)hotmail.com> wrote:
> >if {[regexp {^\s*/[/*]} $line]} {
> >
> >The line above seems to find C style comments.
>
> It will not match trailing comments:
>
>     if (p == q) // this won't match

Those are not "C style comments" but "C++ style comments", and the particular regexp seems to locate whole-line comments of either style, but makes a mess of multi-line C-style comments.

    regexp {/\*.*?\*/} $wholetext

will get simple C-style comments (ignoring the many possible complications, such as a "/*" inside a quoted string). Note $wholetext instead of $line, to indicate that it has to scan the whole file, not line-by-line.

    regexp -line {//.*$} $wholetext
    regexp {//.*$} $line

both find C++ style comments.

> not sure what \s means.

A white-space character. (Tcl man re_syntax.)

If you do want the match to include the white space before the comment, as the original regexp does, while still scanning the whole text, then you could separately do:

    regexp -lineanchor {^\s*?/\*.*?\*/} $wholetext
    regexp -line {^\s*?//.*?$} $wholetext

(-lineanchor lets ^ match at the start of every line without stopping "." at newlines, so multi-line C comments still match; -line would do both.)

This brings up a nasty catch that, err, catches me all the time! Since I need a non-greedy quantifier ".*?" later in the regexp, I have made the first one non-greedy also -- there can be only one style! In Tcl's regexp engine, the first quantifier in the expression sets the greediness of the whole match.

Furthermore, this makes 4 regexps to run consecutively. If you want to combine them, then the greediness requirement becomes unmanageable: the .*? in /\*.*?\*/ must become .*, which then matches too much; and the fix looks like gibberish:

    regexp -lineanchor {(?:^\s*)?/\*[^*]*(?:\*(?!/)[^*]*)*\*/} $wholetext

OK, let's explain.... First, (?: ) is non-capturing grouping. Too bad it is uglier than ( ).
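The greediness catch described above can be reproduced with a short sketch. The sample text is made up; the only difference between the two regexps is whether the leading \s quantifier is greedy.

```tcl
set text "  /* first */ int x; /* second */"

# First quantifier (\s*) is greedy, so the whole RE prefers the longest
# match, and the later .*? cannot stop at the first */.
regexp {^\s*/\*.*?\*/} $text greedy

# First quantifier (\s*?) is non-greedy, so the whole RE prefers the
# shortest match and stops at the first */.
regexp {^\s*?/\*.*?\*/} $text lazy

puts "greedy: <$greedy>"
puts "lazy:   <$lazy>"
```

The greedy version swallows everything up to the second */, while the non-greedy version captures only "  /* first */".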
    # (grouped) pattern for spaces at the beginning of a line:
    set indentation {(?:^\s*)}

But let's omit that from the beginning of the C pattern (see below).

    # literal slash-star:
    set slashstar {/\*}
    # literal star-slash:
    set starslash {\*/}
    # all non-star characters:
    set allnonstar {[^*]*}
    # star character not followed by a slash ("negative lookahead" for slash):
    set lonestar {\*(?!/)}
    # A lone star plus ensuing non-star characters, in a group:
    set starmore "(?:$lonestar$allnonstar)"
    # non-star characters plus any ensuing (groups of) lone-star plus more:
    set nonstarwithlonestars "$allnonstar$starmore*"
    # All together:
    # a /*, all non-star characters as well as
    # stars not followed by slash, and a */ at the end:
    set Cpattern "$slashstar$nonstarwithlonestars$starslash"

The other obstacle to combining the C and C++ patterns is the line-by-line matching for the C++ // style. That is easily changed, though:

    set untilendline {[^\n]*}
    set Cpppattern "//$untilendline"

And the full pattern is:

    # Optional indentation plus (C comment OR C++ comment)
    set pattern "$indentation?(?:$Cpattern|$Cpppattern)"
    # -all -inline returns a list of every match; -lineanchor lets ^ match
    # at the start of each line of $wholetext
    set comments [regexp -all -inline -lineanchor $pattern $wholetext]

Eeeek!

I hope that was useful for information purposes. It is not suitable as an engine to capture or remove all comments from C programs, because it ignores the peculiarities of special cases like comment markers in quoted strings and, if the application treats /* /* */ */ as nested, nested comments in comments. I recommend the wiki page http://wiki.tcl.tk/14658

Donald Arseneau
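Both the strength and the limitation of the combined pattern can be seen in a small sketch. It writes the same pattern out literally, assumes the -lineanchor switch so that ^ can match at the start of each line, and uses an invented snippet of C source.

```tcl
# The combined pattern from the post, written out as one literal string.
set pattern {(?:^\s*)?(?:/\*[^*]*(?:\*(?!/)[^*]*)*\*/|//[^\n]*)}

# Made-up C source: a doc comment with extra stars, a trailing C++
# comment, and a string literal that merely contains "//".
set src "/** doc **/\nint a; // trailing\nchar *s = \"// not a comment\";\n"

# Collect every match; print each one between angle brackets.
set found [regexp -all -inline -lineanchor $pattern $src]
foreach c $found {
    puts "<$c>"
}
```

The doc comment stops correctly at its first */ despite the extra stars, but the third "comment" found is really part of a string literal. That is the kind of special case the post warns about.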