From: PerlFAQ Server on 8 Jun 2010 06:00

This is an excerpt from the latest version of perlfaq6.pod, which comes
with the standard Perl distribution. These postings aim to reduce the
number of repeated questions as well as allow the community to review and
update the answers. The latest version of the complete perlfaq is at
http://faq.perl.org .

--------------------------------------------------------------------

6.11: How do I use a regular expression to strip C style comments from a file?

    While this actually can be done, it's much harder than you'd think.
    For example, this one-liner

        perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c

    will work in many but not all cases. You see, it's too simple-minded
    for certain kinds of C programs, in particular, those with what appear
    to be comments in quoted strings. For that, you'd need something like
    this, created by Jeffrey Friedl and later modified by Fred Curtis.

        $/ = undef;   # slurp mode: read the whole input at once
        $_ = <>;      # files named on the command line, or STDIN
        s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
        print;        # print the stripped source to STDOUT

    It works because string literals and other non-comment text are
    captured in $2 and put back unchanged, so a /* inside a quoted string
    is consumed as part of the string and never treated as the start of a
    comment.

    This could, of course, be more legibly written with the "/x" modifier,
    adding whitespace and comments. Here it is expanded, courtesy of Fred
    Curtis.

        s{
           /\*         ##  Start of /* ... */ comment
           [^*]*\*+    ##  Non-* followed by 1-or-more *'s
           (
             [^/*][^*]*\*+
           )*          ##  0-or-more things which don't start with /
                       ##    but do end with '*'
           /           ##  End of /* ... */ comment

         |         ##     OR  various things which aren't comments:

           (
             "           ##  Start of " ... " string
             (
               \\.           ##  Escaped char
             |               ##    OR
               [^"\\]        ##  Non "\
             )*
             "           ##  End of " ... " string

           |         ##     OR

             '           ##  Start of ' ... ' string
             (
               \\.           ##  Escaped char
             |               ##    OR
               [^'\\]        ##  Non '\
             )*
             '           ##  End of ' ... ' string

           |         ##     OR

             .           ##  Any other char
             [^/"'\\]*   ##  Chars which don't start a comment, string or escape
           )
         }{defined $2 ? $2 : ""}gxse;

    A slight modification also removes C++ comments, possibly spanning
    multiple lines using a continuation character. Because the // branch
    adds a capture group, the text to keep is now in $3:

        s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They are
not necessarily experts in every domain where Perl might show up, so please
include as much relevant information as possible in any corrections. The
perlfaq-workers also don't have access to every operating system or
platform, so please include relevant details for corrections to examples
that do not work on particular platforms. Working code is greatly
appreciated.

If you'd like to help maintain the perlfaq, see the details in perlfaq.pod.
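To try the substitutions from FAQ 6.11 without a C file at hand, here is a minimal sketch that wraps each one in a sub and runs it on short in-memory strings instead of slurping input with <>. The sub names and sample inputs are hypothetical, made up for this illustration; the three regexes are copied unchanged from the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the Friedl/Curtis substitution.
# A /* ... */ comment matches the first alternative, leaves $2 undefined,
# and is replaced by "". String literals and ordinary code are captured
# in $2 and put back unchanged.
sub strip_c_comments {
    my ($code) = @_;
    $code =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    return $code;
}

# Hypothetical wrapper around the variant that also removes // comments.
# The extra capture group in the // branch moves the kept text to $3.
sub strip_c_and_cpp_comments {
    my ($code) = @_;
    $code =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
    return $code;
}

# The naive one-liner's substitution, for comparison.
sub strip_naive {
    my ($code) = @_;
    $code =~ s{/\*.*?\*/}{}gs;
    return $code;
}

my $c_src   = qq{int x = 1; /* counter */\nchar *s = "/* not a comment */";\n};
my $cpp_src = qq{int y = 2; // note\nputs("// kept");\n};

print "naive:  ", strip_naive($c_src);       # also empties the string literal
print "robust: ", strip_c_comments($c_src);  # the string literal survives
print "c++:    ", strip_c_and_cpp_comments($cpp_src);
```

On the C sample, the naive version deletes the /* ... */ text inside the string literal as well as the real comment, while the Friedl/Curtis version removes only the real comment. In the C++ variant the // branch ends with \n and that newline is not captured, so it is deleted along with the comment and the next line is joined onto the code before it.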