From: PerlFAQ Server on 24 Jul 2010 06:00

This is an excerpt from the latest version of perlfaq6.pod, which comes with the standard Perl distribution. These postings aim to reduce the number of repeated questions as well as allow the community to review and update the answers. The latest version of the complete perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

6.20: What good is "\G" in a regular expression?

You use the "\G" anchor to start the next match on the same string where the last match left off. The regular expression engine cannot skip over any characters to find the next match with this anchor, so "\G" is similar to the beginning of string anchor, "^". The "\G" anchor is typically used with the "g" flag. It uses the value of "pos()" as the position to start the next match. As the match operator makes successive matches, it updates "pos()" with the position of the next character past the last match (or the first character of the next match, depending on how you like to look at it). Each string has its own "pos()" value.

Suppose you want to match all consecutive pairs of digits in a string like "1122a44" and stop matching when you encounter non-digits. You want to match 11 and 22, but the letter "a" shows up between 22 and 44 and you want to stop at "a". Simply matching pairs of digits skips over the "a" and still matches 44.

    $_ = "1122a44";
    my @pairs = m/(\d\d)/g;   # qw( 11 22 44 )

If you use the "\G" anchor, you force the match after 22 to start with the "a". The regular expression cannot match there since it does not find a digit, so the next match fails and the match operator returns the pairs it already found.

    $_ = "1122a44";
    my @pairs = m/\G(\d\d)/g;   # qw( 11 22 )

You can also use the "\G" anchor in scalar context. You still need the "g" flag.
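As a rough sketch of how "pos()" moves during such scalar-context matches, the following records where each match ends. The helper name "trace_pairs" is made up for this illustration and is not part of the FAQ.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the FAQ): run the \G match in scalar
# context and note the value of pos() after each successful match.
sub trace_pairs {
    my ($string) = @_;
    my @trace;
    while ( $string =~ m/\G(\d\d)/g ) {
        # pos() is the offset of the character just past the match
        push @trace, "$1 ends at " . pos($string);
    }
    return @trace;
}

print "$_\n" for trace_pairs("1122a44");
```

For "1122a44" this prints "11 ends at 2" and "22 ends at 4"; the third attempt has to start at the "a" and fails there.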
    $_ = "1122a44";
    while( m/\G(\d\d)/g ) {
        print "Found $1\n";
    }

After the match fails at the letter "a", perl resets "pos()" and the next match on the same string starts at the beginning.

    $_ = "1122a44";
    while( m/\G(\d\d)/g ) {
        print "Found $1\n";
    }

    print "Found $1 after while" if m/(\d\d)/g;   # finds "11"

You can disable "pos()" resets on fail with the "c" flag, documented in perlop and perlreref. Subsequent matches start where the last successful match ended (the value of "pos()") even if a match on the same string has failed in the meantime. In this case, the match after the "while()" loop starts at the "a" (where the last match stopped), and since it does not use any anchor it can skip over the "a" to find 44.

    $_ = "1122a44";
    while( m/\G(\d\d)/gc ) {
        print "Found $1\n";
    }

    print "Found $1 after while" if m/(\d\d)/g;   # finds "44"

Typically you use the "\G" anchor with the "c" flag when you want to try a different match if one fails, such as in a tokenizer. Jeffrey Friedl offers this example which works in 5.004 or later.

    while (<>) {
        chomp;
        PARSER: {
            m/ \G( \d+\b    )/gcx   && do { print "number: $1\n";  redo; };
            m/ \G( \w+      )/gcx   && do { print "word:   $1\n";  redo; };
            m/ \G( \s+      )/gcx   && do { print "space:  $1\n";  redo; };
            m/ \G( [^\w\d]+ )/gcx   && do { print "other:  $1\n";  redo; };
        }
    }

For each line, the "PARSER" loop first tries to match a series of digits followed by a word boundary. This match has to start at the place the last match left off (or the beginning of the string on the first match). Since "m/ \G( \d+\b )/gcx" uses the "c" flag, if the string does not match that regular expression, perl does not reset "pos()" and the next match starts at the same position to try a different pattern.

--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq.
They are not necessarily experts in every domain where Perl might show up, so please include as much information as possible and relevant in any corrections. The perlfaq-workers also don't have access to every operating system or platform, so please include relevant details for corrections to examples that do not work on particular platforms. Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in perlfaq.pod.
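
As a follow-up to the tokenizer in the answer above, here is a minimal sketch of the same "\G" plus "c" flag idea wrapped in a function that returns the tokens instead of printing them. The name "tokenize" and the [type, text] return shape are assumptions made for this illustration. The last pattern uses "[^\w\s]+" rather than the FAQ's "[^\w\d]+", so that punctuation does not swallow the whitespace that follows it; that is a deliberate deviation, not the FAQ's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the FAQ): split one line into
# [type, text] pairs. Each pattern is anchored with \G and uses /c,
# so a failed attempt leaves pos() alone and the next pattern is
# tried at the same offset.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    pos($line) = 0;    # start at the beginning of the string
    while ( pos($line) < length $line ) {
        if    ( $line =~ m/\G(\d+\b)/gc )    { push @tokens, [ number => $1 ] }
        elsif ( $line =~ m/\G(\w+)/gc )      { push @tokens, [ word   => $1 ] }
        elsif ( $line =~ m/\G(\s+)/gc )      { push @tokens, [ space  => $1 ] }
        elsif ( $line =~ m/\G([^\w\s]+)/gc ) { push @tokens, [ other  => $1 ] }
        else                                 { last }    # defensive; should not be reached
    }
    return @tokens;
}

for my $token ( tokenize("abc 42, x9") ) {
    printf "%-7s '%s'\n", "$token->[0]:", $token->[1];
}
```

With the input "abc 42, x9" the token types come out as word, space, number, other, space, word. The trailing "x9" is a word rather than a number because "\d+\b" only matches digits that end at a word boundary.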