From: Iain Barnett on 8 Jul 2010 07:20

I'm trying to emulate something I did in .Net many moons ago: capture a named group, not just once, but get all of its repetitions and then be able to see all of them. I think they call them GroupCollections in C#. This is the kind of code I'm trying to emulate with Ruby (1.9.1):

    using System;
    using System.Text.RegularExpressions;

    public class Test
    {
        public static void Main ()
        {
            // Define a regular expression for repeated words.
            Regex rx = new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b",
              RegexOptions.Compiled | RegexOptions.IgnoreCase);

            // Define a test string.
            string text = "The the quick brown fox  fox jumped over the lazy dog dog.";

            // Find matches.
            MatchCollection matches = rx.Matches(text);

            // Report the number of matches found.
            Console.WriteLine("{0} matches found in:\n {1}",
                              matches.Count,
                              text);

            // Report on each match.
            foreach (Match match in matches)
            {
                GroupCollection groups = match.Groups;
                Console.WriteLine("'{0}' repeated at positions {1} and {2}",
                                  groups["word"].Value,
                                  groups[0].Index,
                                  groups[1].Index);
            }
        }
    }

    // The example produces the following output to the console:
    //    3 matches found in:
    //       The the quick brown fox  fox jumped over the lazy dog dog.
    //    'The' repeated at positions 0 and 4
    //    'fox' repeated at positions 20 and 25
    //    'dog' repeated at positions 50 and 54

For example, if I had the string "11 12" I could have a regex like

    /
      (?<first> \d+ ) \s \g<first>
    /x

that captured "11" and then the repetition "12" and put them in an array (or some kind of collection) referenced by the name.

I think my attempts to get this to work are better explanations. What I want is the result

    #<MatchData "11 12" first:["11", "12"]>

or something like it. At the moment all my attempts end with the named capture keeping only the last match it made, i.e. "12", with no mention of "11".

I know I could do this a different way, perhaps with split or something, but I'd like to know if it's possible with just a regex.
I understand the Oniguruma engine is used now, but I can't find any good docs for it.

These are my attempts; $ is my prompt.

    $ md1 = /
        (?<first> \d+ )
        \s \g<first>
      /x.match( "11 12" )
    #<MatchData "11 12" first:"12">

    $ md1[:first]
    "12"

    $ md1 = /
        (?<first> \d+ )
        (?: \s \g<first> )?
      /x.match( "11 12" )
    #<MatchData "11 12" first:"12">

    $ md1[:first]
    "12"

    $ md1 = /
        (?<first> \d+ )
        (?: \s
          (?<second> \g<first> )
        )?
      /x.match( "11 12" )
    #<MatchData "11 12" first:"12" second:"12">

    $ md1[:first]
    "12"

    $ md1[:second]
    "12"

    $ md1 = /
        (?: (?<first> \d+ )\s* )+
      /x.match( "11 12" )
    #<MatchData "11 12" first:"12">

    $ md1[:first]
    "12"

Iain
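A minimal sketch (not from the thread) of why every attempt above ends with first:"12". In Oniguruma syntax (and Onigmo, which Ruby has used since 2.0), `\g<first>` is a subexpression call: it re-runs the pattern of the `first` group instead of requiring the same text, and each run overwrites that group's capture. `\k<first>` is the backreference that demands identical text. Ruby's MatchData keeps one value per group, so the earlier "11" is simply lost. The variable names here are illustrative.

```ruby
# \g<first> is a subexpression call: it re-runs the pattern of the
# (?<first>...) group, so "12" matches even though it differs from "11".
# Every run overwrites the group's capture, leaving only the last value.
call = /(?<first>\d+)\s\g<first>/

# \k<first> is a backreference: it must repeat the captured text exactly.
# The \b anchors stop it from matching the "1 1" that straddles "11 12".
backref = /\b(?<first>\d+)\s\k<first>\b/

md = call.match("11 12")
p md[0]                           # "11 12"
p md[:first]                      # "12" (the "11" capture is gone)

p backref.match("11 12")          # nil: "12" is not a repeat of "11"
p backref.match("11 11")[:first]  # "11"
```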
From: w_a_x_man on 8 Jul 2010 11:13

On Jul 8, 6:20 am, Iain Barnett <iainsp...(a)gmail.com> wrote:
> I'm trying to emulate something I've done in .Net many moons ago, which is capture a named group, but not just once, get all its repetitions and then be able to see all those repetitions.
> ...

    "The the quick brown fox  fox jumped over the lazy dog dog.".
      scan(/((\w+) +\2)/i){|x| puts "#{ x[0] } #{ $~.offset(0)[0]}"}
    The the 0
    fox  fox 20
    dog dog 50
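The one-liner above can be stretched to mirror the C# report more closely by naming the groups and reading positions from `$~` (the MatchData of the current match) inside the `scan` block. This is a sketch under two assumptions: the test string has two spaces between the foxes, which the C# offsets 20 and 25 imply, and `repeated_words` and the group name `rep` are illustrative names, not anything from the thread.

```ruby
# Hypothetical helper: report repeated words the way the C# example does,
# using String#scan with a block and $~ for the current match.
def repeated_words(text)
  # "word" is the first occurrence; "rep" is its case-insensitive repeat.
  re = /\b(?<word>\w+)\s+(?<rep>\k<word>)\b/i
  report = []
  text.scan(re) do
    md = $~
    report << "'#{md[:word]}' repeated at positions " \
              "#{md.begin(:word)} and #{md.begin(:rep)}"
  end
  report
end

text = "The the quick brown fox  fox jumped over the lazy dog dog."
puts repeated_words(text)
```

`MatchData#begin` accepts a group name, which plays the role of `groups[...].Index` in the C# version.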
From: Iain Barnett on 8 Jul 2010 12:38

On 8 Jul 2010, at 16:15, w_a_x_man wrote:
>
> "The the quick brown fox  fox jumped over the lazy dog dog.".
>   scan(/((\w+) +\2)/i){|x| puts "#{ x[0] } #{ $~.offset(0)[0]}"}
> The the 0
> fox  fox 20
> dog dog 50
>

Thanks for that. That would certainly work to a degree, much better than my current alternative, but it nullifies the usefulness of named captures. For example, I can't call

    $ md1[:first]

and get back all the matches for the (?<first> ) grouping, which would be phenomenally useful, because scan returns arrays of strings and not MatchData.

Iain
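Ruby's MatchData has no counterpart to .NET's `Group.Captures`, but the result Iain describes can be approximated by running the regex repeatedly and collecting each named group's value into a Hash keyed by group name. A sketch only: `all_named_captures` is a hypothetical helper, not a Ruby built-in. It resumes the search with `Regexp#match(str, pos)`, and a group that did not take part in a match contributes `nil`.

```ruby
# Hypothetical helper (not part of Ruby): collect every value that each
# named group captured across all matches of +re+ in +text+.
def all_named_captures(re, text)
  result = Hash.new { |h, k| h[k] = [] }
  pos = 0
  # Regexp#match(str, pos) searches from pos and returns MatchData or nil.
  while (md = re.match(text, pos))
    re.names.each { |name| result[name] << md[name] }
    # Continue after this match; step one character past an empty match
    # so the loop always makes progress.
    pos = md.end(0) == md.begin(0) ? md.end(0) + 1 : md.end(0)
  end
  result
end

first = all_named_captures(/(?<first>\d+)/, "11 12")
p first["first"]   # ["11", "12"]
```

Note that this works across separate matches of a single-group regex, not across repetitions inside one match; the latter is the part Ruby does not record.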
From: botp on 8 Jul 2010 13:01

On Fri, Jul 9, 2010 at 12:38 AM, Iain Barnett <iainspeed(a)gmail.com> wrote:
> Thanks for that. That would certainly work to a degree, much better than my current alternative, but it nullifies the usefulness of named captures. For example, I can't call
>
> $ md1[:first]

wait till you call the 21st ;-)

> and get back all the matches for the (?<first> ) grouping, which would be phenomenally useful, because scan returns arrays of strings and not MatchData.
>

waxman hinted at the $~. Try, e.g.:

    s
    #=> "The the quick brown fox  fox jumped over the lazy dog dog."
    m=[]
    #=> []
    s.scan(/((\w+) +\2)/i){|x| m << $~}
    #=> "The the quick brown fox  fox jumped over the lazy dog dog."
    m.size
    #=> 3
    m[0]
    #=> #<MatchData "The the" 1:"The the" 2:"The">
    m[0].offset 0
    #=> [0, 7]
    m[0].offset ....

and so forth.

best regards -botp
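The irb session above can be wrapped in a small method. As botp's output shows, reading `$~` inside the `scan` block on each iteration keeps a separate MatchData per match, so named captures and offsets remain available afterwards. This sketch swaps botp's numbered groups for a named `word` group to show the part Iain was after; `match_data_for_all` is a hypothetical name, and the string again assumes two spaces between the foxes.

```ruby
# Sketch of botp's suggestion: keep each match's MatchData ($~) while
# scanning, instead of the plain string arrays scan would return.
def match_data_for_all(text, re)
  matches = []
  text.scan(re) { matches << $~ }
  matches
end

s = "The the quick brown fox  fox jumped over the lazy dog dog."
m = match_data_for_all(s, /\b(?<word>\w+)\s+\k<word>\b/i)

p m.size                     # 3
p m[0][:word]                # "The"
p m[0].offset(0)             # [0, 7]
p m.map { |md| md[:word] }   # ["The", "fox", "dog"]
```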
From: Iain Barnett on 8 Jul 2010 13:30
On 8 Jul 2010, at 18:01, botp wrote:
>
>> and get back all the matches for the (?<first> ) grouping, which would be phenomenally useful, because scan returns arrays of strings and not MatchData.
>>
>
> waxman hinted at the $~
> ...
>
> best regards -botp
>

Ok, I get it now. Thanks for the extra nudge (bang on the head :)

Iain