From: Iain Barnett on
I'm trying to emulate something I did in .NET many moons ago: capture a named group, not just once, but keeping all its repetitions so that I can inspect each of them afterwards. I think they call them GroupCollections in C#. This is the kind of code I'm trying to emulate with Ruby (1.9.1):

using System;
using System.Text.RegularExpressions;

public class Test
{

    public static void Main ()
    {

        // Define a regular expression for repeated words.
        Regex rx = new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b",
          RegexOptions.Compiled | RegexOptions.IgnoreCase);

        // Define a test string.
        string text = "The the quick brown fox  fox jumped over the lazy dog dog.";

        // Find matches.
        MatchCollection matches = rx.Matches(text);

        // Report the number of matches found.
        Console.WriteLine("{0} matches found in:\n   {1}",
                          matches.Count,
                          text);

        // Report on each match.
        foreach (Match match in matches)
        {
            GroupCollection groups = match.Groups;
            Console.WriteLine("'{0}' repeated at positions {1} and {2}",
                              groups["word"].Value,
                              groups[0].Index,
                              groups[1].Index);
        }

    }

}

// The example produces the following output to the console:
//       3 matches found in:
//          The the quick brown fox  fox jumped over the lazy dog dog.
//       'The' repeated at positions 0 and 4
//       'fox' repeated at positions 20 and 25
//       'dog' repeated at positions 50 and 54


For example, if I had the string "11 12" I could have a regex like
/
(?<first> \d+ ) \s \g<first>
/x
that captured "11" and then the repetition "12" and put them in an array (or some kind of collection) referenced by the name.

I think my attempts to get this to work explain it better. What I want is the result
#<MatchData "11 12" first:["11", "12"]> or something like it. At the moment all my attempts end with the named capture keeping only the last match it made, i.e. "12", with no mention of "11".

I know I could do this a different way, perhaps with split or something, but I'd like to know if it's possible with just regex. I understand the Oniguruma engine is used now but I can't find any good docs for it.


These are my attempts ($ is my prompt).

$ md1 = /
(?<first> \d+ )
\s \g<first>
/x.match( "11 12" )
#<MatchData "11 12" first:"12">

$ md1[:first]
"12"


$ md1 = /
(?<first> \d+ )
(?: \s \g<first> )?
/x.match( "11 12" )
#<MatchData "11 12" first:"12">

$ md1[:first]
"12"


$ md1 = /
(?<first> \d+ )
(?: \s
(?<second> \g<first> )
)?
/x.match( "11 12" )
#<MatchData "11 12" first:"12" second:"12">


$ md1[:first]
"12"

$ md1[:second]
"12"


$ md1 = /
(?: (?<first> \d+ )\s* )+
/x.match( "11 12" )
#<MatchData "11 12" first:"12">

$ md1[:first]
"12"

Iain
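
A side note on the attempts above: in each of them the group named first is entered twice (through \g<first> or through the + repetition), and as the outputs show, the MatchData keeps only the last text that group captured. If the number of repetitions is fixed and known in advance, one possible workaround is to give every occurrence its own name instead of re-calling the first group. This is only a minimal sketch, assuming exactly two numbers; the name "second" and the variable names are illustrative and not from the original post.

```ruby
# Each occurrence gets its own named group, so nothing is overwritten.
# Assumption: the input holds exactly two whitespace-separated numbers.
pair = /
  (?<first>  \d+ ) \s
  (?<second> \d+ )
/x

md = pair.match("11 12")

# Collect the captures in the order the names appear in the pattern.
first_and_second = md.names.map { |name| md[name] }

puts md[:first]      # 11
puts md[:second]     # 12
p first_and_second   # ["11", "12"]
```

This does not generalise to an unknown number of repetitions, which is the case the rest of the thread deals with.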



From: w_a_x_man on
On Jul 8, 6:20 am, Iain Barnett <iainsp...(a)gmail.com> wrote:
> I'm trying to emulate something I've done in .Net many moons ago, which is capture a named group, but not just once, get all its repetitions and then be able to see all those repetitions.
> ...

"The the quick brown fox fox jumped over the lazy dog dog.".
scan(/((\w+) +\2)/i){|x| puts "#{ x[0] } #{ $~.offset(0)[0]}"}
The the 0
fox fox 20
dog dog 50
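
The same scan idea can probably be pushed a little further so that it reports both positions, as the C# example does. The following is a sketch rather than a definitive port: it assumes the test string has two spaces between "fox" and "fox" (which is what the positions 20 and 25 imply), and the group name "again" is made up for the illustration.

```ruby
text = "The the quick brown fox  fox jumped over the lazy dog dog."

# Two named groups: "word" is the first occurrence, "again" is the repeat.
repeated = /\b(?<word>\w+)\s+(?<again>\k<word>)\b/i

# scan yields plain strings to the block, but Regexp.last_match ($~) is
# the full MatchData for the current match, so that is what gets stored.
matches = []
text.scan(repeated) { matches << Regexp.last_match }

report = matches.map do |m|
  "'#{m[:word]}' repeated at positions #{m.begin(:word)} and #{m.begin(:again)}"
end

puts "#{matches.size} matches found"
puts report
# 3 matches found
# 'The' repeated at positions 0 and 4
# 'fox' repeated at positions 20 and 25
# 'dog' repeated at positions 50 and 54
```

MatchData#begin accepts a group name as well as a number, which is what keeps the named captures useful here.
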
From: Iain Barnett on

On 8 Jul 2010, at 16:15, w_a_x_man wrote:
>>
>
> "The the quick brown fox fox jumped over the lazy dog dog.".
> scan(/((\w+) +\2)/i){|x| puts "#{ x[0] } #{ $~.offset(0)[0]}"}
> The the 0
> fox fox 20
> dog dog 50
>

Thanks for that. That would certainly work to a degree, much better than my current alternative, but it nullifies the usefulness of named captures. For example, I can't call

$ md1[:first]

and get back all the matches for the (?<first> ) grouping, which would be phenomenally useful, because scan returns arrays of strings and not MatchData objects.


Iain
From: botp on
On Fri, Jul 9, 2010 at 12:38 AM, Iain Barnett <iainspeed(a)gmail.com> wrote:
> Thanks for that. That would certainly work to a degree, much better than my current alternative, but it nullifies the usefulness of named captures. For example, I can't call
>
> $ md1[:first]

wait till you call the 21st ;-)

> and get back all the matches for the (?<first> ) grouping, which would be phenomenally useful, because scan returns arrays of strings and not matchdata.
>

waxman hinted at $~

try, e.g.,


s
#=> "The the quick brown fox fox jumped over the lazy dog dog."
m=[]
#=> []
s.scan(/((\w+) +\2)/i){|x| m << $~}
#=> "The the quick brown fox fox jumped over the lazy dog dog."
m.size
#=> 3
m[0]
#=> #<MatchData "The the" 1:"The the" 2:"The">
m[0].offset 0
#=> [0, 7]
m[0].offset

... and so forth.

best regards -botp
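
Building on the $~ approach above, something close to the first:["11", "12"] result asked for at the start of the thread could be sketched as follows. The helper name all_named_captures is hypothetical (it is not part of Ruby), and the sketch assumes the pattern matches one repetition at a time, so that scan visits each repetition as a separate match.

```ruby
# Hypothetical helper, not a Ruby built-in: for every named group in
# +pattern+, gathers the text it captured in each successive match.
def all_named_captures(text, pattern)
  collected = Hash.new { |hash, name| hash[name] = [] }
  text.scan(pattern) do
    md = Regexp.last_match   # the same object as $~ inside the block
    md.names.each { |name| collected[name] << md[name] }
  end
  collected
end

result = all_named_captures("11 12", /(?<first>\d+)/)
p result["first"]   # ["11", "12"]
```

The keys are strings because MatchData#names returns strings; the individual MatchData objects are discarded here, so keep them in an array as shown above if offsets are needed too.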

From: Iain Barnett on

On 8 Jul 2010, at 18:01, botp wrote:
>
>> and get back all the matches for the (?<first> ) grouping, which would be phenomenally useful, because scan returns arrays of strings and not matchdata.
>>
>
> waxman hinted at $~
> ...
>
> best regards -botp
>

Ok, I get it now. Thanks for the extra nudge (bang on the head:)

Iain