From: Nathan Harmston on
Thanks everyone, the invRegexInf is perfect.

Thanks again,

Nathan

On 1 April 2010 10:17, Gabriel Genellina <gagsl-py2(a)yahoo.com.ar> wrote:
> En Wed, 31 Mar 2010 12:23:48 -0300, Paul McGuire <ptmcg(a)austin.rr.com>
> escribió:
>>
>> On Mar 31, 5:49 am, Nathan Harmston <iwanttobeabad...(a)googlemail.com>
>> wrote:
>>>
>>> I have a slightly complicated/medium sized regular expression and I
>>> want to generate all possible words that it can match (to compare
>>> performance of regex against an acora based matcher).
>>
>> The pyparsing wiki Examples page includes this regex inverter:
>> http://pyparsing.wikispaces.com/file/view/invRegex.py
>>
>>> From the module header:
>>
>> # Supports:
>> # - {n} and {m,n} repetition, but not unbounded + or * repetition
>> # - ? optional elements
>> # - [] character ranges
>> # - () grouping
>> # - | alternation
>
> I took the liberty of modifying your invRegex.py example, adding support
> for infinite repeaters. It depends on two other modules:
>
> mergeinf.py (from http://code.activestate.com/recipes/577041) provides the
> infinite merge operation.
>
> enumre.py provides the basic functions (merge, prod, repeat, closure)
> necessary to enumerate the language generated by a given regular
> expression, even if it contains unbounded repeaters like *,+.  The key is
> to generate shorter strings before longer ones, so in 'a*|b*' it doesn't
> get stuck generating infinite a's before any b.
>
> By example, "(a|bc)*d" corresponds to this code:
>
>      prod(
>        closure(
>          merge(
>            'a',
>             prod('b','c'))),
>        'd')
>
> which returns an infinite generator starting with:
>
> d
> ad
> aad
> bcd
> aaad
> abcd
> bcad
> aaaad
> aabcd
> abcad
> bcaad
> bcbcd
> aaaaad
> aaabcd
> aabcad
> ...
>
>
> I got the idea from
> http://userweb.cs.utexas.edu/users/misra/Notes.dir/RegExp.pdf
>
> Finally, invRegexInf.py is based on your original regex parser. I only
> modified the generation part, taking advantage of the above
> infrastructure; the parser itself remains almost the same. It essentially
> saves oneself the very tedious work of converting a regular expression
> into the equivalent sequence of function calls as shown above. (I hope I
> got it right: I like pyparsing a lot and use it whenever I feel it's
> appropriate, but not as often as to remember the details...)
>
> --
> Gabriel Genellina
> --
> http://mail.python.org/mailman/listinfo/python-list
>
>