Prev: intolerant HTML parser
Next: how to fix bugs (python)
From: hzhuo1 on 8 Feb 2010 00:17 > > Please check out this example on the pyparsing wiki, invRegex.py:http://pyparsing.wikispaces.com/file/view/invRegex.py. This code > implements a generator that returns successive matching strings for > the given regex. Running it, I see that you actually have a typo in > your example. > > >>> print list(invert("[1|2|3]{2}")) > > ['11', '1|', '12', '13', '|1', '||', '|2', '|3', '21', '2|', '22', > '23', '31', '3|', '32', '33'] > > I think you meant either "[123]{2}" or "(1|2|3){2}". > > >>> print list(invert("[123]{2}")) > > ['11', '12', '13', '21', '22', '23', '31', '32', '33'] > > >>> print list(invert("(1|2|3){2}")) > > ['11', '12', '13', '21', '22', '23', '31', '32', '33'] > > Of course, as other posters have pointed out, this inverter does not > accept regexen with unbounded multiple characters '+' or '*', but '?' > and "{min,max}" notation will work. Even '.' is supported, although > this can generate a large number of return values. > > Of course, you'll also have to install pyparsing to get this to work. > > -- Paul Hi Paul, Thanks very much. This is exactly what I need now. I will check this function. Zhuo
From: Gabriel Genellina on 8 Feb 2010 05:19
En Mon, 08 Feb 2010 02:17:59 -0300, hzhuo1(a)gmail.com <hzhuo1(a)gmail.com> escribi�: >> Please check out this example on the pyparsing wiki, >> invRegex.py:http://pyparsing.wikispaces.com/file/view/invRegex.py. >> This code >> implements a generator that returns successive matching strings for >> the given regex. [...] >> Of course, as other posters have pointed out, this inverter does not >> accept regexen with unbounded multiple characters '+' or '*', but '?' >> and "{min,max}" notation will work. Even '.' is supported, although >> this can generate a large number of return values. > > Thanks very much. This is exactly what I need now. I will check this > function. > Here you have another approach based on [1]. This is a generator-based approach, yielding all strings in increasing length order. In principle it can handle unbounded repetitions, except as written the maximum recursion limit is shortly reached (the original code is in Haskell, I almost blindly translated it into Python; certainly it can be rewritten more efficiently) You have to parse the R.E. and generate the corresponding function calls to the merge/prod/closure functions -- pyparsing certainly can help with that. "ab" becomes prod(a,b), "a|b" becomes merge(a,b), and "a*" becomes closure(a) By example, to find the language defined by this expression "(a|bc)*d", one has to evaluate: prod( closure( merge(['a'], prod(['b'],['c']))), ['d'] ) wich yields these strings: d ad aad bcd aaad abcd bcad .... bcbcbcbcaad bcbcbcbcbcd aaaaaaaaaaad and after 234 results aborts with a recursion error :( [1] http://www.cs.utexas.edu/users/misra/Notes.dir/RegExp.pdf -- Gabriel Genellina |