From: Nathan Harmston on 31 Mar 2010 06:49 Hi everyone, I have a slightly complicated/medium sized regular expression and I want to generate all possible words that it can match (to compare performance of regex against an acora based matcher). Using the regular expression as a grammar to generate all words in its language. I was wondering if this possible in Python or possible using anything. Google doesnt seem to give any obvious answers. Many thanks in advance, Nathan
From: Gabriel Genellina on 31 Mar 2010 07:58 En Wed, 31 Mar 2010 07:49:14 -0300, Nathan Harmston <iwanttobeabadger(a)googlemail.com> escribi�: > I have a slightly complicated/medium sized regular expression and I > want to generate all possible words that it can match (to compare > performance of regex against an acora based matcher). Using the > regular expression as a grammar to generate all words in its language. > I was wondering if this possible in Python or possible using anything. > Google doesnt seem to give any obvious answers. I've done this some time ago. This recipe http://code.activestate.com/recipes/577041-merge-multiple-potentially-infinite-sorted-inputs-/ provides an infinite merge operation, required for effectively enumerating a regular language (with shorter strings first, else 'a*b' would get stuck repeating "a"s and never yielding any "b"). The example at the end shows the basics of how to use it (by hand) in a very simple case. To enumerate all strings matching '(a|bc)*' one should invoke closure(product(['a'],['b','c'])), which gives: '', 'a', 'aa', 'bc', 'aaa', 'abc', 'bca', 'aaaa', 'aabc', 'abca', 'bcaa', 'bcbc', 'aaaaa', 'aaabc', 'aabca', 'abcaa', 'abcbc', 'bcaaa', 'bcabc', 'bcbca', .... Converting the former expression into the later function calls requires a parser (not shown -- and I won't be able to find it until Friday) -- Gabriel Genellina
From: Tim Chase on 31 Mar 2010 08:01 Nathan Harmston wrote: > I have a slightly complicated/medium sized regular expression > and I want to generate all possible words that it can match > (to compare performance of regex against an acora based > matcher). Using the regular expression as a grammar to > generate all words in its language. I was wondering if this > possible in Python or possible using anything. Google doesnt > seem to give any obvious answers. Unless you limit your regexp to bounded regular expressions (you don't use the "*", "+" and unbounded-top-end "{...}" tokens), it has an infinite number of possible words that can match. As a simple example, the regexp "a*" matches "" (the empty string), "a", "aa", "aaa", "aaaa"...and so on up to the limit of a string length. There was a discussion on this a while back: http://mail.python.org/pipermail/python-list/2010-February/1235120.html so you might chat with the person who started that thread. You basically have to write a regexp parser that generates code to generate the expressions, and even simple expressions such as "[a-z]{1,30}" can create an astronomical number of possible inputs. -tkc
From: Grant Edwards on 31 Mar 2010 10:13 On 2010-03-31, Nathan Harmston <iwanttobeabadger(a)googlemail.com> wrote: > I have a slightly complicated/medium sized regular expression and I > want to generate all possible words that it can match > > I was wondering if this possible in Python or possible using > anything. Google doesnt seem to give any obvious answers. We did this one a couple weeks ago. It's not possible in the general case (there are an infinite number of matching words for many/most regular expressions). -- Grant Edwards grant.b.edwards Yow! ... the MYSTERIANS are at in here with my CORDUROY gmail.com SOAP DISH!!
From: Paul McGuire on 31 Mar 2010 11:23 On Mar 31, 5:49 am, Nathan Harmston <iwanttobeabad...(a)googlemail.com> wrote: > Hi everyone, > > I have a slightly complicated/medium sized regular expression and I > want to generate all possible words that it can match (to compare > performance of regex against an acora based matcher). The pyparsing wiki Examples page includes this regex inverter: http://pyparsing.wikispaces.com/file/view/invRegex.py From the module header: # Supports: # - {n} and {m,n} repetition, but not unbounded + or * repetition # - ? optional elements # - [] character ranges # - () grouping # - | alternation -- Paul
|
Next
|
Last
Pages: 1 2 Prev: No editbox () functionality in python-dialog module? Next: Video is very funny..hhhhhhhhhhhh |