From: Nathan Harmston on 15 Mar 2010 08:21

Hi,

So I'm trying to use a very large regular expression. Basically, I have a list of items I want to find in text; it's kind of a conjunction of two regular expressions and a big list... not pretty. However, every time I try to run my code I get this exception:

OverflowError: regular expression code size limit exceeded

I understand that there is a Python-imposed limit on the size of a regular expression. And although it's not nice, I have a machine with 12 GB of RAM just waiting to be used. Is there any way I can alter Python to allow big regular expressions?

Could anyone suggest other methods for this kind of string matching in Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than what's possible in Python!

Many thanks,
Nathan
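A minimal sketch of the "big list" part of the question, assuming the items are literal strings rather than regular expressions: each item is passed through `re.escape` and joined into an alternation. Splitting the list across several smaller compiled patterns is one possible way to stay under a per-pattern size limit. The function names `build_patterns` and `find_all` and the `chunk_size` default are hypothetical, chosen for illustration; the default is not a known threshold.

```python
import re

def build_patterns(items, chunk_size=1000):
    """Compile literal items into one or more alternation patterns.

    Items are escaped so they match literally. Longer items come first
    (ties broken alphabetically, for a deterministic order) so that,
    within one chunk, the longest alternative at a position wins over
    its prefixes. chunk_size is a hypothetical default, not a known limit.
    """
    ordered = sorted(set(items), key=lambda s: (-len(s), s))
    patterns = []
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        patterns.append(re.compile("|".join(map(re.escape, chunk))))
    return patterns

def find_all(patterns, text):
    """Return (start, matched_text) pairs from every pattern, sorted by position."""
    hits = []
    for pattern in patterns:
        hits.extend((m.start(), m.group()) for m in pattern.finditer(text))
    return sorted(hits)
```

For example, `find_all(build_patterns(["cat", "category", "dog"], chunk_size=2), "category dog cat")` returns `[(0, "category"), (9, "dog"), (13, "cat")]`. One caveat of chunking: matches from different chunks can overlap. With `chunk_size=1`, "cat" is also reported at position 0, inside "category".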
From: Stefan Behnel on 15 Mar 2010 08:45

Nathan Harmston, 15.03.2010 13:21:
> So I'm trying to use a very large regular expression. Basically, I have
> a list of items I want to find in text; it's kind of a conjunction of
> two regular expressions and a big list... not pretty. However,
> every time I try to run my code I get this exception:
>
> OverflowError: regular expression code size limit exceeded
>
> I understand that there is a Python-imposed limit on the size of a
> regular expression. And although it's not nice, I have a machine with
> 12 GB of RAM just waiting to be used. Is there any way I can alter
> Python to allow big regular expressions?
>
> Could anyone suggest other methods for this kind of string matching in
> Python?

If what you are trying to match is in fact a set of strings instead of a set of regular expressions, you might find this useful:

http://pypi.python.org/pypi/acora

Stefan
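When the items are fixed strings and, as an additional assumption, each item is a single whole word, a much simpler standard-library option than acora is to tokenize the text and look each token up in a Python `set`. This is a different technique from acora's, shown only as a sketch; the names `WORD` and `find_words` are hypothetical. It will not find items that contain spaces or that occur inside longer words.

```python
import re

# A "word" here is a run of word characters; adjust if items contain
# hyphens, apostrophes, or spaces.
WORD = re.compile(r"\w+")

def find_words(vocabulary, text):
    """Return (start, word) for each token of text that is in vocabulary.

    Assumes every item is a single whole word, so one set lookup per
    token replaces a giant alternation; the cost grows with the length
    of the text, not with the size of the list.
    """
    wanted = set(vocabulary)
    return [(m.start(), m.group()) for m in WORD.finditer(text)
            if m.group() in wanted]
```

For example, `find_words(["cat", "dog"], "a cat, a dog, a category")` returns `[(2, "cat"), (9, "dog")]`; "category" is not reported because only whole tokens are compared.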
From: Alain Ketterlin on 15 Mar 2010 08:50

Nathan Harmston <iwanttobeabadger(a)googlemail.com> writes:

[...]
> Could anyone suggest other methods for this kind of string matching in
> Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than
> what's possible in Python!

Since you mention using a trie, I guess it's just a big alternation of fixed strings. You may want to try the Aho-Corasick variant. It looks like there are several implementations (Google finds at least two). I would be surprised if any pure Python solution were faster than tries implemented in C.

Don't forget to tell us your findings.

--
Alain.
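To make the Aho-Corasick suggestion concrete, here is a minimal pure-Python sketch of the algorithm. It is not any of the implementations Alain mentions, and it assumes the patterns are non-empty literal strings. The automaton is a trie (the goto table) plus failure links computed breadth-first, so the text is scanned once regardless of how many patterns there are, and overlapping matches are all reported. The class name `AhoCorasick` and its methods are hypothetical.

```python
from collections import deque

class AhoCorasick:
    """Minimal Aho-Corasick automaton over non-empty literal strings."""

    def __init__(self, words):
        self.goto = [{}]   # goto[state][char] -> next state (the trie)
        self.fail = [0]    # failure link for each state; 0 is the root
        self.out = [[]]    # words recognized on reaching each state
        for word in words:
            self._add(word)
        self._build_links()

    def _add(self, word):
        # Insert word into the trie, creating states as needed.
        state = 0
        for ch in word:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(word)

    def _build_links(self):
        # Breadth-first: a state's failure link points to the state for the
        # longest proper suffix of its path that is also a path in the trie.
        queue = deque(self.goto[0].values())  # depth-1 states fail to root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Also report words that end at the fallback state.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def finditer(self, text):
        """Yield (start, word) for every occurrence, overlaps included."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for word in self.out[state]:
                yield i - len(word) + 1, word
```

With the textbook patterns, `list(AhoCorasick(["he", "she", "his", "hers"]).finditer("ushers"))` gives `[(1, "she"), (2, "he"), (2, "hers")]`. Following Alain's point, a C implementation (or a C-backed trie) would likely be faster; this version is mainly useful as a baseline for comparison.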
From: MRAB on 15 Mar 2010 11:51

Nathan Harmston wrote:
> Hi,
>
> So I'm trying to use a very large regular expression. Basically, I have
> a list of items I want to find in text; it's kind of a conjunction of
> two regular expressions and a big list... not pretty. However,
> every time I try to run my code I get this exception:
>
> OverflowError: regular expression code size limit exceeded
>
> I understand that there is a Python-imposed limit on the size of a
> regular expression. And although it's not nice, I have a machine with
> 12 GB of RAM just waiting to be used. Is there any way I can alter
> Python to allow big regular expressions?
>
> Could anyone suggest other methods for this kind of string matching in
> Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than
> what's possible in Python!
>
There's the regex module at http://pypi.python.org/pypi/regex. It'll even release the GIL while matching on strings! :-)