From: Chris McDonald on 21 Jan 2010 12:02

Hello All,

[please excuse the Subject: line, as I'm unsure of the best description]

I'm seeking pointers to a C library that provides basic-regular-expression (BRE) pattern matching *and* permits me to define the equality of atoms.

C's standard qsort() function is able to sort vectors of objects by calling back to the user to ask about the relative order of two objects. To permit sorting of arbitrary objects, the caller passes to qsort() the size of each object, and two pointers are passed back to the user-provided comparison function.

I'm seeking something similar for regular expressions, appreciating that some features (such as back-references) may become impossible.

At the heart of RE implementations are hundreds of inline comparisons to check if char1 == char2. However, I would like:

- char1 and char2 to be my objects, not characters, and
- for my comparison function to be called each time == is required.

Because we're no longer comparing characters, we can't simply provide them in the RE to be matched. Thus I'm imagining a mechanism where identifiers in the pattern represent members of the input alphabet.

For example, we have an alphabet vector of alphabet[] = {obj1, obj2, obj3}, and we're seeking the regular expression "1.*[23]" where the 1, 2, and 3 represent the 1st, 2nd, and 3rd objects from the alphabet. A call to

    int match(  const char *pattern,
                size_t objectSize,
                void *inputAlphabet, size_t nAlphabet,
                void *inputVector, size_t lenInput,
                int (*compareObjects)(const void *, const void *));

will call my compareObjects() function many times, and it will return 0/1.

Even better (for my application) would be if the user-code retained its own alphabet, reducing this to:

    int match(  const char *pattern,
                void *inputVector, size_t lenInput, size_t objectSize,
                int (*compareObjects)(int alphabetIndex, const void *element));

or even, best of all:

    int match(  const char *pattern,
                size_t lenInput,
                int (*compareObjects)(int alphabetIndex, int inputIndex));

Using this last approach, the value of alphabetIndex could represent a private function/predicate requiring evaluation (which match() certainly doesn't care about), e.g. (ignoring errors):

    int compareObjects(int alphabetIndex, int inputIndex)
    {
        return (predicates[alphabetIndex])(inputVector[inputIndex]);
    }

Googling has uncovered

- the TRE library (http://laurikari.net/tre/documentation/reguexec/)
- and Ragel (http://www.complang.org/ragel/),

which are promising, but neither quite, or easily, meets my requirements. I'm not expecting the perfect library, and am quite willing to investigate and modify.

Does anyone know of any suitable/similar library?

Thanks in advance,

______________________________________________________________________________
Dr Chris McDonald                          E: chris(a)csse.uwa.edu.au
Computer Science & Software Engineering    W: http://www.csse.uwa.edu.au/~chris
The University of Western Australia, M002  T: +618 6488 2533
Crawley, Western Australia, 6009           F: +618 6488 1089
--
comp.lang.c.moderated - moderation address: clcm(a)plethora.net -- you must have an appropriate newsgroups line in your header for your mail to be seen, or the newsgroup name in square brackets in the subject line. Sorry.
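As a rough illustration of the last interface proposed above, here is a minimal backtracking sketch in C. It is not taken from TRE, Ragel, or any existing library. It assumes a tiny pattern subset (digits 1 to 9 as alphabet indices, ".", "[...]" sets of digits, and a postfix "*") and, unlike regexec(), it requires the pattern to match the whole input. The names compare_fn, demo_input, predicates, and demo_compare are hypothetical and exist only for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*compare_fn)(int alphabetIndex, int inputIndex);

/* Length, in pattern characters, of the atom at p: "[...]" or one char. */
static size_t atom_len(const char *p)
{
    size_t n = 1;
    if (*p != '[')
        return 1;
    while (p[n] != '\0' && p[n] != ']')
        n++;
    return p[n] == ']' ? n + 1 : n;
}

/* Does the atom at p accept input element i?  Every real comparison
   is delegated to the user's callback. */
static int atom_accepts(const char *p, size_t i, compare_fn cmp)
{
    if (*p == '.')
        return 1;
    if (*p == '[') {
        for (p++; *p != '\0' && *p != ']'; p++)
            if (*p >= '1' && *p <= '9' && cmp(*p - '0', (int)i))
                return 1;
        return 0;
    }
    return *p >= '1' && *p <= '9' && cmp(*p - '0', (int)i);
}

/* Match pattern p against input elements i..n-1 (anchored at both ends). */
static int match_here(const char *p, size_t i, size_t n, compare_fn cmp)
{
    size_t len;
    if (*p == '\0')
        return i == n;
    len = atom_len(p);
    if (p[len] == '*') {
        /* Try the rest of the pattern after 0, 1, 2, ... repetitions. */
        for (;;) {
            if (match_here(p + len + 1, i, n, cmp))
                return 1;
            if (i < n && atom_accepts(p, i, cmp))
                i++;
            else
                return 0;
        }
    }
    return i < n && atom_accepts(p, i, cmp)
        && match_here(p + len, i + 1, n, cmp);
}

int match(const char *pattern, size_t lenInput, compare_fn compareObjects)
{
    assert(pattern != NULL && compareObjects != NULL);
    return match_here(pattern, 0, lenInput, compareObjects);
}

/* Hypothetical user side: alphabet index k selects a private predicate. */
static const int demo_input[] = { -4, 7, 0, 9 };
static int is_negative(int v) { return v < 0; }
static int is_zero(int v)     { return v == 0; }
static int is_positive(int v) { return v > 0; }
static int (*const predicates[])(int) = { is_negative, is_zero, is_positive };

int demo_compare(int alphabetIndex, int inputIndex)
{
    /* Alphabet indices in the pattern are 1-based. */
    return predicates[alphabetIndex - 1](demo_input[inputIndex]);
}
```

With this demo data, match("1.*[23]", 4, demo_compare) asks whether the input starts with a negative value and ends with a zero or positive one. Backtracking keeps the sketch short but can be slow on patterns with many stars; an automaton-based matcher would avoid that at the cost of more code.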
From: Jasen Betts on 24 Jan 2010 13:27

On 2010-01-21, Chris McDonald <chris(a)csse.uwa.edu.au> wrote:
> Hello All,
>
> [please excuse the Subject: line, as I'm unsure of the best description]
>
> I'm seeking pointers to a C library that provides basic-regular-expression (BRE)
> pattern matching *and* permits me to define the equality of atoms.

> Dr Chris McDonald E: chris(a)csse.uwa.edu.au
> Computer Science & Software Engineering W: http://www.csse.uwa.edu.au/~chris

maybe get a postgrad student to write you one :)
From: Pascal J. Bourguignon on 24 Jan 2010 13:50

Chris McDonald <chris(a)csse.uwa.edu.au> writes:
> I'm seeking pointers to a C library that provides basic-regular-expression (BRE)
> pattern matching *and* permits me to define the equality of atoms.

By the way, using regex(3), you can easily define the "equality" of your atoms and match regular expressions for any kind of objects, as long as you have less than 256 classes of objects.

--
__Pascal Bourguignon__ http://www.informatimago.com/
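One way to read the regex(3) suggestion above is to map every object to a one-byte class code, build an ordinary string from those codes, and hand that string to the POSIX regcomp()/regexec() functions. The sketch below assumes a POSIX system. The names classify_fn, match_classes, classify_int, and demo_ints are hypothetical, and the class codes are plain letters chosen so that they cannot collide with the NUL terminator or with regex metacharacters.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user callback: returns the class code of one object.
   Codes must be non-zero and should avoid regex metacharacters. */
typedef unsigned char (*classify_fn)(const void *object);

/* Returns 1 on match, 0 on no match, -1 on allocation or pattern error. */
int match_classes(const char *pattern, const void *input, size_t n,
                  size_t objectSize, classify_fn classify)
{
    regex_t re;
    char *encoded;
    size_t i;
    int rc;

    assert(pattern != NULL && classify != NULL);

    /* Encode the object vector as a NUL-terminated string of class codes. */
    encoded = malloc(n + 1);
    if (encoded == NULL)
        return -1;
    for (i = 0; i < n; i++)
        encoded[i] = (char)classify((const char *)input + i * objectSize);
    encoded[n] = '\0';

    /* cflags without REG_EXTENDED selects basic regular expressions. */
    rc = regcomp(&re, pattern, REG_NOSUB);
    if (rc != 0) {
        free(encoded);
        return -1;
    }
    rc = regexec(&re, encoded, 0, NULL, 0);
    regfree(&re);
    free(encoded);
    return rc == 0;
}

/* Hypothetical demo: three classes of int, coded 'n', 'z' and 'p'. */
const int demo_ints[] = { -4, 7, 0, 9 };

unsigned char classify_int(const void *object)
{
    int v = *(const int *)object;
    return v < 0 ? 'n' : (v == 0 ? 'z' : 'p');
}
```

Here the equality test runs once per object, when it is classified, instead of once per comparison inside the matcher. That only works when the classes can be decided without looking at the pattern, which is weaker than the per-comparison callback the original post asks for.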
From: Victor Porton on 24 Jan 2010 15:20

On Jan 24, 8:50 pm, p...(a)informatimago.com (Pascal J. Bourguignon) wrote:
> Chris McDonald <ch...(a)csse.uwa.edu.au> writes:
> > I'm seeking pointers to a C library that provides basic-regular-expression (BRE)
> > pattern matching *and* permits me to define the equality of atoms.
>
> By the way, using regex(3), you can easily define the "equality" of
> your atoms and match regular expressions for any kind of objects, as
> long as you have less than 256 classes of objects.

One more (maybe stupid) idea: Use UTF-8 to encode more than 256 objects.
From: Pascal J. Bourguignon on 25 Jan 2010 13:49

Victor Porton <porton.victor(a)gmail.com> writes:
> On Jan 24, 8:50 pm, p...(a)informatimago.com (Pascal J. Bourguignon)
> wrote:
>> Chris McDonald <ch...(a)csse.uwa.edu.au> writes:
>> > I'm seeking pointers to a C library that provides basic-regular-expression (BRE)
>> > pattern matching *and* permits me to define the equality of atoms.
>>
>> By the way, using regex(3), you can easily define the "equality" of
>> your atoms and match regular expressions for any kind of objects, as
>> long as you have less than 256 classes of objects.
>
> One more (maybe stupid) idea: Use UTF-8 to encode more than 256
> objects.

That could work, but you have to be extra careful when composing the regular expression. This is feasible, since you would need an API to generate the regular expression from arbitrary objects in any case.

Specifically, if you want to match "é*" you actually get the UTF-8 string:

    char regexp[]={195,169,42,0};

which doesn't mean the same thing: the * applies only to the last byte.

Unfortunately, it's not a simple matter of using groups:

    "\\(é\\)*" {92, 40, 195, 169, 92, 41, 42, 0},

since adding a group shifts the numbers of all the following groups, so you have to compensate.

You also have similar problems in brackets: "[eé]" doesn't mean what you want, so you have to convert it to an alternative: "\\(e\\|é\\)".

--
__Pascal Bourguignon__ http://www.informatimago.com/
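The composition problem described above can be handled by a small pattern-building helper, sketched here in C. The function append_atom and its return convention are hypothetical, not part of regex(3): it wraps any atom longer than one byte in a BRE group so that a following * applies to the whole byte sequence, and it reports how many groups it introduced so the caller can renumber later back-references.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends one atom (the UTF-8 encoding of one object code) to the
   NUL-terminated BRE held in out, whose total capacity is cap bytes.
   Multi-byte atoms are wrapped in \( \) so that a postfix operator
   added afterwards binds to the whole atom.
   Returns the number of groups added (0 or 1), or -1 if out is full. */
int append_atom(char *out, size_t cap, const char *atom)
{
    size_t used, alen, need;
    int multibyte;

    assert(out != NULL && atom != NULL);
    used = strlen(out);
    alen = strlen(atom);
    multibyte = alen > 1;
    need = alen + (multibyte ? 4 : 0);   /* 4 bytes for "\(" and "\)" */

    if (used + need + 1 > cap)
        return -1;

    if (multibyte) {
        memcpy(out + used, "\\(", 2);
        used += 2;
    }
    memcpy(out + used, atom, alen);
    used += alen;
    if (multibyte) {
        memcpy(out + used, "\\)", 2);
        used += 2;
    }
    out[used] = '\0';
    return multibyte;
}
```

A caller would sum the return values seen so far and add that total to any back-reference number it emits later. Bracket expressions would need a separate helper that expands to an alternation, as in the post above, and that case is not covered here.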