Prev: FAQ 4.1 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
Next: FAQ 3.24 Why don't Perl one-liners work on my DOS/Mac/VMS system?
From: sln on 3 Aug 2010 14:20 On Sun, 1 Aug 2010 19:47:18 +0000 (UTC), Mark Hobley <markhobley(a)yahoo.donottypethisbit.co> wrote: >On Sun, 25 Jul 2010 22:04:25 +0200, Peter J. Holzer wrote: > >> Is this a match? >> >> (((1 + 2) * (3 +4))) > >Yes. That is a match. Then ((3 * bar) + ((foo))) - This is a match ((3 * bar) + ((foo))bar) - This is a match. is matched by ((foo)) only ... It can't be both ways. You have 2 requirements, balanced double parenths that does not include balanced single parenths. Pretty easy actually. Do you know regular expressions? -sln -------------------- use strict; use warnings; # Require (()) my $x = quotemeta '(('; my $y = quotemeta '))'; # Require not () my $m = quotemeta '('; my $n = quotemeta ')'; my $regex = qr/ ( $x (?: ( $m (?: (?>(?:(?!$x|$y|$m|$n).)+) | (?2) )* $n ) | (?>(?:(?!$x|$y|$m|$n).)+) | (?1) )* $y ) /xs; while (<DATA>) { if (/$regex/) { print "$1\n"; } } __DATA__ I need a regular expression with the following properties. I need to match text (typically, though not necessarily expressions) enclosed within double parentheses. However, I do not want to match nested single parentheses enclosed text. So ((*)) is a match, but ((*)*(*)) is not a match. Here are some examples to illustrate this. ((FOO)) - This is a match (()) - This is a match ((3 + 2)) - This is a match ((3 + 2) + (2 * foo)) - This is not a match ((3 * bar) + ((foo))) - This is a match ((3 * bar) + ((foo))bar) - This is a match. I hope that lot makes sense. Thanks in advance to anyone who can help. -- Mark Hobley Linux User: #370818 http://markhobley.yi.org/ --- news://freenews.netfront.net/ - complaints: news(a)netfront.net --- On Sun, 25 Jul 2010 22:04:25 +0200, Peter J. Holzer wrote: > Is this a match? > > (((1 + 2) * (3 +4))) Yes. That is a match. -- Mark Hobley Linux User: #370818 http://markhobley.yi.org/ --- news://freenews.netfront.net/ - complaints: news(a)netfront.net ---
From: Ted Zlatanov on 3 Aug 2010 14:42 On Tue, 3 Aug 2010 18:40:09 +0200 Helmut Richter <hhr-m(a)web.de> wrote: HR> On Tue, 3 Aug 2010, Ted Zlatanov wrote: >> I don't think even a grammar will help. The requirements are >> fundamentally broken because there's more than one way to interpret >> nested parens. HR> I do not think so: HR> Let X be the regular language of nonempty words not containing any HR> parentheses. Then the language L of words that are double-parenthesis HR> enclosed is: HR> L -> (( inside )) HR> inside -> inside1 | inside2 HR> inside1 -> X | inside1 single-paren | inside1 X HR> inside2 -> single-paren X | single-paren single-paren | inside2 X HR> | inside2 single-paren HR> single-paren -> ( inside ) | ( ) HR> "inside1" should be the language of all properly nested strings that do not HR> begin with "(", and "inside2" the language of all properly nested strings that HR> begin with "(" except when the last token is the matching ")". HR> Not that I find that grammar pretty or easy to parse -- but at least it is not HR> ambiguous. I didn't parse the requirements that way, but that's probably my error. Thanks for explaining. Ted
From: sln on 3 Aug 2010 14:57 On Tue, 03 Aug 2010 10:54:26 -0500, Ted Zlatanov <tzz(a)lifelogs.com> wrote: >On Mon, 2 Aug 2010 00:10:44 +0200 "Peter J. Holzer" <hjp-usenet2(a)hjp.at> wrote: > >PJH> Then the problem cannot be solved with a real regular expression. >... >PJH> I agree with Eric: Write a proper grammar and use that to parse your >PJH> expressions. If you've ever heard of BNF, using Parse::Yapp or >PJH> Parse::RecDescent shouldn't be too hard (I prefer the former, although >PJH> the docs assume that you are already familiar with yacc). > >I don't think even a grammar will help. The requirements are >fundamentally broken because there's more than one way to interpret >nested parens. The OP should explain what he's trying to do and give >real-world examples he needs parsed. > >Also, a grammar is pretty slow compared to regular expressions. So I >always hesitate before recommending it for anything except >low-throughput situations, e.g. input submitted by a user or small >files. > >Ted I don't think the requirements are broken at all. As soon as the OP said that neither of the double parenths can be part of an inner single parenth, it pretty much made it complete. Trivial or not I believe its a complete req, that can be done with a simple regular expression. The match will satisfy the requirements, however, outlier parenths may not be balanced relative to the match. Though, additional expressions could be added to balance the complete text. -sln
From: Helmut Richter on 3 Aug 2010 16:11 On Tue, 3 Aug 2010, Ted Zlatanov wrote: > On Tue, 3 Aug 2010 18:40:09 +0200 Helmut Richter <hhr-m(a)web.de> wrote: > > HR> On Tue, 3 Aug 2010, Ted Zlatanov wrote: > >> I don't think even a grammar will help. The requirements are > >> fundamentally broken because there's more than one way to interpret > >> nested parens. > > HR> I do not think so: > > HR> Let X be the regular language of nonempty words not containing any > HR> parentheses. No, this is an error, albeit easy to fix. X should be the language of one-token words where the token is not a parenthesis. Otherwise concatenating the X would introduce an ambiguity. > HR> Then the language L of words that are double-parenthesis > HR> enclosed is: > > HR> L -> (( inside )) > HR> inside -> inside1 | inside2 > HR> inside1 -> X | inside1 single-paren | inside1 X > HR> inside2 -> single-paren X | single-paren single-paren | inside2 X > HR> | inside2 single-paren > HR> single-paren -> ( inside ) | ( ) > > HR> "inside1" should be the language of all properly nested strings that do not > HR> begin with "(", and "inside2" the language of all properly nested strings that > HR> begin with "(" except when the last token is the matching ")". > > HR> Not that I find that grammar pretty or easy to parse -- but at least it is not > HR> ambiguous. Well, it *is* easy to parse: nearly LR(0) with the only exception that there is a minor shift-reduce conflict when ")" is encountered. So it is certainly LR(1). Writing the L rule as two rules makes it even a precedence grammar for several notions of precedence. This allows writing a parser by hand, whereas LR parsers should better be generated. > I didn't parse the requirements that way, but that's probably my error. Well, when I set up the grammar, there were ambiguities of interpretation of the requirements. This is just my interpretation. But at least I chose it because I found it to be the most plausible, and not in order that parsing be possible. I am not sure the extended notion of regexp in perl, which goes beyond regular languages, cannot be used to parse such a thing. After all, regexp handling involves backtracking, which is not normally considered a good technique in context-free parsing. -- Helmut Richter
From: sln on 3 Aug 2010 17:03
On Tue, 3 Aug 2010 22:11:16 +0200, Helmut Richter <hhr-m(a)web.de> wrote: >I am not sure the extended notion of regexp in perl, which goes beyond >regular languages, cannot be used to parse such a thing. After all, regexp >handling involves backtracking, which is not normally considered a good >technique in context-free parsing. But mixing expressions without backtracking with expressions with back tracking is a feature of extended notation. -sln |