Help with regular expression [Perl]


From: Eric Pozharski on 2 Aug 2010 03:06

with <slrni5bs74.bff.hjp-usenet2(a)hrunkner.hjp.at> Peter J. Holzer wrote:
*SKIP*
> I agree with Eric: Write a proper grammar and use that to parse your
> expressions. If you've ever heard of BNF, using Parse::Yapp or
> Parse::RecDescent shouldn't be too hard (I prefer the former, although
> the docs assume that you are already familiar with yacc).

Passed to Ted (Zlatanov). I've learned that from him.

*CUT*

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

From: Ted Zlatanov on 2 Aug 2010 11:05

On Mon, 02 Aug 2010 10:06:12 +0300 Eric Pozharski <whynot(a)pozharski.name> wrote:

EP> with <slrni5bs74.bff.hjp-usenet2(a)hrunkner.hjp.at> Peter J. Holzer wrote:
EP> *SKIP*
>> I agree with Eric: Write a proper grammar and use that to parse your
>> expressions. If you've ever heard of BNF, using Parse::Yapp or
>> Parse::RecDescent shouldn't be too hard (I prefer the former, although
>> the docs assume that you are already familiar with yacc).

EP> Passed to Ted (Zlatanov). I've learned that from him.

I don't understand, do you mean I should answer? I haven't been
following this thread carefully.

Ted

From: Eric Pozharski on 3 Aug 2010 03:22

with <87y6cpf41o.fsf(a)lifelogs.com> Ted Zlatanov wrote:
> On Mon, 02 Aug 2010 10:06:12 +0300 Eric Pozharski <whynot(a)pozharski.name> wrote:
>
> EP> with <slrni5bs74.bff.hjp-usenet2(a)hrunkner.hjp.at> Peter J. Holzer wrote:
> EP> *SKIP*
>>> I agree with Eric: Write a proper grammar and use that to parse your
>>> expressions. If you've ever heard of BNF, using Parse::Yapp or
>>> Parse::RecDescent shouldn't be too hard (I prefer the former,
>>> although the docs assume that you are already familiar with yacc).
>
> EP> Passed to Ted (Zlatanov). I've learned that from him.
>
> I don't understand, do you mean I should answer?

Probably it's too late -- the thread is dead, RIP.

> I haven't been following this thread carefully.

Briefly. I believe everyone is granted the right to have his/her
contribution acknowledged. In this case, I can say exactly from whom
I learned the wisdom of grammars (before that thread, almost two years
ago, I believed that the only thing that wouldn't kneel before REs and
split() was HTML). I was wrong.

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

From: Ted Zlatanov on 3 Aug 2010 11:54

On Mon, 2 Aug 2010 00:10:44 +0200 "Peter J. Holzer" <hjp-usenet2(a)hjp.at> wrote:

PJH> Then the problem cannot be solved with a real regular expression.
....
PJH> I agree with Eric: Write a proper grammar and use that to parse your
PJH> expressions. If you've ever heard of BNF, using Parse::Yapp or
PJH> Parse::RecDescent shouldn't be too hard (I prefer the former, although
PJH> the docs assume that you are already familiar with yacc).

I don't think even a grammar will help. The requirements are
fundamentally broken because there's more than one way to interpret
nested parens. The OP should explain what he's trying to do and give
real-world examples he needs parsed.

Also, a grammar is pretty slow compared to regular expressions. So I
always hesitate before recommending it for anything except
low-throughput situations, e.g. input submitted by a user or small
files.

Ted

From: Helmut Richter on 3 Aug 2010 12:40

On Tue, 3 Aug 2010, Ted Zlatanov wrote:

> I don't think even a grammar will help. The requirements are
> fundamentally broken because there's more than one way to interpret
> nested parens.

I do not think so:

Let X be the regular language of nonempty words not containing any
parentheses. Then the language L of words that are double-parenthesis
enclosed is:

    L            -> (( inside ))
    inside       -> inside1 | inside2
    inside1      -> X | inside1 single-paren | inside1 X
    inside2      -> single-paren X | single-paren single-paren | inside2 X
                    | inside2 single-paren
    single-paren -> ( inside ) | ( )

"inside1" should be the language of all properly nested strings that do not
begin with "(", and "inside2" the language of all properly nested strings that
begin with "(" except when the last token is the matching ")".

Not that I find that grammar pretty or easy to parse -- but at least it is not
ambiguous.
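
For anyone who wants to experiment with it, here is a rough sketch of how
the grammar above might be transcribed into a Perl 5.10+ recursive pattern.
This is not a "real" regular expression in Peter's sense, and it is only one
reading of the grammar: it assumes X is a maximal run of non-parenthesis
characters (whitespace included) and that "( )" stands for "()". The names
$DOUBLE_PAREN, is_double_paren_enclosed and the helper rule "item" are made
up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # named groups and recursive patterns need Perl 5.10+

# The grammar transcribed as a recursive pattern.
# Assumption: X is a maximal run of non-parenthesis characters,
# and "( )" in the grammar means an empty pair "()".
my $DOUBLE_PAREN = qr{
    \A \( \( (?&inside) \) \) \z
    (?(DEFINE)
        (?<X>      [^()]++ )               # nonempty, no parentheses
        (?<single> \( (?&inside)? \) )     # ( inside ) | ( )
        (?<item>   (?&X) | (?&single) )    # hypothetical helper rule
        (?<inside> (?&X) (?&item)*         # inside1: starts with X
                 | (?&single) (?&item)+ )  # inside2: group plus more
    )
}x;

# Returns 1 if the whole string is derivable from L, else 0.
sub is_double_paren_enclosed {
    my ($text) = @_;
    return $text =~ $DOUBLE_PAREN ? 1 : 0;
}

for my $s ('((a))', '((a(b)c))', '((a)(b))', '(((a)))') {
    printf "%-12s %s\n", $s,
        is_double_paren_enclosed($s) ? 'match' : 'no match';
}
```

Under these assumptions the first two strings match and the last two do not:
in "((a)(b))" the inner opening parenthesis is closed before the end, and
"(((a)))" is rejected because inside2 requires something after the leading
parenthesized group. Since single-paren reuses "inside", the same exclusion
applies at every nesting level, so a string such as "((a((b))))" is rejected
as well.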

> The OP should explain what he's trying to do and give
> real-world examples he needs parsed.

That should be a requirement for such weird questions.

--
Helmut Richter
