Parser generator (+ Lexer) that actually works? [TCL]

Prev: BLT
Next: how can i bind page up/down to a large option menu

From: tom.rmadilo on 14 Jun 2010 12:30

On Jun 14, 12:20 am, JHJL <j...(a)hippospace.com> wrote:
> On Jun 13, 5:58 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > On Jun 13, 5:37 am, JHJL <j...(a)hippospace.com> wrote:

> Many thanks Tom for the above kick in the right direction and to
> Andreas, for what is an incredible body of work, greatly appreciated.
>
> Now that I can generate an AST from the example I just have to work out
> how to cook it :)

Note that I was unable to get the "oo" output to work. One of the
required packages was/is missing. Maybe the work hasn't been done yet.

However, the "snit" output works. There are comments in the generated
"oo" parser indicating something needs to be inherited from snit or
written.

I'll post a test script which helped me print out information on the
result of parsing a text (and how to do the parse once you have the
snit script).

I'd like to put out one example for snit that uses something other
than the calculator example.

If anyone has a working example of this code which they can release,
please do so. I think a reasonable goal would be to provide an example
of parsing and transforming JSON, since JSON is actually used in the pt
package, but the parser there is hand-written.

From: tom.rmadilo on 14 Jun 2010 13:00

On Jun 14, 9:30 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> I'll post a test script which helped me print out information on the
> result of parsing a text (and how to do the parse once you have the
> snit script).

Okay, here is the example script I used (grammar.peg is the calculator
example):

# calculator-snit.tcl
package require pt::pgen
package require fileutil
# Read the PEG and generate the source of a snit class named "calc".
set script [pt::pgen peg [fileutil::cat grammar.peg] snit -class calc]

puts "$script"

tom@boron:~/activetcl/test$ ../bin/tclsh calculator-snit.tcl > calc.snit.tcl
tom@boron:~/activetcl/test$ ../bin/tclsh
% source calc.snit.tcl
% calc constructor
::constructor
% ::constructor parset "120+5"
Expression 0 4 {Factor 0 4 {Term 0 2 {Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}}} {AddOp 3 3} {Term 4 4 {Number 4 4 {Digit 4 4}}}}
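The result of parset is a nested Tcl list in which every node reads {symbol start end child ...}, with start and end as inclusive character offsets into the input ({Digit 0 0} covers the single character "1"). A minimal sketch of a printer for that layout, assuming every node has this shape; dumpAst is a made-up helper name, not part of tcllib, and the AST below is the one shown above, pasted in as a literal:

```tcl
# Print an AST from a pt-generated parser, one node per line, indented by
# depth, together with the slice of the input text the node matched.
# Assumes nodes of the form {symbol start end child ...} (inclusive offsets).
# dumpAst is a hypothetical helper, not a tcllib command.
proc dumpAst {ast text {depth 0}} {
    set children [lassign $ast symbol start end]
    set out "[string repeat {  } $depth]$symbol $start-$end \"[string range $text $start $end]\"\n"
    foreach child $children {
        append out [dumpAst $child $text [expr {$depth + 1}]]
    }
    return $out
}

set input "120+5"
set ast {Expression 0 4 {Factor 0 4 {Term 0 2 {Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}}} {AddOp 3 3} {Term 4 4 {Number 4 4 {Digit 4 4}}}}}
puts -nonewline [dumpAst $ast $input]
```

In an interactive session the literal would simply be replaced by the value returned from `::constructor parset $input`.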

To get information about the AST, look at the object methods of pt::rde:

http://docs.activestate.com/activetcl/8.5/tcllib/pt/pt_rdengine.html

I edited the ::calc constructor method:

constructor {} {
    # Create the runtime supporting the parsing process.
    set myparser [pt::rde ${selfns}::ENGINE]
    puts stderr "myparser = '$myparser'"
    return
}

which gave me the parser object name (::calc::Snit_inst1::ENGINE)

Then you can do:

% ::calc::Snit_inst1::ENGINE ast

or any other pt::rde method in place of "ast".

From: tom.rmadilo on 14 Jun 2010 21:42

On Jun 14, 9:30 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:

> I'd like to put out one example for snit that uses something other
> than the calculator example.
>
> If anyone has a working example of this code which they can release,
> please do so. I think a reasonable goal would be to provide an example
> of parsing and transforming JSON since it is actually used in the pt
> package, but the parser is hand written.

Here is an additional grammar for the simple data language rmadilo.
The grammar doesn't capture all the edge cases, which really only
amount to extra whitespace and restricted byte sequences:

# rmadilo2.peg

PEG rmadilo (Document)
Document <- ( ATTRIBUTE / ELEMENT )+ ;
void: Apostroph <- "'" ;

Char <- [-+0-9a-zA-Z:()_.\]\[ /$%&*!@#];
SpecialChars <- "\\" [nrt'\\];
leaf: CHARS <- (Char / SpecialChars)+;
leaf: NAME <- (Char / SpecialChars)+;
leaf: VALUE <- (Char / SpecialChars)+;
leaf: ELNAME <- (Char / SpecialChars)+;

ATTRIBUTE <- WHITESPACE Apostroph NAME Apostroph VALUE Apostroph EOL;
ELEMENTBEGIN <- WHITESPACE Apostroph ELNAME Apostroph EOL;
ELEMENTEND <- WHITESPACE Apostroph Apostroph EOL;
ELEMENT <- ELEMENTBEGIN ( ATTRIBUTE / ELEMENT)* ELEMENTEND;

void: WHITESPACE <- (" " / "\t" / EOL)*;
void: EOL <- "\n\r" / "\n" / "\r";
void: EOF <- !. ;
END;

# rmadilo2-snit.tcl

package require pt::pgen
package require fileutil
# Read the PEG and generate the source of a snit class named "rmadilo".
set script [pt::pgen peg [fileutil::cat rmadilo2.peg] snit -class rmadilo]

puts "$script"

Then do:

tom@boron:~$ ../../bin/tclsh rmadilo2-snit.tcl > rmadilo.snit.tcl

And then parse your text like so:
# rmadilo.tcl
source rmadilo.snit.tcl

# Create a parser instance named "constructor" (its command is ::constructor).
rmadilo constructor

set data {'top'
 'abcd'
  'a(g\\\')'b'
  'e'f:t'
  'time'10:22:13.25'
  'code'[set x y]'
  'url'http://www.google.com/#top'
  'email'tom@google.com'
    'pattern'[a-z]'
 ''
 'weird'

 ''
''
}
set ast [::constructor parset $data ]

puts $ast

The result looks something like this:

Document 0 179 {ELEMENT 0 179 {ELEMENTBEGIN 0 5 {ELNAME 1 3}} {ELEMENT
6 162 {ELEMENTBEGIN 6 13 {ELNAME 8 11}} {ATTRIBUTE 14 28 {NAME 17 24}
{VALUE 26 26}} {ATTRIBUTE 29 38 {NAME 32 32} {VALUE 34 36}} {ATTRIBUTE
39 59 {NAME 42 45} {VALUE 47 57}} {ATTRIBUTE 60 78 {NAME 63 66} {VALUE
68 76}} {ATTRIBUTE 79 113 {NAME 82 84} {VALUE 86 111}} {ATTRIBUTE 114
138 {NAME 117 121} {VALUE 123 136}} {ATTRIBUTE 139 158 {NAME 144 150}
{VALUE 152 156}} {ELEMENTEND 159 162}} {ELEMENT 163 176 {ELEMENTBEGIN
163 171 {ELNAME 165 169}} {ELEMENTEND 172 176}} {ELEMENTEND 177 179}}
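To "cook" such an AST, the node names from the grammar can drive a small recursive walk that pulls the matched text out with string range. A minimal sketch, assuming the node layout shown above (ELEMENT nodes whose first child is ELEMENTBEGIN holding ELNAME, ATTRIBUTE nodes holding NAME and VALUE) and assuming the backslash sequences accepted by SpecialChars mean what they mean in C; the procs rmUnescape, rmText and rmCook and the {element ...}/{attr ...} output shape are hypothetical, and the small AST at the end was written by hand rather than captured from a parser run:

```tcl
# Undo the escapes that SpecialChars accepts: \n \r \t \' and \\.
# (Assumed meaning; the grammar itself only says which sequences are legal.)
proc rmUnescape {s} {
    return [string map [list {\n} \n {\r} \r {\t} \t {\'} ' {\\} \\] $s]
}

# Matched text of a leaf node {symbol start end}, unescaped.
proc rmText {node text} {
    lassign $node symbol start end
    return [rmUnescape [string range $text $start $end]]
}

# Turn a node into plain Tcl data:
#   ATTRIBUTE -> {attr name value}
#   ELEMENT   -> {element name {child ...}}
#   Document  -> {item ...}
proc rmCook {node text} {
    set children [lassign $node symbol start end]
    switch -- $symbol {
        Document {
            set items {}
            foreach child $children {
                lappend items [rmCook $child $text]
            }
            return $items
        }
        ELEMENT {
            # First child is ELEMENTBEGIN {ELNAME ...}, last is ELEMENTEND;
            # everything in between is nested ATTRIBUTE/ELEMENT nodes.
            set elname [lindex $children 0 3]
            set body {}
            foreach child [lrange $children 1 end-1] {
                lappend body [rmCook $child $text]
            }
            return [list element [rmText $elname $text] $body]
        }
        ATTRIBUTE {
            # Apostroph, WHITESPACE and EOL are void, so only NAME and VALUE remain.
            lassign $children name value
            return [list attr [rmText $name $text] [rmText $value $text]]
        }
        default {
            error "unexpected node: $symbol"
        }
    }
}

# Hand-written AST in the shape shown above for a tiny made-up document
# whose single value contains an escaped tab.
set text "'top'\n  'k'a\\tb'\n''\n"
set ast {Document 0 19 {ELEMENT 0 19 {ELEMENTBEGIN 0 5 {ELNAME 1 3}} {ATTRIBUTE 6 16 {NAME 9 9} {VALUE 11 14}} {ELEMENTEND 17 19}}}
puts [rmCook $ast $text]
```

Keying the walk on node names keeps it independent of the void rules: whatever the grammar marks void never shows up in the AST, so the walk only has to know about the value and leaf rules.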

From: Fredrik Karlsson on 15 Jun 2010 04:38

Hi,

I've played with it a bit now, and I've gotten a snit parser which
parses according to the grammar I gave it. I gave up on TclOO as the
parser produced seems to depend on two packages I cannot get(?).

However, the legacy language I have to parse has one part of the
syntax which I think causes left recursion.
"Module=a" is a valid expression, as is
"Module=a & Module=b..."
which should be interpreted as "(Module=a) & (Module=b)..." in the
parsing. For some reason, I get "Module=a" parsed in both examples,
but the rest of the expression in the second expression is left out
and not parsed at all.

Is there a solution, you think?

/Fredrik

On Jun 10, 2:09 am, Andreas Kupries <akupr...(a)shaw.ca> wrote:
> Fredrik Karlsson <dargo...(a)gmail.com> writes:
> > Hi,
>
> > This looks very promising indeed, but still a bit opaque (at least
> > for me). Given that I have a grammar in PEG format in the file
> > grammar.peg, how do I get a TclOO that would be a parser for that
> > grammar?
>
> > What I have is:
> > ----
> > set peg [read [open grammar.peg r] ]
> > set gram [pt::peg::import $peg]
> > ---
>
> > , but how to get the parser?
>
> And here I thought that I had an example of that in the
> docs. Apparently I confused myself with the pt_parser_api manpage,
> which has an example of the use, when you have the class.
>
> Ok.
> ==============================================
> package require pt::pgen
> package require fileutil
>
> set script [pt::pgen \
> peg [fileutil::cat grammar.peg] \
> oo \
> -file grammar.peg \
> -name name-of-grammar \
> -user $tcl_platform(user) \
> -class name-of-oo-class]
> # Options are specific to the output format.
>
> fileutil::writeFile your-tcl-file.tcl $script
> ==============================================
>
> There is also the 'pt' application, a light wrapper around pt::pgen.
>
> ==============================================
> pt generate \
> peg grammar.peg \
> oo your-tcl-file.tcl \
> -name name-of-grammar \
> -class name-of-oo-class
> ==============================================
>
> HTH.
>
> > /Fredrik
>
> > On Jun 8, 7:46 am, Andreas Kupries <akupr...(a)shaw.ca> wrote:
> >> I recently created a replacement, 'pt', also in tcllib.
>
> >> Documentation starts here
>
> >> http://docs.activestate.com/activetcl/8.5/tcllib/pt/pt_introduction.html
> >> and http://docs.activestate.com/activetcl/8.5/tcllib/pt/pt.html
>
> >> Hopefully that is better.
>
> --
> So long,
> Andreas Kupries <akupr...(a)shaw.ca>
> <http://www.purl.org/NET/akupries/>
> Developer @ <http://www.activestate.com/>
> -------------------------------------------------------------------------------

From: tom.rmadilo on 15 Jun 2010 12:40

On Jun 15, 1:38 am, Fredrik Karlsson <dargo...(a)gmail.com> wrote:
> However, the legacy language I have to parse has one part of the
> syntax which I think causes left recursion.
> "Module=a" is a valid expression, as is
> "Module=a & Module=b..."
> which should be interpreted as "(Module=a) & (Module=b)..." in the
> parsing. For some reason, I get "Module=a" parsed in both examples,
> but the rest of the expression in the second expression is left out
> and not parsed at all.
>
> Is there a solution, you think?

The hard part is always convincing the parser what is valid and what
isn't.
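A frequent cause of exactly this symptom (the first term parses, the rest is silently ignored) is that the start rule is allowed to succeed on a prefix of the input: either an ordered choice lists the shorter alternative first, so `A / A "&" B` never tries the longer branch, or nothing forces the parser to consume everything. A left-recursive rule such as `Expression <- Expression "&" Term / Term` does not work in a plain PEG parser either; the usual PEG formulation is a repetition. A sketch under these assumptions: the syntax is simplified to "Module" "=" value joined by "&", the grammar name modexpr, the rule names and the character class for values are made up for illustration, and the generated snit class is loaded with eval instead of being written to a file:

```tcl
package require pt::pgen
package require snit
package require pt::rde

# Hypothetical grammar: one or more "Module=value" terms joined by "&".
# The (...)* repetition replaces a left-recursive rule, and EOF (!.)
# should make the parse fail instead of silently stopping after the
# first term when the rest of the input does not match.
set grammar {
PEG modexpr (Expression)
    Expression  <- WS Term (WS "&" WS Term)* WS EOF ;
    Term        <- "Module" WS "=" WS Value ;
    leaf: Value <- [a-zA-Z0-9_]+ ;
    void: WS    <- (" " / "\t")* ;
    void: EOF   <- !. ;
END;
}

# Generate a snit class named modexpr and load it into this interpreter.
eval [pt::pgen peg $grammar snit -class modexpr]

modexpr create p
puts [p parset "Module=a & Module=b"]
```

With EOF in place, input such as "Module=a & junk" should make parset raise an error instead of quietly returning an AST that covers only "Module=a".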

However, it finally dawned on me that the better example of writing a
PEG might be the one used for parsing a PEG, since it shows more of the
language. The interesting thing about this type of parsing is that you
don't have a separate tokenizer/lexer. On top of that, the grammar has
components with different precedence and associativity, as the
calculator example shows with +/- and *. Another important difference
with this type of parser is that it must validate the entire document
at once. This seems to make tracking down parsing errors somewhat
difficult.

I noticed two different types of errors during development. The first:
is the PEG itself valid? I don't know exactly how to prove this. There
seem to be some APIs for creating a canonical representation of the
PEG, but they don't seem to work with my example, which correctly
parses rmadilo documents. Still, if the PEG itself has an error, at
least they can be used to fix the PEG. The second type of error is when
your parser doesn't parse what you think is a valid document.

How to test this?

What I am doing is to put the document into a variable:

% set data "....."

Then from a tclsh prompt parse the document. You get an error with a
number indicating the index of where the error occurred. Then you can
do:

% string range $data (some index prior to the error) (some index after the error)
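The same inspection can be wrapped in a small helper that also reports line and column, which helps with multi-line documents. A minimal sketch, assuming the number in the error is a 0-based character index into $data, the same way the AST offsets are; showErrorContext and the default window width of 10 characters are arbitrary, hypothetical choices:

```tcl
# Report line/column for a 0-based character offset into $data and show
# the surrounding text with a caret under the character at that offset.
# showErrorContext is a hypothetical helper name.
proc showErrorContext {data pos {width 10}} {
    set before [string range $data 0 [expr {$pos - 1}]]
    # Line is 1-based: count the newlines that precede the offset.
    set line [expr {[regexp -all {\n} $before] + 1}]
    # Column is 0-based: distance from the last newline before the offset.
    set col [expr {$pos - [string last \n $before] - 1}]
    set from [expr {max(0, $pos - $width)}]
    set to [expr {$pos + $width}]
    # Render newlines, CRs and tabs visibly so the window fits on one line.
    set map [list \n {\n} \r {\r} \t {\t}]
    set prefix [string map $map [string range $data $from [expr {$pos - 1}]]]
    set rest [string map $map [string range $data $pos $to]]
    set caret "[string repeat { } [string length $prefix]]^"
    return "line $line, column $col\n$prefix$rest\n$caret"
}

# Example: offset 12 is the newline where the second line is missing
# its closing apostrophe; this reports line 2, column 6.
set data "'top'\n  'a'b\n''\n"
puts [showErrorContext $data 12]
```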

This could help you identify the exact character which is throwing off
your parser. Stack traces don't offer me much help since they come
from an auto-generated parser, but
