From: tom.rmadilo on 14 Jun 2010 12:30

On Jun 14, 12:20 am, JHJL <j...(a)hippospace.com> wrote:
> On Jun 13, 5:58 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> > On Jun 13, 5:37 am, JHJL <j...(a)hippospace.com> wrote:
>
> Many thanks Tom for the above kick in the right direction and to
> Andreas, for what is an incredible body of work, greatly appreciated.
>
> Now that I can generate an AST from the example I just have to work out
> how to cook it :)

Note that I was unable to get the "oo" output to work. One of the required packages was/is missing. Maybe the work hasn't been done yet.

However, the "snit" output works. There are comments in the generated "oo" parser indicating something needs to be inherited from snit or written.

I'll post a test script which helped me print out information on the result of parsing a text (and how to do the parse once you have the snit script).

I'd like to put out one example for snit that uses something other than the calculator example.

If anyone has a working example of this code which they can release, please do so. I think a reasonable goal would be to provide an example of parsing and transforming JSON, since JSON is actually used in the pt package, but the parser there is hand-written.
From: tom.rmadilo on 14 Jun 2010 13:00
Okay, here is the example script I used (grammar.peg is the calculator example):

    # calculator-snit.tcl
    package require pt::pgen
    package require fileutil

    set script [pt::pgen peg [fileutil::cat grammar.peg] snit -class calc]
    puts $script

Generate the parser, then load it in an interactive tclsh ("calc constructor" creates a parser instance that happens to be named ::constructor):

    tom@boron:~/activetcl/test$ ../bin/tclsh calculator-snit.tcl > calc.snit.tcl
    tom@boron:~/activetcl/test$ ../bin/tclsh
    % source calc.snit.tcl
    % calc constructor
    ::constructor
    % ::constructor parset "120+5"
    Expression 0 4 {Factor 0 4 {Term 0 2 {Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}}} {AddOp 3 3} {Term 4 4 {Number 4 4 {Digit 4 4}}}}

To get info about the AST, look at the object commands in pt::rde:

    http://docs.activestate.com/activetcl/8.5/tcllib/pt/pt_rdengine.html

I edited the ::calc constructor method:

    constructor {} {
        # Create the runtime supporting the parsing process.
        set myparser [pt::rde ${selfns}::ENGINE]
        puts stderr "myparser = '$myparser'"
        return
    }

which gave me the parser object name (::calc::Snit_inst1::ENGINE). Then you can do:

    % ::calc::Snit_inst1::ENGINE ast

(or any other pt::rde method).
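As a starting point for "cooking" such a result, here is a minimal sketch that walks the AST printed above and shows which slice of the input each node covers. It assumes only the node shape visible in that output, {symbol start end ?child ...?}, with start and end as inclusive character offsets. The proc name showAst is made up for illustration and is not part of pt. It needs Tcl 8.5 for lassign and {*}.

```tcl
# Recursively describe each AST node together with the input text it spans.
# Assumption: every node is a list {symbol start end ?child ...?} and the
# offsets are inclusive character indices into the parsed string.
proc showAst {ast input {indent ""}} {
    # lassign returns the elements it did not assign: the child nodes.
    set children [lassign $ast symbol start end]
    set lines [list [format "%s%s \[%d..%d\] '%s'" \
        $indent $symbol $start $end [string range $input $start $end]]]
    foreach child $children {
        lappend lines {*}[showAst $child $input "$indent  "]
    }
    return $lines
}

set input "120+5"
# The AST exactly as printed by "::constructor parset" in the session above.
set ast {Expression 0 4 {Factor 0 4 {Term 0 2 {Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}}} {AddOp 3 3} {Term 4 4 {Number 4 4 {Digit 4 4}}}}}

set lines [showAst $ast $input]
puts [join $lines \n]
```

Because the walker only needs the AST list and the original string, it can be run on a saved result without loading the generated parser.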
From: tom.rmadilo on 14 Jun 2010 21:42

On Jun 14, 9:30 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> I'd like to put out one example for snit that uses something other
> than the calculator example.
>
> If anyone has a working example of this code which they can release,
> please do so. I think a reasonable goal would be to provide an example
> of parsing and transforming JSON since it is actually used in the pt
> package, but the parser is hand written.

Here is an additional grammar, for the simple data language rmadilo. The grammar doesn't capture all the edge cases, which really only amount to extra whitespace and restricted byte sequences:

    # rmadilo2.peg
    PEG rmadilo (Document)
        Document           <- ( ATTRIBUTE / ELEMENT )+ ;
        void: Apostroph    <- "'" ;
        Char               <- [-+0-9a-zA-Z:()_.\]\[ /$%&*!@#];
        SpecialChars       <- "\\" [nrt'\\];
        leaf: CHARS        <- (Char / SpecialChars)+;
        leaf: NAME         <- (Char / SpecialChars)+;
        leaf: VALUE        <- (Char / SpecialChars)+;
        leaf: ELNAME       <- (Char / SpecialChars)+;
        ATTRIBUTE          <- WHITESPACE Apostroph NAME Apostroph VALUE Apostroph EOL;
        ELEMENTBEGIN       <- WHITESPACE Apostroph ELNAME Apostroph EOL;
        ELEMENTEND         <- WHITESPACE Apostroph Apostroph EOL;
        ELEMENT            <- ELEMENTBEGIN ( ATTRIBUTE / ELEMENT)* ELEMENTEND;
        void: WHITESPACE   <- (" " / "\t" / EOL)*;
        void: EOL          <- "\n\r" / "\n" / "\r";
        void: EOF          <- !. ;
    END;

    # rmadilo2-snit.tcl
    package require pt::pgen
    package require fileutil

    set script [pt::pgen peg [fileutil::cat rmadilo2.peg] snit -class rmadilo]
    puts $script

Then do:

    tom@boron:~$ ../../bin/tclsh rmadilo2-snit.tcl > rmadilo.snit.tcl

And then parse your text like so:

    # rmadilo.tcl
    source rmadilo.snit.tcl
    rmadilo constructor

    set data {'top'
     'abcd'
      'a(g\\\')'b'
      'e'f:t'
      'time'10:22:13.25'
      'code'[set x y]'
      'url'http://www.google.com/#top'
      'email'tom@google.com'
        'pattern'[a-z]'
     ''
     'weird'
      ''
    ''
    }

    set ast [::constructor parset $data]
    puts $ast

The result looks something like this:

    Document 0 179 {ELEMENT 0 179 {ELEMENTBEGIN 0 5 {ELNAME 1 3}}
      {ELEMENT 6 162 {ELEMENTBEGIN 6 13 {ELNAME 8 11}}
        {ATTRIBUTE 14 28 {NAME 17 24} {VALUE 26 26}}
        {ATTRIBUTE 29 38 {NAME 32 32} {VALUE 34 36}}
        {ATTRIBUTE 39 59 {NAME 42 45} {VALUE 47 57}}
        {ATTRIBUTE 60 78 {NAME 63 66} {VALUE 68 76}}
        {ATTRIBUTE 79 113 {NAME 82 84} {VALUE 86 111}}
        {ATTRIBUTE 114 138 {NAME 117 121} {VALUE 123 136}}
        {ATTRIBUTE 139 158 {NAME 144 150} {VALUE 152 156}}
        {ELEMENTEND 159 162}}
      {ELEMENT 163 176 {ELEMENTBEGIN 163 171 {ELNAME 165 169}}
        {ELEMENTEND 172 176}}
      {ELEMENTEND 177 179}}
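One way to turn that result into something usable is to pull the NAME/VALUE text out of every ATTRIBUTE node. The sketch below is only an illustration: the proc name attributes is hypothetical, the AST is the one posted above, and the input is the sample document with its line breaks and indentation reconstructed so that they agree with the offsets in that AST (it also assumes the email value originally contained "@" where the archive shows "(a)").

```tcl
# Collect a flat {name value name value ...} list from all ATTRIBUTE nodes.
# Assumption: nodes look like {symbol start end ?child ...?} with inclusive
# offsets, and each ATTRIBUTE has exactly two children, NAME then VALUE.
proc attributes {ast input} {
    set children [lassign $ast symbol start end]
    if {$symbol eq "ATTRIBUTE"} {
        lassign $children nameNode valueNode
        return [list \
            [string range $input {*}[lrange $nameNode 1 2]] \
            [string range $input {*}[lrange $valueNode 1 2]]]
    }
    set pairs {}
    foreach child $children {
        lappend pairs {*}[attributes $child $input]
    }
    return $pairs
}

# Sample document; whitespace reconstructed to match the offsets below.
set data {'top'
 'abcd'
  'a(g\\\')'b'
  'e'f:t'
  'time'10:22:13.25'
  'code'[set x y]'
  'url'http://www.google.com/#top'
  'email'tom@google.com'
    'pattern'[a-z]'
 ''
 'weird'
  ''
''
}

set ast {Document 0 179 {ELEMENT 0 179 {ELEMENTBEGIN 0 5 {ELNAME 1 3}}
  {ELEMENT 6 162 {ELEMENTBEGIN 6 13 {ELNAME 8 11}}
    {ATTRIBUTE 14 28 {NAME 17 24} {VALUE 26 26}}
    {ATTRIBUTE 29 38 {NAME 32 32} {VALUE 34 36}}
    {ATTRIBUTE 39 59 {NAME 42 45} {VALUE 47 57}}
    {ATTRIBUTE 60 78 {NAME 63 66} {VALUE 68 76}}
    {ATTRIBUTE 79 113 {NAME 82 84} {VALUE 86 111}}
    {ATTRIBUTE 114 138 {NAME 117 121} {VALUE 123 136}}
    {ATTRIBUTE 139 158 {NAME 144 150} {VALUE 152 156}}
    {ELEMENTEND 159 162}}
  {ELEMENT 163 176 {ELEMENTBEGIN 163 171 {ELNAME 165 169}}
    {ELEMENTEND 172 176}}
  {ELEMENTEND 177 179}}}

set pairs [attributes $ast $data]
foreach {name value} $pairs {
    puts "$name = $value"
}
```

The flat list can be read as a dict as long as attribute names are unique. Element nesting is ignored here; keeping it would mean returning a nested structure per ELEMENT node instead.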
From: Fredrik Karlsson on 15 Jun 2010 04:38

Hi,

I've played with it a bit now, and I've gotten a snit parser which parses according to the grammar I gave it. I gave up on TclOO as the parser produced seems to depend on two packages I cannot get(?).

However, the legacy language I have to parse has one part of the syntax which I think causes left recursion. "Module=a" is a valid expression, as is "Module=a & Module=b...", which should be interpreted as "(Module=a) & (Module=b)..." in the parsing. For some reason, I get "Module=a" parsed in both examples, but the rest of the second expression is left out and not parsed at all.

Is there a solution, you think?

/Fredrik

On Jun 10, 2:09 am, Andreas Kupries <akupr...(a)shaw.ca> wrote:
> Fredrik Karlsson <dargo...(a)gmail.com> writes:
> > Hi,
> >
> > This looks very promising indeed, but still a bit opaque (at least
> > for me). Given that I have a grammar in PEG format in the file
> > grammar.peg, how do I get a TclOO that would be a parser for that
> > grammar?
> >
> > What I have is:
> > ----
> > set peg [read [open grammar.peg r] ]
> > set gram [pt::peg::import $peg]
> > ---
> >
> > , but how to get the parser?
>
> And here I thought that I had an example of that in the
> docs. Apparently I confused myself with the pt_parser_api manpage,
> which has an example of the use, when you have the class.
>
> Ok.
> ==============================================
> package require pt::pgen
> package require fileutil
>
> set script [pt::pgen \
>         peg [fileutil::cat grammar.peg] \
>         oo \
>         -file  grammar.peg \
>         -name  name-of-grammar \
>         -user  $tcl_platform(user) \
>         -class name-of-oo-class]
> # Options are specific to the output format.
>
> fileutil::writeFile your-tcl-file.tcl $script
> ==============================================
>
> There is also the 'pt' application, a light wrapper around pt::pgen.
>
> ==============================================
> pt generate \
>         peg grammar.peg \
>         oo  your-tcl-file.tcl \
>         -name  name-of-grammar \
>         -class name-of-oo-class
> ==============================================
>
> HTH.
>
> > /Fredrik
> >
> > On Jun 8, 7:46 am, Andreas Kupries <akupr...(a)shaw.ca> wrote:
> >> I recently created a replacement, 'pt', also in tcllib.
> >>
> >> Documentation starts here
> >>
> >> http://docs.activestate.com/activetcl/8.5/tcllib/pt/pt_introduction.html
> >> and http://docs.activestate.com/activetcl/8.5/tcllib/pt/pt.html
> >>
> >> Hopefully that is better.
>
> --
> So long,
>         Andreas Kupries <akupr...(a)shaw.ca>
>         <http://www.purl.org/NET/akupries/>
>         Developer @ <http://www.activestate.com/>
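On the "Module=a & Module=b" question: the grammar in question is not shown, so the following is only a guess at the shape of the problem. In a PEG the choice operator "/" is ordered, so a rule that lists the short form first (or that recurses on its own left side) can succeed on "Module=a" and never look at the rest. The usual way around both issues is to write the chain as repetition and to require end of input. The sketch below uses made-up rule names (Expression, Term, Value, AndOp, WS) and a made-up class name modexpr; only the pt::pgen call form is taken from the scripts earlier in the thread, and generation is skipped when tcllib is not installed.

```tcl
# Hypothetical grammar for inputs such as "Module=a & Module=b & Module=c".
# Repetition replaces left recursion; EOF makes a half-parsed input fail
# instead of being reported as a successful parse of "Module=a".
set grammar {
PEG modexpr (Expression)
    Expression  <- Term (AndOp Term)* EOF ;
    Term        <- 'Module=' Value ;
    leaf: Value <- [a-zA-Z0-9]+ ;
    void: AndOp <- WS '&' WS ;
    void: WS    <- ' '* ;
    void: EOF   <- !. ;
END;
}

# Generate a snit parser only if tcllib's pt::pgen can be loaded.
if {![catch {package require pt::pgen}]} {
    set script [pt::pgen peg [string trim $grammar] snit -class modexpr]
    puts $script
}
```

With a grammar of this shape each "Module=..." should come back as its own Term node under Expression, which corresponds to the "(Module=a) & (Module=b)" reading.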
From: tom.rmadilo on 15 Jun 2010 12:40
On Jun 15, 1:38 am, Fredrik Karlsson <dargo...(a)gmail.com> wrote:
> Hi,
>
> I've played with it a bit now, and I've gotten a snit parser which
> parses according to the grammar I gave it. I gave up on TclOO as the
> parser produced seems to depend on two packages I cannot get(?).
>
> However, the legacy language I have to parse has one part of the
> syntax which I think causes left recursion.
> "Module=a" is a valid expression, as is
> "Module=a & Module=b..."
> which should be interpreted as "(Module=a) & (Module=b)..." in the
> parsing. For some reason, I get "Module=a" parsed in both examples,
> but the rest of the expression in the second expression is left out
> and not parsed at all.
>
> Is there a solution, you think?

The hard part is always convincing the parser what is valid and what isn't. However, it finally dawned on me that the better example of writing a PEG might be the one used for parsing a PEG. It has more information.

The interesting thing about this type of parsing is that you don't have a separate tokenizer/lexer. Then you have language grammar components which have different associative rules, as shown in the calculator example with +/- and *.

Another important difference with this type of parser is that it must validate the entire document at once. This seems to make tracking down parsing errors somewhat difficult.

I noticed two different types of errors in development. The first: is the PEG valid? I don't know exactly how to prove this. There seems to be some API for creating a canonical representation of the PEG, but it doesn't seem to work with my example, which correctly parses rmadilo documents. But if the PEG itself has an error, at least this can be used to fix the PEG.

The other error is when your parser doesn't parse what you think is a valid document. How to test this? What I am doing is to put the document into a variable:

    % set data "....."

Then from a tclsh prompt parse the document. You get an error with a number indicating the index where the error occurred. Then you can do:

    % string range $data (some index prior to the error) (some index after the error)

This could help you identify the exact character which is throwing off your parser. Stack traces don't offer me much help since they come from an auto-generated parser, but |