From: Joseph M. Newcomer on 18 May 2010 21:07

The counterexample is that he DID it. You'll have to ask him for proof. But I do lexical
analysis all the time using DFAs and never once do I have a table. Years of experience
taught me that they are faster than anything lex or analogous lexer generators can do. But
I'll reserve judgment.
				joe

On Tue, 18 May 2010 17:36:48 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/18/2010 5:26 PM, Joseph M. Newcomer wrote:
>> It is important to distinguish between the *concept* of DFA and the implementation of "a
>> table-driven DFA" and the special case of "a table-driven DFA generated by the lex tool".
>>
>> All lexical analysis is done using the concept of DFAs, but the table-driven DFA output by
>> lex is not very fast. Again, I point to the work of Nigel Horspool, which is very old
>> work at this point, of generating tight C code, and no table appears anywhere in his
>> implementation.
>> 				joe
>>
>I really, really doubt that no table is used in a lexical analyzer that
>is very, very fast. Some little include of an include that you missed is
>more likely. I can't accept this without direct proof.
>
>I can accept that he may have derived a parser that is very, very fast
>without the need for a table. This is more plausible.

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Peter Olcott on 18 May 2010 21:42

On 5/18/2010 8:07 PM, Joseph M. Newcomer wrote:
> The counterexample is that he DID it. You'll have to ask him for proof. But I do lexical
> analysis all the time using DFAs and never once do I have a table. Years of experience
> taught me that they are faster than anything lex or analogous lexer generators can do. But
> I'll reserve judgment.
> 				joe

How can you scan for dozens of keywords efficiently without a state
transition matrix? Without a state transition matrix it's got to be
something like if, else if, else if, et cetera.

With a state transition matrix you can simultaneously look for an
unlimited number of different character strings with zero comparisons
per byte.

>
> On Tue, 18 May 2010 17:36:48 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:
>
>> On 5/18/2010 5:26 PM, Joseph M. Newcomer wrote:
>>> It is important to distinguish between the *concept* of DFA and the implementation of "a
>>> table-driven DFA" and the special case of "a table-driven DFA generated by the lex tool".
>>>
>>> All lexical analysis is done using the concept of DFAs, but the table-driven DFA output by
>>> lex is not very fast. Again, I point to the work of Nigel Horspool, which is very old
>>> work at this point, of generating tight C code, and no table appears anywhere in his
>>> implementation.
>>> 				joe
>>>
>> I really, really doubt that no table is used in a lexical analyzer that
>> is very, very fast. Some little include of an include that you missed is
>> more likely. I can't accept this without direct proof.
>>
>> I can accept that he may have derived a parser that is very, very fast
>> without the need for a table. This is more plausible.
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Mihai N. on 19 May 2010 01:34

> I can't accept this without direct proof.

Yet, we are all supposed to take everything *you* say without any proof.

--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
From: Mihai N. on 19 May 2010 01:42

> You code in Word? Word is essentially user-hostile for writing code!

Actually, it is very nice. You set the spelling language to "C" and get
nice squiggles for all syntax errors. ;-)
Also, it has the great benefit that it does not compile and run your code,
which allows you to write perfect code and create super fast algorithms.
Compiling the code and running it destroys the perfection :-)
It's called "reality" and it is a bad thing.

--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
From: Joseph M. Newcomer on 19 May 2010 02:18
See below...

On Tue, 18 May 2010 20:42:55 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/18/2010 8:07 PM, Joseph M. Newcomer wrote:
>> The counterexample is that he DID it. You'll have to ask him for proof. But I do lexical
>> analysis all the time using DFAs and never once do I have a table. Years of experience
>> taught me that they are faster than anything lex or analogous lexer generators can do. But
>> I'll reserve judgment.
>> 				joe
>
>How can you scan for dozens of keywords efficiently without a state
>transition matrix? Without a state transition matrix it's got to be
>something like if, else if, else if, et cetera.
****
Your assumption that a "state transition matrix" is required shows that you have never
actually written a hard-coded lexer. I've written so many that I've lost count. I'll
knock one off as fast as I can type; in fact, typing speed is the limit to how long it
takes to create one.

How did you develop this fixation that a "state transition matrix" is required? That is
only one way to implement a DFA. A switch statement works well, too, and if-statements
are considered a viable technique. While a table-driven method is known to be one of the
implementations, it is far and away not the ONLY known implementation.

Nigel's experience was that a hard-coded recognizer without a table is blindingly fast,
MUCH faster than a lexer that uses a transition matrix to drive the recognition. As these
things go, using transition matrices is a rather "modern" invention in lexing. The
compiler for the CDC Star computer used pipelined floating point operations to create a
character class vector (because the Star was LOUSY at character manipulation but
blindingly fast at pipelined floating point).
So don't try to explain to me that table-driven lexers are the only possible
implementation; I was writing hard-coded lexers in 1966, and using table-driven lexers in
1967 that were created from a lexer generator written by Art Evans for his PhD
dissertation in 1961. We wrote a hand-coded lexer for the Bliss compiler. I've been
writing compilers of one sort or another for 45 years, with table-driven components,
hard-coded components, and everything in between. I've written sparse matrix packers
using "comb vectors" to compress parse tables (an invention of people at Karlsruhe
University in Germany). And I can write programs that can convert NDFA machines to DFA
machines, and then optimize the DFA to minimum states, and I used to give this as a
freshman programming exercise. So please stop trying to tell me how much I know or don't
know about lexers, parsers, and compiler implementations in general. I also worked
closely with one of the world's best language designers, and I learned a lot about
language design from him. Not enough to be as good as he is, but enough to know when I
see a bad design.

Do you know what a "perfect hash" is? I've built perfect hash algorithms. Do you know
where they are used, and why? I'm not some newbie at this business. I worked with
optimizing compiler technology from 1970 through 1983, and my Ph.D. was in optimal code
generation from table-driven code generators. I've built debuggers, byte code
interpreters, tree interpreters, threaded code interpreters, and machine code generators.
I wrote my first compiler in 1967, which took an Algol-like language and generated hard
code. I was one of the key developers of an interactive language from 1967-1969. My
first optimizing compiler was a student project in 1967 (I can date that exactly, because
the machine I wrote it for was decommissioned in 1968; our grade was inversely
proportional to code size, and I got an A+). I've forgotten more about compilers than
most people have ever learned.
The last compiler I wrote had not a trace of tables anywhere in it, and I wrote it three
years ago (I just checked the file dates). My first Windows project was a compiler which
had not a single table for either its lexer or parser, back in 1990. So I have no idea
why you think these representations are essential.

And my skills pale beside those of deep experts like Nigel Horspool; see, for example,
http://webhome.cs.uvic.ca/~nigelh/pubs.html
and I recommend the paper
http://webhome.cs.uvic.ca/~nigelh/Publications/fastparse.pdf
a 1990 paper wherein he shows that the performance of table-driven parsers can be improved
by very large factors (like 8x) by hard-coding the transitions in pure C code, without a
table anywhere in the resulting compiler. Don't try to explain why table-driven must be
faster than any alternative when it was proven 20 years ago that this is not true. I
suggest looking at Table 2 on page 19.

You asked for proof; here it is.
****

>With a state transition matrix you can simultaneously look for an
>unlimited number of different character strings with zero comparisons
>per byte.
****
So what's your complaint about adding lexer rules for localized digit sequences? You just
said it won't add any time...
				joe

>
>>
>> On Tue, 18 May 2010 17:36:48 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:
>>
>>> On 5/18/2010 5:26 PM, Joseph M. Newcomer wrote:
>>>> It is important to distinguish between the *concept* of DFA and the implementation of "a
>>>> table-driven DFA" and the special case of "a table-driven DFA generated by the lex tool".
>>>>
>>>> All lexical analysis is done using the concept of DFAs, but the table-driven DFA output by
>>>> lex is not very fast. Again, I point to the work of Nigel Horspool, which is very old
>>>> work at this point, of generating tight C code, and no table appears anywhere in his
>>>> implementation.
>>>> 				joe
>>>>
>>> I really, really doubt that no table is used in a lexical analyzer that
>>> is very, very fast. Some little include of an include that you missed is
>>> more likely. I can't accept this without direct proof.
>>>
>>> I can accept that he may have derived a parser that is very, very fast
>>> without the need for a table. This is more plausible.
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm