From: Paul Rubin on 10 Apr 2010 21:38

Steven D'Aprano <steve(a)REMOVE-THIS-cybersource.com.au> writes:
> As entertaining as this is, the analogy is rubbish. Skis are far too
> simple to use as an analogy for a parser (he says, having never seen skis
> up close in his life *wink*). Have you looked at PyParsing's source code?
> Regexes are only a small part of the parser, and not analogous to the
> wood of skis.

The impression that I have (from a distance) is that Pyparsing is a good interface abstraction with a kludgy and slow implementation. That the implementation uses regexps just goes to show how kludgy it is. One hopes that someday there will be a more serious implementation, perhaps using llvm-py (I wonder whatever happened to that project, by the way) so that your parser script will compile to executable machine code on the fly.
From: Paul McGuire on 11 Apr 2010 00:32

On Apr 10, 8:38 pm, Paul Rubin <no.em...(a)nospam.invalid> wrote:
> The impression that I have (from a distance) is that Pyparsing is a good
> interface abstraction with a kludgy and slow implementation. That the
> implementation uses regexps just goes to show how kludgy it is. One
> hopes that someday there will be a more serious implementation, perhaps
> using llvm-py (I wonder whatever happened to that project, by the way)
> so that your parser script will compile to executable machine code on
> the fly.

I am definitely flattered that pyparsing stirs up so much interest, and among such a distinguished group. But I have to take some umbrage at Paul Rubin's left-handed compliment, "Pyparsing is a good interface abstraction with a kludgy and slow implementation," especially since he forms his opinions "from a distance". I actually *did* put some thought into what I wanted in pyparsing before designing it, and this forms this chapter of "Getting Started with Pyparsing" (available here as a free online excerpt: http://my.safaribooksonline.com/9780596514235/what_makes_pyparsing_so_special#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODA1OTY1MTQyMzUvMTYmaW1hZ2VwYWdlPTE2), the "Zen of Pyparsing" as it were. My goals were:

- build parsers using explicit constructs (such as words, groups, repetition, alternatives), vs. expression encoding using specialized character sequences, as found in regexen
- easy parser construction from primitive elements to complex groups and alternatives, using Python's operator overloading for ease of direct implementation of parsers using ordinary Python syntax; include mechanisms for defining recursive parser expressions
- implicit skipping of whitespace between parser elements
- results returned not just as a list of strings, but as a rich data object, with access to parsed fields by name or by list index, taking interfaces from both dicts and lists for natural adoption into common Python idioms
- no separate code-generation steps, a la lex/yacc
- support for parse-time callbacks, for specialized token handling, conversion, and/or construction of data structures
- 100% pure Python, to be runnable on any platform that supports Python
- liberal licensing, to permit easy adoption into any user's projects anywhere

So raw performance really didn't even make my short-list, beyond the obvious "should be tolerably fast enough."

I have found myself reading posts on c.l.py with wording like "I'm trying to parse <blah-blah> and I've been trying for hours/days to get this regex working." For kicks, I'd spend 5-15 minutes working up a working pyparsing solution, which *does* run comparatively slowly, perhaps taking a few minutes to process the poster's data file. But the net solution is developed and running in under 1/2 an hour, which to me seems like an overall gain compared to hours of fruitless struggling with backslashes and regex character sequences. On top of which, the pyparsing solutions are still readable when I come back to them weeks or months later, instead of staring at some line-noise regex and just scratching my head wondering what it was for.

And sometimes "comparatively slowly" means that it runs 50x slower than a compiled method that runs in 0.02 seconds - that's still getting the job done in just 1 second.
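Several of the goals listed above (explicit constructs, operator overloading, implicit whitespace skipping, results accessible by name) can be illustrated with a toy sketch in plain Python. This is not pyparsing's code or its API: the names Expr, Token, Seq, Alt, word and lit are invented for this illustration, and the design is only an assumption about how such an interface might be layered over compiled regexes.

```python
import re
import string

_WS = re.compile(r"\s*")  # used to skip whitespace before every token


class Expr:
    """Toy parser element. parse_at returns (tokens, named, next_pos) or None."""

    def __add__(self, other):   # a + b  -> match a, then b
        return Seq(self, other)

    def __or__(self, other):    # a | b  -> first alternative that matches
        return Alt(self, other)

    def parse(self, text):
        result = self.parse_at(text, 0)
        if result is None:
            raise ValueError("no match")
        tokens, named, _ = result
        return tokens, named


class Token(Expr):
    """Leaf element; the compiled regex stays hidden from the grammar author."""

    def __init__(self, pattern, name=None):
        self._re = re.compile(pattern)
        self.name = name

    def parse_at(self, text, pos):
        pos = _WS.match(text, pos).end()   # implicit whitespace skipping
        m = self._re.match(text, pos)
        if m is None:
            return None
        named = {self.name: m.group()} if self.name else {}
        return [m.group()], named, m.end()


class Seq(Expr):
    def __init__(self, *parts):
        self.parts = parts

    def parse_at(self, text, pos):
        tokens, named = [], {}
        for part in self.parts:
            result = part.parse_at(text, pos)
            if result is None:
                return None
            part_tokens, part_named, pos = result
            tokens += part_tokens
            named.update(part_named)
        return tokens, named, pos


class Alt(Expr):
    def __init__(self, *choices):
        self.choices = choices

    def parse_at(self, text, pos):
        for choice in self.choices:
            result = choice.parse_at(text, pos)
            if result is not None:
                return result
        return None


def word(chars, name=None):
    """One or more characters drawn from `chars`."""
    return Token("[" + re.escape(chars) + "]+", name)


def lit(s):
    """An exact literal string."""
    return Token(re.escape(s))


# The grammar is written with ordinary Python operators.
key = word(string.ascii_letters, name="key")
value = word(string.digits, name="number") | word(string.ascii_letters, name="text")
assignment = key + lit("=") + value

tokens, named = assignment.parse("  timeout =   30")
print(tokens)   # ['timeout', '=', '30']
print(named)    # {'key': 'timeout', 'number': '30'}
```

The grammar author writes `key + lit("=") + value` and never sees a regex, yet each leaf match is done by the re engine. A real library would add repetition, recursion, parse-time callbacks and a richer results object; none of that is attempted here.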
And is the internal use of regexes with pyparsing really a "kludge"? Why? They are almost completely hidden from the parser developer. And yet by using compiled regexes, I retain the portability of 100% Python while leveraging the compiled speed of the re engine.

It does seem that there have been many posts of late (either on c.l.py or the related posts on Stackoverflow) where the OP is trying to either scrape content from HTML, or parse some type of recursive expression. HTML scrapers implemented using re's are terribly fragile, since HTML in the wild often contains little surprises (unexpected whitespace; upper/lower case inconsistencies; tag attributes in unpredictable order; attribute values with double, single, or no quotation marks) which completely frustrate any re-based approach.

Granted, there are times when an re-parsing-of-HTML endeavor *isn't* futile or doomed from the start - the OP may be working with a very restricted set of HTML, generated from some other script so that the output is very consistent. Unfortunately, this poster usually gets thrown under the same "you'll never be able to parse HTML with re's" bus.

I can't explain the surge in these posts, other than to wonder if we aren't just seeing a skewed sample - that is, the many cases where people *are* successfully using re's to solve their text extraction problems aren't getting posted to c.l.py, since no one posts questions they already have the answers to.

So don't be too dismissive of pyparsing, Mr. Rubin. I've gotten many e-mails, wiki, and forum posts from Python users at all levels of the expertise scale, saying that pyparsing has helped them to be very productive in one or another aspect of creating a command parser, or adding safe expression evaluation to an app, or just extracting some specific data from a log file. I am encouraged that most report that they can get their parsers working in reasonably short order, often by reworking one of the examples that comes with pyparsing.
If you're offering to write that extension to pyparsing that generates the parser runtime in fast machine code, it sounds totally bitchin' and I'd be happy to include it when it's ready.

-- Paul
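The fragility of re-based HTML scraping described in the post above can be seen in a short sketch. The sample markup, the example.com URL and the names NAIVE, HrefCollector and hrefs_via_parser are all made up for this illustration, and the comparison uses the standard library's html.parser rather than pyparsing, purely to keep the example dependency-free.

```python
import re
from html.parser import HTMLParser

# Four spellings of the same link, varying in the ways the post lists:
# tag/attribute case, attribute order, extra whitespace, and quoting style.
SAMPLES = [
    '<a href="http://example.com/page">x</a>',
    '<A HREF="http://example.com/page">x</A>',
    "<a class='nav'   href='http://example.com/page'>x</a>",
    '<a href=http://example.com/page >x</a>',
]

# A typical first-attempt regex: fine for one consistent spelling only.
NAIVE = re.compile(r'<a href="([^"]*)">')


class HrefCollector(HTMLParser):
    """Collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and strips quotes.
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def hrefs_via_parser(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


regex_hits = [bool(NAIVE.search(s)) for s in SAMPLES]
parser_hits = [hrefs_via_parser(s) for s in SAMPLES]

print(regex_hits)    # [True, False, False, False]
print(parser_hits)   # the same URL recovered from all four samples
```

The first sample is the "very restricted, consistent HTML" case where the regex does its job; the other three are the little surprises that defeat it.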
From: Patrick Maupin on 11 Apr 2010 02:29

On Apr 10, 1:05 pm, Stefan Behnel <stefan...(a)behnel.de> wrote:
> Running a Python program in CPython eventually boils down to a sequence of
> commands being executed by the CPU. That doesn't mean you should write
> those commands manually, even if you can. It's perfectly ok to write the
> program in Python instead.

Absolutely. But (as I seem to have posted many times recently) if somebody asks how to do "x" it may be useful to point out that it sounds like he really wants "y" and there are already several canned solutions that do "y", but if he really wants "x", here is how he should do it, or here is why he will have problems if he attempts to do it (hint: whether Jamie Zawinski decides to kill a puppy or not is not really a problem for somebody just asking a programming question -- that's really up to Jamie).

Regards,
Pat
From: Neil Cerutti on 12 Apr 2010 08:09

On 2010-04-11, Steven D'Aprano <steve(a)REMOVE-THIS-cybersource.com.au> wrote:
> On Sat, 10 Apr 2010 10:11:07 -0700, Patrick Maupin wrote:
>> On Apr 10, 11:35 am, Neil Cerutti <ne...(a)norwich.edu> wrote:
>>> On 2010-04-10, Patrick Maupin <pmau...(a)gmail.com> wrote:
>>> > as Pyparsing". Which is all well and good, except then the OP will
>>> > download pyparsing, take a look, realize that it uses regexps under
>>> > the hood, and possibly be very confused.
>>>
>>> I don't agree with that. If a person is trying to ski using pieces of
>>> wood that they carved themselves, I don't expect them to be surprised
>>> that the skis they buy are made out of similar materials.
>>
>> But, in this case, the guy ASKED how to make the skis in his woodworking
>> shop, and was told not to be silly -- you don't use wood to make skis --
>> and then directed to go buy some skis that are, in fact, made out of
>> wood.
>
> As entertaining as this is, the analogy is rubbish.

You should have seen the car engine analogy I thought up at first. ;)

> Skis are far too simple to use as an analogy for a parser (he
> says, having never seen skis up close in his life *wink*).
> Have you looked at PyParsing's source code? Regexes are only a
> small part of the parser, and not analogous to the wood of
> skis.

I was mainly trying to get across my incredulity that somebody should be surprised a parsing package uses regexes under the hood. But for the record, a set of downhill skis comes with a really fancy interface layer:

URL:http://images03.olx.com/ui/1/85/66/13147966_1.jpg

-- 
Neil Cerutti