From: Robert on 31 Jan 2010 14:57

I tried lxml, but after walking the element tree and making changes, I'm forced to do a full serialization of the whole document (etree.tostring(tree)), which destroys the "human edited" formatting of the original HTML code and makes it rather unreadable.

Is there an existing HTML parser that supports tracking and writing back particular changes in a cautious way, by making only local changes? Or one that at least tracks the tag start/end positions in the file?

Robert
From: Stefan Behnel on 1 Feb 2010 03:34

Robert, 31.01.2010 20:57:
> I tried lxml, but after walking and making changes in the element tree,
> I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format of the
> original HTML code. makes it rather unreadable.

What do you mean? Could you give an example? lxml certainly does not destroy anything it parsed, unless you tell it to do so.

Stefan
From: Nobody on 1 Feb 2010 22:09

On Sun, 31 Jan 2010 20:57:31 +0100, Robert wrote:
> I tried lxml, but after walking and making changes in the element
> tree, I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format
> of the original HTML code.
> makes it rather unreadable.
>
> is there an existing HTML parser which supports tracking/writing
> back particular changes in a cautious way by just making local
> changes? or a least tracks the tag start/end positions in the file?

HTMLParser, sgmllib.SGMLParser and htmllib.HTMLParser all allow you to retrieve the literal text of a start tag (but not an end tag). Unfortunately, they're only tokenisers, not parsers, so you'll need to handle minimisation yourself.
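A minimal sketch of the tokeniser approach described in the last reply, assuming Python 3, where the HTMLParser module is available as html.parser. It records the literal text and absolute offset of each start tag, then patches a single tag in place so that the rest of the file keeps its hand-edited layout. The class name TagLocator and the sample markup are hypothetical and used only for illustration. This is not a full solution: it does not build a tree, and omitted or unbalanced tags are left for the caller to handle.

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Hypothetical helper: record where each start tag sits in the source."""

    def __init__(self, source):
        super().__init__()
        # Offset at which every line begins, so that the (line, column)
        # pair returned by getpos() can be mapped to a string index.
        self.line_starts = [0]
        for index, char in enumerate(source):
            if char == "\n":
                self.line_starts.append(index + 1)
        # Each entry is (tag_name, start_offset, end_offset, literal_text).
        self.start_tags = []
        self.feed(source)
        self.close()

    def handle_starttag(self, tag, attrs):
        line, column = self.getpos()      # line is 1-based, column is 0-based
        start = self.line_starts[line - 1] + column
        literal = self.get_starttag_text()  # tag exactly as written in the file
        self.start_tags.append((tag, start, start + len(literal), literal))


# Sample input with deliberately irregular, "human edited" spacing.
source = '<html>\n  <body>\n    <p   class="a">Hello</p>\n  </body>\n</html>\n'

locator = TagLocator(source)

# Find the first <p> start tag and rewrite only that span.
tag, start, end, literal = next(t for t in locator.start_tags if t[0] == "p")
patched = source[:start] + '<p class="b">' + source[end:]

print(patched)
```

Because only the slice between start and end is replaced, every other byte of the document, including indentation and line breaks, is carried over unchanged. When patching several tags in one pass, applying the edits from the last offset to the first keeps the earlier offsets valid.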