From: dmtr on 27 Apr 2010 21:42

Is there any way to configure cElementTree to ignore the XML root namespace? Default cElementTree (Python 2.6.4) appears to add the XML root namespace URI to _every_ single tag. I know that I can strip URIs manually, from every tag, but it is a rather idiotic thing to do (performance-wise).
From: Stefan Behnel on 28 Apr 2010 02:53

dmtr, 28.04.2010 03:42:
> Is there any way to configure cElementTree to ignore the XML root
> namespace? Default cElementTree (Python 2.6.4) appears to add the XML
> root namespace URI to _every_ single tag.

Certainly not in the serialised XML. Are you referring to the qualified names it uses?

Stefan
From: dmtr on 29 Apr 2010 22:57

I'm referring to xmlns/URI prefixes. Here's a code example:

    from xml.etree.cElementTree import iterparse
    from cStringIO import StringIO

    xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
    for event, elem in iterparse(StringIO(xml)):
        print event, elem

The output is:

    end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
    end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>

I don't want these "{http://www.very_long_url.com}" in front of my tags.

They create a performance disaster on large files (first cElementTree adds them, then I have to remove them in Python). Is there any way to tell cElementTree not to mess with my tags? I need that in the standard Python distribution, not my custom cElementTree build...
From: Stefan Behnel on 30 Apr 2010 01:12

dmtr, 30.04.2010 04:57:
> I'm referring to xmlns/URI prefixes. Here's a code example:
> from xml.etree.cElementTree import iterparse
> from cStringIO import StringIO
> xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
> for event, elem in iterparse(StringIO(xml)): print event, elem
>
> The output is:
> end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
> end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>
>
> I don't want these "{http://www.very_long_url.com}" in front of my
> tags.
>
> They create performance disaster on large files

I seriously doubt that they do.

> (first cElementTree
> adds them, then I have to remove them in python).

I think that's your main mistake: don't remove them. Instead, use the fully qualified names when comparing.

Stefan
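
[Editor's sketch] The advice above, comparing against fully qualified names instead of stripping them, might look roughly like this. It is an illustration, not code from the thread: it assumes Python 3's xml.etree.ElementTree rather than the Python 2.6 cElementTree used by the poster, and the names NS, CHILD and count_children are hypothetical.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

# Namespace URI taken from the example in the thread.
NS = "http://www.very_long_url.com"

# Build the qualified tag once, outside the parsing loop (hypothetical name).
CHILD = "{%s}child" % NS


def count_children(source):
    """Count <child> elements by comparing against the qualified tag."""
    count = 0
    for event, elem in iterparse(source):
        # elem.tag is already '{uri}localname'; compare it as-is.
        if elem.tag == CHILD:
            count += 1
    return count


xml = b'<root xmlns="http://www.very_long_url.com"><child/><child/></root>'
print(count_children(BytesIO(xml)))
```

The qualified tag is built once, so the per-element work in the loop is a single string comparison and no tag is ever rewritten.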
From: dmtr on 30 Apr 2010 17:59

> I think that's your main mistake: don't remove them. Instead, use the fully
> qualified names when comparing.
>
> Stefan

Yes. That's what I'm forced to do. Pre-calculating tags like tagChild = "{%s}child" % uri and using them instead of "child". As a result the code looks ugly and there is extra overhead concatenating/comparing these repeating and redundant prefixes. I don't understand why cElementTree forces users to do that.

So far I couldn't find any way around that without rebuilding cElementTree from source. Apparently somebody hard-coded the namespace_separator parameter in cElementTree.c (what a dumb thing to do!!!, it should have been a parameter in the cElementTree.XMLParser() arguments):

===========
self->parser = EXPAT(ParserCreate_MM)(encoding, &memory_handler, "}");
===========

Simply replacing "}" with NULL gives me desired tags without stinking URIs.
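
[Editor's sketch] The effect of the namespace separator described above can be observed from pure Python through the standard xml.parsers.expat module, whose ParserCreate() accepts a namespace_separator argument. The snippet below is only an illustration under that assumption (written for Python 3; the helper name element_names is hypothetical). It talks to expat directly and does not change how cElementTree itself builds its tags.

```python
from xml.parsers import expat


def element_names(data, namespace_separator):
    """Return the element names expat reports for the given separator."""
    names = []
    parser = expat.ParserCreate(namespace_separator=namespace_separator)
    # Record the name passed to the start-element callback for each element.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


xml = b'<root xmlns="http://www.very_long_url.com"><child/></root>'

# No separator: namespace processing is off, names appear as written.
print(element_names(xml, None))

# Separator "}": expat reports URI + "}" + local name for each element.
print(element_names(xml, "}"))
```

With a separator of None, expat leaves the xmlns declaration as an ordinary attribute and performs no namespace resolution, so prefixed names such as ns:child would also be reported unresolved.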