Ignoring XML Namespaces with cElementTree [Python]


From: dmtr on 27 Apr 2010 21:42

Is there any way to configure cElementTree to ignore the XML root
namespace? Default cElementTree (Python 2.6.4) appears to add the XML
root namespace URI to _every_ single tag. I know that I can strip
URIs manually from every tag, but that is a rather idiotic thing to do
(performance-wise).

From: Stefan Behnel on 28 Apr 2010 02:53

dmtr, 28.04.2010 03:42:
> Is there any way to configure cElementTree to ignore the XML root
> namespace? Default cElementTree (Python 2.6.4) appears to add the XML
> root namespace URI to _every_ single tag.

Certainly not in the serialised XML. Are you referring to the qualified
names it uses?

Stefan

From: dmtr on 29 Apr 2010 22:57

I'm referring to xmlns/URI prefixes. Here's a code example:
from xml.etree.cElementTree import iterparse
from cStringIO import StringIO
xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
for event, elem in iterparse(StringIO(xml)): print event, elem

The output is:
end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>

I don't want these "{http://www.very_long_url.com}" in front of my
tags.

They create a performance disaster on large files (first cElementTree
adds them, then I have to remove them in python). Is there any way to
tell cElementTree not to mess with my tags? I need that in the
standard python distribution, not my custom cElementTree build...
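
For reference, the manual stripping described above might look like the following. This is a minimal sketch written for Python 3, where xml.etree.ElementTree takes the place of cElementTree and io.BytesIO the place of cStringIO; the helper names local_name and iter_local_tags are hypothetical and not part of any library.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

XML = b'<root xmlns="http://www.very_long_url.com"><child/></root>'


def local_name(tag):
    # '{uri}child' -> 'child'; a tag without a namespace passes through unchanged.
    return tag.rpartition("}")[2]


def iter_local_tags(data):
    # Rewrite each element's tag in place as its "end" event arrives,
    # then yield the stripped tag name.
    for _event, elem in iterparse(BytesIO(data)):
        elem.tag = local_name(elem.tag)
        yield elem.tag


print(list(iter_local_tags(XML)))  # ['child', 'root']
```

The per-element string operation is the extra work complained about here; whether it matters in practice depends on the document size and would need to be measured.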

From: Stefan Behnel on 30 Apr 2010 01:12

dmtr, 30.04.2010 04:57:
> I'm referring to xmlns/URI prefixes. Here's a code example:
> from xml.etree.cElementTree import iterparse
> from cStringIO import StringIO
> xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
> for event, elem in iterparse(StringIO(xml)): print event, elem
>
> The output is:
> end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
> end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>
>
>
> I don't want these "{http://www.very_long_url.com}" in front of my
> tags.
>
> They create a performance disaster on large files

I seriously doubt that they do.

> (first cElementTree
> adds them, then I have to remove them in python).

I think that's your main mistake: don't remove them. Instead, use the fully
qualified names when comparing.

Stefan

From: dmtr on 30 Apr 2010 17:59

> I think that's your main mistake: don't remove them. Instead, use the fully
> qualified names when comparing.
>
> Stefan

Yes. That's what I'm forced to do. Pre-calculating tags like tagChild
= "{%s}child" % uri and using them instead of "child". As a result the
code looks ugly and there is extra overhead concatenating/comparing
these repeating and redundant prefixes. I don't understand why
cElementTree forces users to do that. So far I couldn't find any way
around that without rebuilding cElementTree from source.
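
A minimal sketch of the pre-calculated qualified-name approach, assuming Python 3 (xml.etree.ElementTree and io.BytesIO instead of cElementTree and cStringIO). The names NS, CHILD and count_children are hypothetical and chosen only for illustration.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

XML = b'<root xmlns="http://www.very_long_url.com"><child/><child/></root>'

# Build the qualified tag once, outside the parsing loop.
NS = "http://www.very_long_url.com"
CHILD = "{%s}child" % NS


def count_children(data):
    # Compare each element's tag against the pre-built qualified name;
    # nothing is concatenated or stripped per element.
    count = 0
    for _event, elem in iterparse(BytesIO(data)):
        if elem.tag == CHILD:
            count += 1
    return count


print(count_children(XML))  # 2
```

With the constant built up front, the only per-element cost is a string equality test.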

Apparently somebody hard-coded the namespace_separator parameter in
the cElementTree.c (what a dumb thing to do!!!, it should have been a
parameter in the cElementTree.XMLParser() arguments):
===========
self->parser = EXPAT(ParserCreate_MM)(encoding, &memory_handler, "}");
===========

Simply replacing "}" with NULL gives me the desired tags without the
stinking URIs.
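
The effect of that separator can be observed from pure Python with the standard xml.parsers.expat module, whose ParserCreate() accepts a namespace_separator argument. The sketch below only illustrates what expat reports with and without namespace processing; it does not reconfigure cElementTree, and start_tags is a hypothetical helper name.

```python
from xml.parsers import expat

XML = b'<root xmlns="http://www.very_long_url.com"><child/></root>'


def start_tags(namespace_separator):
    # Collect the element names expat reports for each start tag.
    names = []
    # Arguments are (encoding, namespace_separator); None disables
    # namespace processing entirely.
    parser = expat.ParserCreate(None, namespace_separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(XML, True)
    return names


# With "}" expat joins URI and local name; ElementTree adds the leading "{".
print(start_tags("}"))
# ['http://www.very_long_url.com}root', 'http://www.very_long_url.com}child']

# Without a separator the raw tag names come through untouched.
print(start_tags(None))
# ['root', 'child']
```

Note that with namespace processing turned off, xmlns declarations are reported as ordinary attributes and prefixed names such as "ns:child" are passed through verbatim, so elements from different namespaces can no longer be told apart reliably.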
