From: C. Benson Manica on 21 Apr 2010 13:19

I have the following simple script running on 2.5.2 on a machine where the default character encoding is "ascii":

#!/usr/bin/env python
#coding: utf-8

import xml.dom.minidom
import codecs

str=u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
doc=xml.dom.minidom.parseString( str )
xml=doc.toxml( encoding="utf-8" )
file=codecs.open( "foo.xml", "w", "utf-8" )
file.write( xml )
file.close()

I've specified utf-8 every place I can find that the documentation allows me to, and yet this doesn't even come close to working without UnicodeEncodeErrors. What on Earth do I have to do to please the character encoding gods?

From: Peter Otten on 21 Apr 2010 13:58

C. Benson Manica wrote:

> I have the following simple script running on 2.5.2 on a machine where
> the default character encoding is "ascii":
>
> #!/usr/bin/env python
> #coding: utf-8
>
> import xml.dom.minidom
> import codecs
>
> str=u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
> doc=xml.dom.minidom.parseString( str )
> xml=doc.toxml( encoding="utf-8" )
> file=codecs.open( "foo.xml", "w", "utf-8" )
> file.write( xml )
> file.close()
>
> I've specified utf-8 every place I can find that the documentation
> allows me to, and yet this doesn't even come close to working without
> UnicodeEncodeErrors. What on Earth do I have to do to please the
> character encoding gods?

Verify every step as you proceed?

>>> import xml.dom.minidom
>>> s = u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
>>> doc = xml.dom.minidom.parseString(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/xml/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.5/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.5/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 62: ordinal not in range(128)

It seems that parseString() doesn't like unicode -- let's try a byte string then:

>>> doc = xml.dom.minidom.parseString(s.encode("utf-8"))
>>> xml = doc.toxml(encoding="utf-8")

No complaints -- let's have a look at the result:

>>> xml
'<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\xc3\xb3"/></elements>'

That's a byte string, no need for codecs.open() then:

>>> f = open("foo.xml", "w")
>>> f.write(xml)
>>> f.close()

Peter

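The steps in this reply can be collected into one script. What follows is only a sketch under assumptions: the helper name write_utf8_xml and the temporary output path are made up for illustration, and the file is opened in binary mode ("wb" rather than the "w" used above) so that the already-encoded bytes are written unchanged on any platform.

```python
import os
import tempfile
import xml.dom.minidom

# Same document as in the thread; \xf3 is U+00F3 (o with acute accent).
SOURCE = u'<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\xf3"/></elements>'


def write_utf8_xml(text, path):
    """Parse unicode XML text and write it back out as UTF-8 bytes.

    write_utf8_xml is a hypothetical helper name, not part of minidom.
    """
    # Encode first, so the parser receives UTF-8 bytes that match the
    # encoding named in the XML declaration.
    doc = xml.dom.minidom.parseString(text.encode("utf-8"))
    # With an encoding argument, toxml() returns an encoded byte string.
    data = doc.toxml(encoding="utf-8")
    # The data is already encoded, so write it in binary mode as-is.
    f = open(path, "wb")
    try:
        f.write(data)
    finally:
        f.close()
    return data


# Illustrative output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "foo.xml")
data = write_utf8_xml(SOURCE, path)
```

The encoding happens exactly once here, inside toxml(), and nothing downstream tries to encode again.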
From: C. Benson Manica on 21 Apr 2010 14:03

On Apr 21, 1:58 pm, Peter Otten <__pete...(a)web.de> wrote:
> C. Benson Manica wrote:
>> (snip)
>
> It seems that parseString() doesn't like unicode

Yes, I noticed that, and I already tried...

> -- let's try a byte string
> then:
>
> >>> doc = xml.dom.minidom.parseString(s.encode("utf-8"))
> >>> xml = doc.toxml(encoding="utf-8")

...except that it didn't work:

  File "./demo.py", line 8, in <module>
    doc=xml.dom.minidom.parseString( str.encode("utf-8") )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)

From: Peter Otten on 21 Apr 2010 14:25

C. Benson Manica wrote:

> On Apr 21, 1:58 pm, Peter Otten <__pete...(a)web.de> wrote:
>> C. Benson Manica wrote:
>>> (snip)
>>
>> It seems that parseString() doesn't like unicode
>
> Yes, I noticed that, and I already tried...
>
>> -- let's try a byte string
>> then:
>>
>> >>> doc = xml.dom.minidom.parseString(s.encode("utf-8"))
>> >>> xml = doc.toxml(encoding="utf-8")
>
> ...except that it didn't work:
>
>   File "./demo.py", line 8, in <module>
>     doc=xml.dom.minidom.parseString( str.encode("utf-8") )
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 62: ordinal not in range(128)

Are you sure that your script has

str = u"..."

like in your post and not just

str = "..."

?

Peter

From: C. Benson Manica on 21 Apr 2010 14:40

On Apr 21, 2:25 pm, Peter Otten <__pete...(a)web.de> wrote:
> Are you sure that your script has
>
> str = u"..."
>
> like in your post and not just
>
> str = "..."

No :-)

str=u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
doc=xml.dom.minidom.parseString( str.encode("utf-8") )
xml=doc.toxml( encoding="utf-8")
file=codecs.open( "foo.xml", "w", "utf-8" )
file.write( xml )
file.close()

fails:

  File "./demo.py", line 12, in <module>
    file.write( xml )
  File "/usr/lib/python2.5/codecs.py", line 638, in write
    return self.writer.write(data)
  File "/usr/lib/python2.5/codecs.py", line 303, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)

but dropping the encoding argument to doc.toxml() seems to finally work. I'd be curious to know why the code you posted (that worked for you) didn't for me, but at this point I'm just happy with something functional. Thank you very kindly!

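A likely reading of the traceback above: the writer returned by codecs.open() does the encoding itself and therefore wants unicode text. When it is handed the byte string produced by toxml(encoding="utf-8"), Python 2 first decodes those bytes implicitly with the default "ascii" codec, and that step fails on 0xc3. Dropping the encoding argument makes toxml() return unicode, so the encoding is done exactly once, by the writer. The sketch below shows that combination; the temporary output path is an assumption made for illustration.

```python
import codecs
import os
import tempfile
import xml.dom.minidom

# Same document as in the thread; \xf3 is U+00F3 (o with acute accent).
SOURCE = u'<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\xf3"/></elements>'

doc = xml.dom.minidom.parseString(SOURCE.encode("utf-8"))

# Without an encoding argument, toxml() returns text (unicode), not bytes.
text = doc.toxml()

# The codecs writer encodes on write, so it must be given text.
# Illustrative output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "foo.xml")
out = codecs.open(path, "w", "utf-8")
try:
    out.write(text)
finally:
    out.close()

# Read the raw bytes back to see what actually landed on disk.
f = open(path, "rb")
try:
    raw = f.read()
finally:
    f.close()
```

One side effect to be aware of: without the encoding argument, the XML declaration that toxml() emits no longer names an encoding. Since UTF-8 is the default for XML documents that do not declare one, the resulting file should still be read correctly.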