From: Stefan Behnel on 22 Apr 2010 01:48

C. Benson Manica, 21.04.2010 19:19:
> I have the following simple script running on 2.5.2 on a machine where
> the default character encoding is "ascii":
>
> #!/usr/bin/env python
> #coding: utf-8
>
> import xml.dom.minidom
> import codecs
>
> str=u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"�\"/></elements>"
> doc=xml.dom.minidom.parseString( str )
> xml=doc.toxml( encoding="utf-8" )
> file=codecs.open( "foo.xml", "w", "utf-8" )
> file.write( xml )
> file.close()

You are trying to re-encode an already encoded output string here.
toxml(encoding="utf-8") returns a byte string. If you pass that into an
encoding file object (as returned by codecs.open()), which expects unicode
input, it will fail to re-encode the already encoded string. This gives a
bizarre error in Python 2.x and an understandable one in Python 3.

So the right solution is to let toxml() do the encoding and drop the use of
codecs.open() in favour of

    f = open("foo.xml", "wb")

(mind the 'b' in the file mode, which stands for 'bytes' or 'binary').

Stefan
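For reference, the suggestion above could look roughly like the following. This is only a minimal sketch, written for Python 3 rather than the 2.5.2 of the original script. The attribute value "é" is a placeholder, since the character in the quoted post did not survive, and the temporary directory is just a convenience so the example does not clutter the working directory.

```python
import os
import tempfile
import xml.dom.minidom

# Placeholder document; "\u00e9" (é) stands in for the unknown
# non-ASCII attribute value from the quoted script.
source = ('<?xml version="1.0" encoding="utf-8"?>'
          '<elements><elem attrib="\u00e9"/></elements>')

# Hand the parser UTF-8 bytes so the input matches the XML declaration.
doc = xml.dom.minidom.parseString(source.encode("utf-8"))

# toxml(encoding=...) does the encoding itself and returns a byte string.
data = doc.toxml(encoding="utf-8")

# Write the bytes unchanged: binary mode, no codecs.open(), no second encode.
path = os.path.join(tempfile.mkdtemp(), "foo.xml")
with open(path, "wb") as f:
    f.write(data)
```

The point of the design is that exactly one layer encodes: either toxml() encodes and the file is opened in binary mode, or toxml() is called without an encoding and the file object does the encoding, but not both.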