From: Baz Walter on 11 Apr 2010 10:12

i am using python 2.6 on a linux box and i have some utf-16 encoded
files with crlf line-endings which i would like to open with universal
newlines.

so far, i have been unable to get this to work correctly.

for example:

    >>> open('test.txt', 'w').write(u'a\r\nb\r\n'.encode('utf-16'))
    >>> repr(open('test.txt', 'rbU').read().decode('utf-16'))
    "u'a\\n\\nb\\n\\n'"
    >>> import codecs
    >>> repr(codecs.open('test.txt', 'rbU', 'utf-16').read())
    "u'a\\n\\nb\\n\\n'"

of course, the output i want is:

    "u'a\\nb\\n'"

i suppose it's not too surprising that the built-in open converts the
line endings before decoding, but it surprised me that codecs.open does
this as well.

is there a way to get universal newlines to work properly with utf-16
files?

(nb: i'm not interested in other methods of converting line endings -
just whether universal newlines can be made to work correctly).
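The doubled newlines in the question can be reproduced by hand. The sketch below is only an illustration, not what the interpreter runs internally: translate_newlines_bytes is a hypothetical helper that mimics newline translation applied to the raw bytes, and 'utf-16-le' is assumed (little-endian, no BOM) so the result does not depend on the platform. In UTF-16 the bytes for '\r' and '\n' are separated by a zero byte, so a byte-level pass never sees a '\r\n' pair and rewrites each '\r' on its own.

```python
# Illustration only: simulate newline translation done on raw bytes
# (before decoding) versus on text (after decoding).
text = u'a\r\nb\r\n'
raw = text.encode('utf-16-le')  # assumption: little-endian, no BOM


def translate_newlines_bytes(data):
    # Hypothetical helper: byte-level translation of '\r\n' and lone '\r'.
    return data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')


# Translating first, then decoding: each '\r' becomes an extra '\n'.
translated_then_decoded = translate_newlines_bytes(raw).decode('utf-16-le')

# Decoding first, then translating: the '\r\n' pairs collapse as intended.
decoded_then_translated = (
    raw.decode('utf-16-le').replace(u'\r\n', u'\n').replace(u'\r', u'\n')
)

print(repr(translated_then_decoded))
print(repr(decoded_then_translated))
```

The first result has the doubled line endings shown in the question; the second is the desired text.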
From: Stefan Behnel on 11 Apr 2010 10:37

Baz Walter, 11.04.2010 16:12:
> i am using python 2.6 on a linux box and i have some utf-16 encoded
> files with crlf line-endings which i would like to open with universal
> newlines.
>
> so far, i have been unable to get this to work correctly.
>
> for example:
>
> >>> open('test.txt', 'w').write(u'a\r\nb\r\n'.encode('utf-16'))
> >>> repr(open('test.txt', 'rbU').read().decode('utf-16'))
> "u'a\\n\\nb\\n\\n'"
> >>> import codecs
> >>> repr(codecs.open('test.txt', 'rbU', 'utf-16').read())
> "u'a\\n\\nb\\n\\n'"
>
> of course, the output i want is:
>
> "u'a\\nb\\n'"
>
> i suppose it's not too surprising that the built-in open converts the
> line endings before decoding, but it surprised me that codecs.open does
> this as well.

The codecs module does not support universal newline parsing (see the
docs). You need to use the new io module instead.

Stefan
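A minimal sketch of what the io-module route suggested above might look like. It assumes io.open with its encoding and newline parameters, which is available from Python 2.6 onwards; read_utf16_universal is a hypothetical helper name, and the temporary file stands in for the test.txt of the question. In text mode with newline=None, line endings are translated after decoding, so '\r\n' should come back as a single '\n'.

```python
import io
import os
import tempfile


def read_utf16_universal(path):
    # Hypothetical helper: newline=None enables universal newlines, and
    # the translation happens on decoded text rather than on raw bytes.
    with io.open(path, 'r', encoding='utf-16', newline=None) as f:
        return f.read()


# Write a small UTF-16 file with CRLF line endings, as in the question.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(path, 'wb') as f:
        f.write(u'a\r\nb\r\n'.encode('utf-16'))
    text = read_utf16_universal(path)
finally:
    os.remove(path)

print(repr(text))
```

The file is written in binary mode here so that the CRLF bytes land on disk exactly as encoded, regardless of platform.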
From: Baz Walter on 11 Apr 2010 11:16

On 11/04/10 15:37, Stefan Behnel wrote:
> The codecs module does not support universal newline parsing (see the
> docs). You need to use the new io module instead.

thanks.

i'd completely overlooked the io module - i thought it was only in
python 2.7/3.x.
From: Antoine Pitrou on 11 Apr 2010 20:35

On Sun, 11 Apr 2010 16:16:45 +0100, Baz Walter wrote:
> On 11/04/10 15:37, Stefan Behnel wrote:
>> The codecs module does not support universal newline parsing (see the
>> docs). You need to use the new io module instead.
>
> thanks.
>
> i'd completely overlooked the io module - i thought it was only in
> python 2.7/3.x.

To be precise, the 2.6 version is a slow one, written in pure Python (and
it might be a bit less debugged too). But codecs.open() is slow, too.