From: RG on 11 Aug 2010 21:21

I thought it was hard-coded into the Python executable at compile time, but that is apparently not the case:

[ron(a)mickey:~]$ python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;print sys.stdin.encoding
UTF-8
>>> ^D
[ron(a)mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
None
[ron(a)mickey:~]$

And indeed, trying to pipe unicode into Python doesn't work, even though it works fine when Python runs interactively. So how can I make this work?

Thanks,
rg
From: Benjamin Kaplan on 11 Aug 2010 22:24

On Wed, Aug 11, 2010 at 6:21 PM, RG <rNOSPAMon(a)flownet.com> wrote:
> I thought it was hard-coded into the Python executable at compile time,
> but that is apparently not the case:
>
> [ron(a)mickey:~]$ python
> Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
> [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys;print sys.stdin.encoding
> UTF-8
> >>> ^D
> [ron(a)mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
> None
> [ron(a)mickey:~]$
>
> And indeed, trying to pipe unicode into Python doesn't work, even though
> it works fine when Python runs interactively. So how can I make this
> work?
>

sys.stdin and stdout are files, just like any other. There's nothing special about them at compile time. When the interpreter starts, it checks to see if they are ttys. If they are, then it tries to figure out the terminal's encoding based on the environment. The code for this is in pythonrun.c if you want to see exactly what it's doing. If stdout and stdin aren't ttys, then their encoding stays as None and the interpreter will use sys.getdefaultencoding() (ASCII by default on Python 2) if you try printing Unicode strings.

By the way, there is no such thing as piping Unicode into Python. Unicode is an abstract concept where each character maps to a code point. Pipes can only deal with bytes. You may be using one of the encodings capable of holding the entire range of Unicode characters (such as UTF-8, UTF-16 LE, UTF-16 BE, UTF-32 LE, and UTF-32 BE), but that's not the same thing as Unicode. You really have to watch your encodings when you pass data around between programs. There's no way to avoid it.
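To make the point about bytes and encodings concrete, here is a minimal sketch of reading piped data as bytes and decoding it explicitly, instead of relying on whatever encoding the stream reports. The helper names (`effective_encoding`, `describe`, `read_text`) and the UTF-8 fallback are assumptions made for illustration, not anything from the thread or from Python itself, and an in-memory `io.BytesIO` stands in for a real pipe.

```python
import io


def effective_encoding(stream, fallback="utf-8"):
    """Return the stream's declared encoding, or `fallback` if it has none.

    `fallback` is an assumption: use whatever the producer of the data
    actually writes.
    """
    return getattr(stream, "encoding", None) or fallback


def describe(stream):
    """Return (is_a_tty, declared_encoding) for a file-like object."""
    return stream.isatty(), getattr(stream, "encoding", None)


def read_text(byte_stream, encoding="utf-8"):
    """Read raw bytes from a pipe-like stream and decode them explicitly."""
    return byte_stream.read().decode(encoding)


# A stand-in for a pipe: it carries bytes and declares no encoding.
pipe = io.BytesIO(u"na\u00efve caf\u00e9\n".encode("utf-8"))
is_tty, declared = describe(pipe)
text = read_text(pipe, effective_encoding(pipe))
```

In a real script, the byte stream would be `sys.stdin` on Python 2 or `sys.stdin.buffer` on Python 3.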
From: RG on 12 Aug 2010 01:50

In article <mailman.1988.1281579897.1673.python-list(a)python.org>, Benjamin Kaplan <benjamin.kaplan(a)case.edu> wrote:
> On Wed, Aug 11, 2010 at 6:21 PM, RG <rNOSPAMon(a)flownet.com> wrote:
> > I thought it was hard-coded into the Python executable at compile time,
> > but that is apparently not the case:
> >
> > [ron(a)mickey:~]$ python
> > Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
> > [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import sys;print sys.stdin.encoding
> > UTF-8
> > >>> ^D
> > [ron(a)mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
> > None
> > [ron(a)mickey:~]$
> >
> > And indeed, trying to pipe unicode into Python doesn't work, even though
> > it works fine when Python runs interactively. So how can I make this
> > work?
> >
>
> sys.stdin and stdout are files, just like any other. There's nothing
> special about them at compile time. When the interpreter starts, it
> checks to see if they are ttys. If they are, then it tries to figure
> out the terminal's encoding based on the environment. The code for
> this is in pythonrun.c if you want to see exactly what it's doing.

Thanks. Looks like the magic incantation is:

export PYTHONIOENCODING='utf-8'

> By the way, there is no such thing as piping Unicode into Python.

Yeah, I know. I should have said "piping UTF-8 encoded unicode" or something like that.

> You really have to watch your encodings
> when you pass data around between programs. There's no way to avoid
> it.

Yeah, I keep re-learning that lesson again and again.

rg
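The effect of PYTHONIOENCODING can be checked from Python itself. The following is a rough sketch, not a definitive test: it starts a child interpreter with stdin connected to a pipe and reports what that child sees as `sys.stdin.encoding`. The helper name `child_stdin_encoding` is made up for this illustration. Without the variable, Python 2 reports None as in the session above, while Python 3 generally reports a locale-derived encoding, so only the forced case is predictable.

```python
import codecs
import os
import subprocess
import sys


def child_stdin_encoding(io_encoding=None):
    """Run a child interpreter with piped stdin; return its sys.stdin.encoding.

    If `io_encoding` is given, it is exported to the child as
    PYTHONIOENCODING; otherwise the variable is removed from its environment.
    """
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    proc = subprocess.Popen(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        stdin=subprocess.PIPE,   # stdin is a pipe, not a tty
        stdout=subprocess.PIPE,
        env=env,
    )
    out, _ = proc.communicate(b"")
    return out.decode("ascii").strip()


forced = child_stdin_encoding("utf-8")
# Normalize the spelling (UTF-8, utf8, utf-8) before comparing.
forced_name = codecs.lookup(forced).name
```

Calling `child_stdin_encoding()` with no argument shows what the child would have picked on its own on the current platform.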
From: Anssi Saari on 12 Aug 2010 09:42

Benjamin Kaplan <benjamin.kaplan(a)case.edu> writes:
> Sys.stdin and stdout are files, just like any other. There's nothing
> special about them at compile time. When the interpreter starts, it
> checks to see if they are ttys. If they are, then it tries to figure
> out the terminal's encoding based on the environment.

Just a related question: is looking at sys.stdin.encoding the proper way of doing things? I've been working on a script to display some email headers, some of which are MIME-encoded in various charsets. Until now I have used whatever locale.getdefaultlocale() returns as the target encoding, since "it seemed to work". On one computer, though, the call returns ISO-8859-15, and I don't quite understand why.
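The thread does not answer this question. As one possible approach, the sketch below decodes MIME-encoded headers to text with the standard library's `email.header.decode_header`, then picks a target encoding by preferring the output stream's own encoding and falling back to `locale.getpreferredencoding()`. The helper names `header_to_text` and `output_encoding` and the fallback order are assumptions for illustration. The code is written for Python 3, where `decode_header` may return either bytes or str parts.

```python
import locale
from email.header import decode_header


def header_to_text(raw_header):
    """Decode a possibly MIME-encoded header value into a text string."""
    parts = []
    for value, charset in decode_header(raw_header):
        if isinstance(value, bytes):
            # Parts without a charset are treated as ASCII (an assumption);
            # undecodable bytes are replaced rather than raising.
            value = value.decode(charset or "ascii", "replace")
        parts.append(value)
    return "".join(parts)


def output_encoding(stream):
    """Prefer the stream's declared encoding, else the locale's preference."""
    return getattr(stream, "encoding", None) or locale.getpreferredencoding(False)


subject = header_to_text("=?iso-8859-1?q?p=F6stal?=")
```

A script would then encode `subject` with `output_encoding(sys.stdout)` before writing it, ideally with an error handler such as "replace" for characters the target encoding cannot represent.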