From: Matthias Kievernagel on 23 Apr 2010 08:00

Hello,

I stumbled upon this one while porting some of my programs to Python 3.1. The program receives messages from a socket and displays them in a tkinter Text widget. It works fine in Python 2 and Python 3.1. The problems arrived when I wanted to know the details...

First surprise: Text.insert accepts not only str but also bytes.

So I looked into the sources to see how it is done. I found no magic in 'tkinter/__init__.py'. All Python objects seem to go unchanged to _tkinter.c. There they are turned into Tcl objects using Tcl_NewUnicodeObj (for str) and Tcl_NewStringObj (for bytes). The man page for Tcl_NewStringObj says that it creates a Tcl string from UTF-8 encoded bytes. So I continued to test...

Second surprise: Text.insert also works for latin-1 encoded bytes. It even works with mixed UTF-8 and latin-1 encoded bytes. At least it works for me.

Can anyone enlighten me as to where this magic is done? Is it Tcl magic, or did I miss something in the Python sources? Is this documented somewhere?

Thanks for any hints,
Matthias Kievernagel
From: eb303 on 23 Apr 2010 09:47

Let me guess: you're on Windows? ;-)

There is nothing in the Python sources that can help you here. Everything is handled by the underlying tcl/tk interpreter. The default encoding for strings in tcl happens to be UTF-8, so putting bytestrings with a UTF-8 encoding in a Text widget will just work. For latin-1 strings, there is some magic going on, but apparently this magic happens only on Windows (hence my guess above), which seems to recognize its default encoding by some means. My advice is: don't count on it. It won't work on any other platform, and it might even stop working on Windows one day.

HTH
- Eric -
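To see why the latin-1 case counts as "magic", it may help to check the bytes themselves. The short sketch below uses only the Python 3 standard library, with no tkinter involved; the helper name `is_valid_utf8` is made up for illustration. It shows that a non-ASCII character encoded as latin-1 is usually not valid UTF-8, so something beyond a plain UTF-8 decode has to be happening when such bytes display correctly.

```python
# The same character in the two encodings discussed in the thread.
text = "é"
utf8_bytes = text.encode("utf-8")       # two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")   # one byte:  0xE9


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(utf8_bytes))    # the UTF-8 form decodes cleanly
print(is_valid_utf8(latin1_bytes))  # the lone 0xE9 byte does not
```

Pure ASCII bytes are identical in both encodings, so the difference only shows up once a message contains accented or other non-ASCII characters.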
From: Matthias Kievernagel on 23 Apr 2010 10:37

Thanks for the info, Eric.
Funny that it's working for me, then, because I'm on Linux. So I'll take a look at the tcl/tk sources (8.4, by the way). I don't like this magic at all: it means run-time errors waiting for you at the most inconvenient moment.

Best regards,
Matthias Kievernagel
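One way to avoid depending on this behaviour is to decode the socket data in Python and pass only str to Text.insert. The following is a minimal sketch, not anything taken from tkinter or the Tcl sources: the names `to_text`, `_latin1_fallback` and the error-handler name "latin1_fallback" are made up for this example, and it assumes the incoming messages are UTF-8 with occasional latin-1 bytes, as described in the thread.

```python
import codecs


def _latin1_fallback(err):
    """Error handler: decode the undecodable bytes as latin-1 (hypothetical)."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Return the replacement text and the position at which to resume decoding.
    return bad.decode("latin-1"), err.end


# Register the handler under a made-up name so bytes.decode() can refer to it.
codecs.register_error("latin1_fallback", _latin1_fallback)


def to_text(data):
    """Return str for str or bytes input; bytes are decoded explicitly."""
    if isinstance(data, str):
        return data
    return data.decode("utf-8", errors="latin1_fallback")


# Usage with a tkinter Text widget named `text` (not executed here):
#     text.insert("end", to_text(message_from_socket))
```

With this in place the widget only ever sees str, so the result no longer depends on how a particular tcl/tk version or platform treats bytes that are not valid UTF-8. If the messages are known to use a single encoding, a plain `data.decode(...)` with that encoding is simpler and stricter.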