From: Peter J. Holzer on 2 May 2010 08:24

On 2010-04-29 14:54, Peter Billam <peter(a)www.pjb.com.au> wrote:
> Perl should know if it's in a utf environment and printing to a utf8
> device; python does, and so does vi, less, slrn, alpine, firefox and
> everything else I use (except fmt).

vi, less, slrn, and alpine know that they are dealing with a terminal and can assume that the environment correctly describes properties of this terminal.

Perl or Python don't know this - the program written in Perl or Python does not necessarily read from or write to a terminal - very often it deals with files. These files are not necessarily text files and even if they are, they are not necessarily in the native encoding.

I don't know how Python deals with this. Perl does have an environment variable (PERL_UNICODE) which can be used to control the default encoding. I've used it a lot during the early days of Perl 5.8.x and can only say that it causes more trouble than it's worth. Just set the appropriate encoding *in* your script - you have to think about your I/O anyway while writing the script.

hp
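The advice to set the encoding in the script rather than inherit it from the environment can be sketched in a few lines. The sketch below is in Python purely for illustration (in Perl the same role is played by an I/O layer such as `:encoding(UTF-8)` passed to `open` or `binmode`). The helper names `write_text`, `read_text` and `demo` and the file name are hypothetical; the point is only that the encoding is named at every open, so the result does not depend on LANG or any other environment setting.

```python
import os
import tempfile


def write_text(path, text, encoding):
    """Write text using an encoding chosen by the script, not the locale."""
    with open(path, "w", encoding=encoding, newline="") as fh:
        fh.write(text)


def read_text(path, encoding):
    """Read text back, again naming the encoding explicitly."""
    with open(path, "r", encoding=encoding, newline="") as fh:
        return fh.read()


def demo():
    """Round-trip a non-ASCII string through an ISO-8859-1 file.

    Returns the raw bytes on disk and the decoded text.
    """
    text = "Grüße"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "latin1.txt")  # hypothetical file name
        write_text(path, text, "iso-8859-1")
        with open(path, "rb") as fh:
            raw = fh.read()
        return raw, read_text(path, "iso-8859-1")
```

Because the file here is ISO-8859-1 regardless of what the terminal uses, reading it with an environment-derived default would only work by accident on a system whose locale happens to match.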
From: Peter J. Holzer on 2 May 2010 11:44

On 2010-05-02 12:24, Peter J. Holzer <hjp-usenet2(a)hjp.at> wrote:
> On 2010-04-29 14:54, Peter Billam <peter(a)www.pjb.com.au> wrote:
>> Perl should know if it's in a utf environment and printing to a utf8
>> device; python does, and so does vi, less, slrn, alpine, firefox and
>> everything else I use (except fmt).
>
> vi, less, slrn, and alpine know that they are dealing with a terminal
> and can assume that the environment correctly describes properties of
> this terminal.

Actually, it is much more complicated: All of these programs deal not only with the terminal, but also with "files" (I put quotes around that because it doesn't matter whether they are stored on disk or received via a socket or pipe).

So while slrn may for example assume that the terminal sends and expects UTF-8 (because the user told it so by setting LANG), it cannot just use UTF-8 for decoding Usenet postings. Instead it has to inspect the headers of each posting to find the Content-Type header and decode the posting according to its charset parameter (and that still ignores the complexities of multi-part MIME messages, which slrn can't handle).

The situation is worse for vi: Unlike mail messages, text files aren't supposed to be portable between systems. So the user has to tell the editor for every file which charset it is in, unless it is the local charset or the editor can guess (UTF-8 is pretty easy to detect, but most of the 8-bit charsets are hard to distinguish).

hp
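Both points above can be illustrated with a short sketch, again in Python only for illustration. It is not how slrn or vi are implemented. `decode_posting` and `looks_like_utf8` are hypothetical names, and the ISO-8859-1 fallback is an assumption made for the example. The first function decodes a single-part posting according to the charset parameter of its Content-Type header, independent of the terminal's encoding; the second shows why UTF-8 is easy to detect: arbitrary 8-bit text is rarely a valid UTF-8 byte sequence.

```python
from email import message_from_bytes


def decode_posting(raw, fallback="iso-8859-1"):
    """Decode the body of a single-part posting using its declared charset.

    The fallback charset is an assumption for this sketch; multi-part
    MIME messages are deliberately not handled.
    """
    msg = message_from_bytes(raw)
    if msg.is_multipart():
        raise ValueError("multi-part messages are not handled in this sketch")
    charset = msg.get_content_charset() or fallback
    body = msg.get_payload(decode=True)  # bytes, transfer encoding undone
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name in the header: fall back rather than fail.
        return body.decode(fallback, errors="replace")


def looks_like_utf8(data):
    """Return True if data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that `looks_like_utf8` only answers "could this be UTF-8?". No comparable test separates, say, ISO-8859-1 from ISO-8859-15, because every byte sequence is valid in both; that is the hard case the post refers to.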