From: manstey on 11 May 2006 23:34
I am writing a program to translate a list of ASCII letters into a different language that requires Unicode encoding. This is what I have done so far:

1. I have # -*- coding: UTF-8 -*- as my first line.
2. In Wing IDE I have set Default Encoding to UTF-8.
3. I have imported codecs and opened and written my file, which doesn't have a BOM, as encoding=UTF-8.
4. I have written a dictionary for translation, with entries such as {'F':u'\u0254'}, and a function to do the translation.

Everything works fine, except that my output file, when loaded in the Unicode-aware EmEditor, has

    (u'F', u'\u0254')

But I want to display it as:

    ('F', 'ɔ')    # where the ɔ is a back-to-front 'c'

So my questions are:

1. How do I do this?
2. Do I need to change any of my steps above?
From: Martin v. Löwis on 12 May 2006 01:22
manstey wrote:
> 1. I have # -*- coding: UTF-8 -*- as my first line.
> 2. In Wing IDE I have set Default Encoding to UTF-8
> 3. I have imported codecs and opened and written my file, which doesn't
> have a BOM, as encoding=UTF-8
> 4. I have written a dictionary for translation, with entries such as
> {'F':u'\u0254'} and a function to do the translation
>
> Everything works fine, except that my output file, when loaded in
> unicode aware emeditor has
> (u'F', u'\u0254')

I couldn't quite follow this description: what is "your output file" (in what step is it created?), and how does (u'F', u'\u0254') get into this file? What is the precise Python statement that produces that line of output?

> So my questions are:
> 1. How do I do this?

Most likely, you use (directly or indirectly) the repr() function to convert a tuple into that string. You shouldn't do that; instead, you should format the elements of the tuple yourself, e.g. through

    print >>f, u"('%s', '%s')" % value

Regards,
Martin
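To make the difference between repr()-style output and explicit formatting concrete, here is a minimal sketch. It is written for Python 3, whereas the thread uses Python 2 syntax, so two things are assumptions: ascii() stands in for what Python 2's str()/repr() did to a tuple of unicode strings (Python 3 omits the u prefix), and an in-memory buffer stands in for the real output file.

```python
import io

# Same values as the thread's (u'F', u'\u0254'), in Python 3 syntax.
pair = ('F', '\u0254')

# ascii() escapes non-ASCII characters, roughly as Python 2's
# str()/repr() did for a tuple holding unicode strings.
escaped = ascii(pair)

# Formatting the elements yourself keeps the real character.
formatted = "('%s', '%s')" % pair

# Encode explicitly when writing; BytesIO is a stand-in for a file
# opened in binary mode.
buffer = io.BytesIO()
buffer.write(formatted.encode('utf-8'))

# Safe to print on any console because it contains only ASCII.
print(escaped)
```

The escaped form contains a literal backslash followed by u0254, which is what ended up in the output file; the formatted form contains the single character U+0254, which UTF-8 encodes as two bytes.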
From: manstey on 16 May 2006 22:19
Hi Martin,

Here is how I write:

    input_file = open(input_file_loc, 'r')
    output_file = open(output_file_loc, 'w')
    for line in input_file:
        output_file.write(str(word_info + parse + gloss)) # = three functions that return tuples

(u'F', u'\u0254') are two of the many unicode tuple elements returned by the three functions.

What am I doing wrong?
From: Ben Finney on 16 May 2006 22:38
"manstey" <manstey(a)csu.edu.au> writes:
> input_file = open(input_file_loc, 'r')
> output_file = open(output_file_loc, 'w')
> for line in input_file:
>     output_file.write(str(word_info + parse + gloss)) # = three functions that return tuples

If you mean that 'word_info', 'parse' and 'gloss' are three functions that return tuples, then you get that return value by calling them.

    >>> def foo():
    ...     return "foo's return value"
    ...
    >>> def bar(baz):
    ...     return "bar's return value (including '%s')" % baz
    ...
    >>> print foo()
    foo's return value
    >>> print bar
    <function bar at 0x401fe80c>
    >>> print bar("orange")
    bar's return value (including 'orange')

-- 
 \       "A man must consider what a rich realm he abdicates when he |
  `\            becomes a conformist."  -- Ralph Waldo Emerson       |
_o__)                                                                |
Ben Finney
From: manstey on 17 May 2006 00:20
I'm a newbie at Python, so I don't really understand how your answer solves my unicode problem.

I have done more reading on unicode and then tried my code in IDLE rather than Wing IDE, and discovered that it works fine in IDLE, so I think Wing has a problem with unicode. For example, in Wing this code raises an error:

    a={'a':u'\u0254'}
    print a['a']

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0254' in position 0: ordinal not in range(128)

but in IDLE it correctly prints open o.

So, assuming I now work in IDLE, all I want help with is how to read in an ASCII string, convert its letters to various unicode values, and save the resulting 'string' to a UTF-8 text file.

Is this clear?

So, in pseudo code:

1. F is converted to \u0254, $ is converted to \u0283, C is converted to \u02A6\u02C1, etc. (I want to do this using a dictionary TRANSLATE={'F':u'\u0254', etc.)
2. I read in a file with lines like:
    F$
    FCF$
    $$C$
    etc.
3. I convert this to
    \u0254\u0283
    \u0254\u02A6\u02C1\u0254
    etc.
4. I save the results in a new file, so that when I read the new file in a unicode editor (EmEditor), I don't see \u0254\u02A6\u02C1\u0254, but the actual characters (open o, esh, ts digraph, modified letter reversed glottal stop, etc.).

I'm sure this is straightforward but I can't get it to work.

All help appreciated!
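The four pseudo-code steps above can be sketched roughly as follows. This is a Python 3 sketch rather than the Python 2 of the thread, and the helper names translate_line and translate_file, the file names, and the temporary directory are all hypothetical; only the three mappings come from the post. Unmapped characters, including newlines, are assumed to pass through unchanged.

```python
import os
import tempfile

# Mappings taken from the post; 'C' expands to two code points.
TRANSLATE = {
    'F': '\u0254',        # open o
    '$': '\u0283',        # esh
    'C': '\u02A6\u02C1',  # ts digraph + modified letter reversed glottal stop
}

def translate_line(line):
    """Replace each mapped character; leave unmapped ones unchanged."""
    return ''.join(TRANSLATE.get(ch, ch) for ch in line)

def translate_file(src_path, dst_path):
    """Read ASCII text from src_path and write the translation as UTF-8."""
    with open(src_path, 'r', encoding='ascii') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(translate_line(line))

# Small demonstration with temporary files (hypothetical names).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'input.txt')
dst_path = os.path.join(workdir, 'output.txt')
with open(src_path, 'w', encoding='ascii') as f:
    f.write('F$\nFCF$\n$$C$\n')

translate_file(src_path, dst_path)

# Read the result back as UTF-8 to check what was written.
with open(dst_path, 'r', encoding='utf-8') as f:
    result = f.read()
```

The key point is that the translated text is written as a string through a file object that encodes to UTF-8, never through str() or repr() of a tuple. In Python 2, codecs.open(path, 'w', encoding='utf-8') plays the same role as the encoding argument used here.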