From: John Machin on 28 Jul 2010 17:39

On Jul 29, 4:32 am, "Joe Goldthwaite" <j...(a)goldthwaites.com> wrote:
> Hi,
>
> I've got an Ascii file with some latin characters. Specifically \xe1 and
> \xfc. I'm trying to import it into a Postgresql database that's running in
> Unicode mode. The Unicode converter chokes on those two characters.
>
> I could just manually replace those two characters with something valid but
> if any other invalid characters show up in later versions of the file, I'd
> like to handle them correctly.
>
> I've been playing with the Unicode stuff and I found out that I could
> convert both those characters correctly using the latin1 encoder like this;
>
> import unicodedata
>
> s = '\xe1\xfc'
> print unicode(s,'latin1')
>
> The above works. When I try to convert my file however, I still get an
> error;
>
> import unicodedata
>
> input = file('ascii.csv', 'r')
> output = file('unicode.csv','w')
>
> for line in input.xreadlines():
>     output.write(unicode(line,'latin1'))
>
> input.close()
> output.close()
>
> Traceback (most recent call last):
>   File "C:\Users\jgold\CloudmartFiles\UnicodeTest.py", line 10, in __main__
>     output.write(unicode(line,'latin1'))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position
> 295: ordinal not in range(128)
>
> I'm stuck using Python 2.4.4 which may be handling the strings differently
> depending on if they're in the program or coming from the file. I just
> haven't been able to figure out how to get the Unicode conversion working
> from the file data.
>
> Can anyone explain what is going on?

Hello hello ... you are running on Windows; the likelihood that you
actually have data encoded in latin1 is very very small. Follow MRAB's
answer but replace "latin1" by "cp1252".
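
MRAB's answer is not quoted in this post, so the following is only a minimal sketch of what the advice seems to amount to. The traceback comes from writing a unicode object to a plain byte file, which makes Python 2 encode it implicitly with the 'ascii' codec. The sketch decodes each line from cp1252 and encodes it explicitly before writing. The function name convert_file and the choice of UTF-8 as the output encoding are assumptions for illustration (UTF-8 is a guess at what the database expects); the file names come from the original question. It avoids the with statement so it should also run on Python 2.4.

```python
def convert_file(src_path, dst_path, src_encoding="cp1252", dst_encoding="utf-8"):
    """Re-encode a text file line by line (hypothetical helper).

    Both files are opened in binary mode so that every conversion is
    explicit and no implicit 'ascii' encoding can take place.
    """
    src = open(src_path, "rb")
    try:
        dst = open(dst_path, "wb")
        try:
            for line in src:
                # bytes -> unicode using the assumed source encoding,
                # then unicode -> bytes using the assumed target encoding.
                dst.write(line.decode(src_encoding).encode(dst_encoding))
        finally:
            dst.close()
    finally:
        src.close()


if __name__ == "__main__":
    # File names taken from the original question.
    convert_file("ascii.csv", "unicode.csv")
```

Note that Python's cp1252 codec leaves a few byte values undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d), so a strict decode raises UnicodeDecodeError if one of them turns up. That may be what you want if the aim is to notice bad input rather than silently pass it through.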