Unicode error [Python]

From: dirknbr on 23 Jul 2010 06:14

I am having some problems with unicode from json.

This is the error I get

UnicodeEncodeError: 'ascii' codec can't encode character u'\x93' in
position 61: ordinal not in range(128)

I have kind of developed this, but obviously it's not nice. Any better
ideas?

try:
    text=texts[i]
    text=text.encode('latin-1')
    text=text.encode('utf-8')
except:
    text=' '

Dirk

From: Steven D'Aprano on 23 Jul 2010 06:42

On Fri, 23 Jul 2010 03:14:11 -0700, dirknbr wrote:

> I am having some problems with unicode from json.
>
> This is the error I get
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\x93' in
> position 61: ordinal not in range(128)
>
> I have kind of developed this, but obviously it's not nice. Any better
> ideas?
>
> try:
>     text=texts[i]
>     text=text.encode('latin-1')
>     text=text.encode('utf-8')
> except:
>     text=' '

Don't write bare excepts; always catch the error you want and nothing
else. As you've written it, the result of encoding with latin-1 is thrown
away, even if it succeeds.

text = texts[i]  # Don't hide errors here.
try:
    text = text.encode('latin-1')
except UnicodeEncodeError:
    try:
        text = text.encode('utf-8')
    except UnicodeEncodeError:
        text = ' '
do_something_with(text)

Another thing you might consider is setting the error handler:

text = text.encode('utf-8', 'ignore')  # positional; the errors= keyword needs Python 2.7+

Other error handlers are 'strict' (the default), 'replace' and
'xmlcharrefreplace'.
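
As a rough illustration of how those handlers differ, here is a small
sketch. The sample string is made up for this example, and the 'ascii'
codec is used only because it is the one named in the error message. With
'utf-8' the handlers rarely come into play, since UTF-8 can represent any
ordinary Unicode character.

```python
# Made-up sample containing the character from the error message (u'\x93').
sample = u'say \x93hi\x94'

# 'strict' (the default) raises on the first character the codec
# cannot represent.
try:
    sample.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

ignored = sample.encode('ascii', 'ignore')             # drops the characters
replaced = sample.encode('ascii', 'replace')           # substitutes '?'
as_refs = sample.encode('ascii', 'xmlcharrefreplace')  # numeric XML references
as_utf8 = sample.encode('utf-8')                       # lossless, no handler needed

for label, value in [('ignore', ignored), ('replace', replaced),
                     ('xmlcharrefreplace', as_refs), ('utf-8', as_utf8)]:
    print(label, repr(value))
```

Only the utf-8 result can be decoded back to the original text; the other
three lose or rewrite the offending characters.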

--
Steven

From: Chris Rebert on 23 Jul 2010 06:45

On Fri, Jul 23, 2010 at 3:14 AM, dirknbr <dirknbr(a)gmail.com> wrote:
> I am having some problems with unicode from json.
>
> This is the error I get
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\x93' in
> position 61: ordinal not in range(128)

Please include the full Traceback and the actual code that's causing
the error! We aren't mind readers.

This error basically indicates that you're incorrectly mixing byte
strings and Unicode strings somewhere.
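
To see where a message like that comes from, the failing step can be
reproduced explicitly. This is only a sketch: the string is invented so
that the offending character lands at position 61, and the explicit
.encode('ascii') call stands in for whatever implicit conversion the real
code triggers.

```python
# Invented text: 61 filler characters, then the character from the error.
text = u'x' * 61 + u'\x93' + u' rest of the tweet'

try:
    text.encode('ascii')
except UnicodeEncodeError as err:
    # The exception carries the codec name, the failing position and
    # the original string, which is what the message is built from.
    details = (err.encoding, err.start, err.object[err.start])
    print(err)

print(details)
```

The position in the message is an index into the Unicode string, so it
points straight at the character the codec could not handle.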

Cheers,
Chris
--
http://blog.rebertia.com

From: dirknbr on 23 Jul 2010 06:56

To give a bit of context: I am using twython, which is a wrapper for
the Twitter JSON API.

search=twitter.searchTwitter(s,rpp=100,page=str(it),result_type='recent',lang='en')
for u in search[u'results']:
    ids.append(u[u'id'])
    texts.append(u[u'text'])

This is where texts comes from.

When I then want to write texts to a file I get the unicode error.

Dirk

From: Thomas Jollans on 23 Jul 2010 12:27

On 07/23/2010 12:56 PM, dirknbr wrote:
> To give a bit of context. I am using twython which is a wrapper for
> the JSON API
>
>
> search=twitter.searchTwitter(s,rpp=100,page=str(it),result_type='recent',lang='en')
> for u in search[u'results']:
>     ids.append(u[u'id'])
>     texts.append(u[u'text'])
>
> This is where texts comes from.
>
> When I then want to write texts to a file I get the unicode error.

So your data is unicode? Good.

Well, files are just streams of bytes, so to write unicode data to one
you have to encode it. Since Python can't know which encoding you want
to use (utf-8, by the way, if you ask me), you have to do it manually.

Something like:

outfile.write(text.encode('utf-8'))
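
A slightly fuller sketch of the same idea, assuming the io module (Python
2.6 and later) is available: opening the file with an explicit encoding
lets the file object do the encoding on every write. The sample texts and
the temporary file path are placeholders, not the data from the search.

```python
import io
import os
import tempfile

# Placeholder data standing in for the texts collected from the search.
texts = [u'plain ascii', u'\x93curly quotes\x94', u'caf\xe9']

# Placeholder location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'tweets.txt')

# The file object encodes each unicode string as UTF-8 on write.
with io.open(path, 'w', encoding='utf-8') as outfile:
    for text in texts:
        outfile.write(text + u'\n')

# Reading with the same encoding gives the unicode strings back.
with io.open(path, 'r', encoding='utf-8') as infile:
    round_tripped = [line.rstrip(u'\n') for line in infile]

print(round_tripped == texts)
```

Encoding at the file boundary like this keeps everything inside the
program as unicode, so no individual string needs its own try/except.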
