decode(..., errors='ignore') has no effect [Python]


From: Jens Müller on 12 Jan 2010 06:38

Hi,

I am trying to decode a string, e.g.
u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.decode('cp1252',
'ignore')
but even though I use errors='ignore',
I get UnicodeEncodeError: 'charmap' codec can't encode character u'\u02c8'
in position 21: character maps to <undefined>

How come?

Thanks,
Jens

From: Peter Otten on 12 Jan 2010 07:06

Jens Müller wrote:

> I am trying to decode a string, e.g.
> u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.decode('cp1252',
> 'ignore')
> but even though I use errors='ignore',
> I get UnicodeEncodeError: 'charmap' codec can't encode character u'\u02c8'
> in position 21: character maps to <undefined>
>
> How come?

To convert unicode into str you have to *encode()* it.
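
A minimal sketch of that direction, assuming the goal is a cp1252 byte string with unencodable characters silently dropped. The variable names are illustrative only. The u'' literal is also accepted by Python 3.3+, where the result is a bytes object rather than a str.

```python
# The sample string from the original post.
text = u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'

# encode() goes from unicode to bytes, and the errors argument applies here:
# characters that cp1252 cannot represent are dropped instead of raising.
encoded = text.encode('cp1252', 'ignore')

print(repr(encoded))
```

The u-umlaut and c-cedilla survive as the single bytes 0xFC and 0xE7, while the three IPA characters that cp1252 lacks are removed.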

u"...".decode(...) will implicitly convert to str first, using the default
encoding (ASCII unless it has been changed), i.e. it is equivalent to

u"...".encode(sys.getdefaultencoding()).decode(...)

Hence the error message

....codec can't encode character u'\u02c8'...
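
The two steps can be spelled out in a short sketch. The helper name decode_unicode_like_py2 is hypothetical, and passing 'cp1252' as the default encoding is an assumption: the 'charmap' in the reported message suggests that the default encoding on the original poster's system was a charmap codec rather than ASCII. The point is that errors='ignore' only reaches the second step, so the strict implicit encode still fails.

```python
# The sample string from the original post.
text = u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'

def decode_unicode_like_py2(value, default_encoding, encoding, errors):
    """Hypothetical helper: mimic calling .decode() on a unicode object."""
    # Step 1: implicit conversion to bytes. It is always strict;
    # the caller's `errors` argument is NOT used here.
    raw = value.encode(default_encoding)
    # Step 2: the decode that was actually requested.
    return raw.decode(encoding, errors)

try:
    # 'cp1252' as the default encoding is an assumption (see above).
    decode_unicode_like_py2(text, 'cp1252', 'cp1252', 'ignore')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(error)
```

The exception comes from step 1 and points at position 21, the first character that cp1252 cannot represent.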

Peter

From: Ulrich Eckhardt on 12 Jan 2010 07:04

Jens Müller wrote:
> I am trying to decode a string, e.g.
> u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.decode('cp1252',
> 'ignore')
> but even though I use errors='ignore',
> I get UnicodeEncodeError: 'charmap' codec can't encode character u'\u02c8'
> in position 21: character maps to <undefined>
>
> How come?

Wrong way? Don't you want to encode the Unicode string using codepage 1252
instead?

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

From: Jens Müller on 12 Jan 2010 07:50

> To convert unicode into str you have to *encode()* it.
>
> u"...".decode(...) will implicitly convert to ASCII first, i. e. is
> equivalent to
>
> u"...".encode("ascii").decode(...)
>
> Hence the error message

Ah - yes of course.

And how can you use the system's default encoding with errors=ignore?
The default encoding is the one that is used if no parameters are given to
"encode".

Thanks again!

From: Lie Ryan on 12 Jan 2010 09:27

On 01/12/10 23:50, Jens Müller wrote:
>> To convert unicode into str you have to *encode()* it.
>>
>> u"...".decode(...) will implicitly convert to ASCII first, i. e. is
>> equivalent to
>>
>> u"...".encode("ascii").decode(...)
>>
>> Hence the error message
>
> Ah - yes of course.
>
> And how can you use the system's default encoding with errors=ignore?
> The default encoding is the one that is used if no parameters are given
> to "encode".
>
> Thanks again!

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> s = u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'
>>> s.encode(sys.getdefaultencoding(), 'ignore')
'Mnchen, pronounced [mnn]'

Unless this is for debugging, I doubt that ignoring errors is an acceptable
solution in this particular case (how do you pronounce [mnn]?).
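
If dropping characters loses too much, the other standard error handlers keep a trace of what was there. A small sketch comparing them on the same string, assuming an ASCII target; which handler is appropriate depends on where the output ends up, and the names below are illustrative only.

```python
# The sample string from the original post.
text = u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'

# Standard codec error handlers that are valid for encoding.
handlers = ('ignore', 'replace', 'backslashreplace', 'xmlcharrefreplace')

# Encode the same text to ASCII once per handler.
results = dict((name, text.encode('ascii', name)) for name in handlers)

for name in handlers:
    print('%-18s %r' % (name, results[name]))
```

'replace' substitutes a question mark for each unencodable character, while 'backslashreplace' and 'xmlcharrefreplace' produce escapes from which the original text can still be recovered.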
