From: Jens Müller on 12 Jan 2010 06:38

Hi,

I am trying to decode a string, e.g.

u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.decode('cp1252', 'ignore')

but even though I use errors='ignore' I get

UnicodeEncodeError: 'charmap' codec can't encode character u'\u02c8' in position 21: character maps to <undefined>

How come?

Thanks,
Jens
From: Peter Otten on 12 Jan 2010 07:06

Jens Müller wrote:

> I try to decode a string, e.g.
> u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.decode('cp1252',
> 'ignore')
> but even though I use errors='ignore'
> I get UnicodeEncodeError: 'charmap' codec can't encode character u'\u02c8'
> in position 21: character maps to <undefined>
>
> How come?

To convert unicode into str you have to *encode()* it.

u"...".decode(...) will implicitly convert to ASCII first, i.e. it is equivalent to

u"...".encode("ascii").decode(...)

Hence the error message

....codec can't encode character u'\u02c8'...

Peter
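To illustrate Peter's point, here is a minimal Python 2 interpreter sketch of the implicit two-step form (the codec named in the traceback depends on the interpreter's default encoding, so the exact message on Jens's system differs):

>>> s = u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'
>>> s.encode('ascii').decode('cp1252', 'ignore')  # roughly what s.decode('cp1252', 'ignore') does
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)

Note that errors='ignore' only reaches the decode step; the implicit encode step is still strict and fails first.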
From: Ulrich Eckhardt on 12 Jan 2010 07:04

Jens Müller wrote:

> I try to decode a string, e.g.
> u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.decode('cp1252',
> 'ignore')
> but even though I use errors='ignore'
> I get UnicodeEncodeError: 'charmap' codec can't encode character u'\u02c8'
> in position 21: character maps to <undefined>
>
> How come?

Wrong way? Don't you want to encode the Unicode string using codepage 1252 instead?

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
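A sketch of the direction Uli suggests, assuming a cp1252 byte string is what is wanted; the IPA characters with no cp1252 equivalent are silently dropped by 'ignore':

>>> u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.encode('cp1252', 'ignore')
'M\xfcnchen, pronounced [mn\xe7n]'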
From: Jens Müller on 12 Jan 2010 07:50

> To convert unicode into str you have to *encode()* it.
>
> u"...".decode(...) will implicitly convert to ASCII first, i.e. it is
> equivalent to
>
> u"...".encode("ascii").decode(...)
>
> Hence the error message

Ah - yes of course.

And how can you use the system's default encoding with errors='ignore'? The default encoding is the one that is used if no parameters are given to "encode".

Thanks again!
From: Lie Ryan on 12 Jan 2010 09:27

On 01/12/10 23:50, Jens Müller wrote:
>> To convert unicode into str you have to *encode()* it.
>>
>> u"...".decode(...) will implicitly convert to ASCII first, i.e. it is
>> equivalent to
>>
>> u"...".encode("ascii").decode(...)
>>
>> Hence the error message
>
> Ah - yes of course.
>
> And how can you use the system's default encoding with errors='ignore'?
> The default encoding is the one that is used if no parameters are given
> to "encode".
>
> Thanks again!

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.encode(sys.getdefaultencoding(), 'ignore')
'Mnchen, pronounced [mnn]'

Unless this is for debugging, I doubt ignoring errors is an acceptable solution in this particular case (how do you pronounce [mnn]?).
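If dropping characters is too lossy, the 'replace' error handler is one possible middle ground: it substitutes '?' for each unmappable character, so the loss at least stays visible (a sketch, again with the ASCII default encoding):

>>> u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.encode('ascii', 'replace')
'M?nchen, pronounced [?m?n??n]'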