From: Gabriel Genellina on 23 Jun 2010 03:53

On Tue, 22 Jun 2010 11:02:58 -0300, <python(a)bdurham.com> wrote:

> Python 2.6.4 (Win32): Anyone have any explanation for the following
>
> encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )
>
> Given that other processes may have to use the output of these
> methods, what is the recommended technique?
>
> Demonstration:
>
> >>> import encodings.idna
> >>> name = u'junk\xfc\xfd.txt'
> >>> name
> u'junk\xfc\xfd.txt'
> >>> encodings.idna.ToASCII( name )
> 'xn--junk.txt-95ak'
> >>> name.encode( 'idna' )
> 'xn--junk-3rag.txt'
> >>> encodings.idna.ToUnicode( encodings.idna.ToASCII( name ) )
> u'junk\xfc\xfd.txt'
> >>> name.encode( 'idna' ).decode( 'idna' )
> u'junk\xfc\xfd.txt'

IDNA is *specifically* designed to operate on domain names, not on arbitrary text (IDNA = Internationalizing Domain Names in Applications). Even the encoding/decoding part alone (Punycode) is tailored specifically for use in domain names. Do not use it for any other purpose.

That said, it seems that encodings.idna.ToUnicode/ToASCII work on individual 'labels' only (a domain name consists of several labels separated by '.'), while encode/decode('idna') takes the whole name, splits it, and processes each label (following RFC 3490, I presume). In your example, ToASCII(name) treats u'junk\xfc\xfd.txt' as a single label and Punycode-encodes the '.' along with everything else, giving 'xn--junk.txt-95ak'. By contrast, name.encode('idna') encodes the labels u'junk\xfc\xfd' and u'txt' separately, giving 'xn--junk-3rag.txt'.

--
Gabriel Genellina
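The label-by-label behaviour described above can be checked with a short sketch. It assumes Python 3, where ToASCII and the 'idna' codec return bytes rather than str (unlike the Python 2.6 session in the question). It also uses a hypothetical helper, idna_encode_by_labels, that splits only on the ASCII '.'. The real codec also treats U+3002, U+FF0E and U+FF61 as separators and accepts a trailing dot, so this is an illustration of the difference, not a replacement for name.encode('idna').

```python
# Minimal sketch, Python 3. Here encodings.idna.ToASCII and
# str.encode("idna") return bytes (Python 2.6 returned str).
import encodings.idna


def idna_encode_by_labels(name):
    """Hypothetical helper: convert a domain name one label at a time.

    Splits only on the ASCII full stop '.'; the real 'idna' codec also
    splits on U+3002, U+FF0E and U+FF61 and allows one trailing dot,
    which this simplified version does not handle.
    """
    labels = name.split(".")
    # ToASCII converts a single label; the dot separators are re-added here.
    return b".".join(encodings.idna.ToASCII(label) for label in labels)


name = "junk\xfc\xfd.txt"

# The whole string passed as ONE label: the '.' is Punycode-encoded too.
whole = encodings.idna.ToASCII(name)
print(whole)  # b'xn--junk.txt-95ak'

# Label by label, as the 'idna' codec does.
per_label = idna_encode_by_labels(name)
print(per_label)  # b'xn--junk-3rag.txt'

print(per_label == name.encode("idna"))  # True
```

Splitting before calling ToASCII is what makes the two results agree; passing the whole name to ToASCII silently folds the separator into the Punycode data.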