encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna') [Python]

From: Gabriel Genellina on 23 Jun 2010 03:53

On Tue, 22 Jun 2010 11:02:58 -0300, <python(a)bdurham.com> wrote:

> Python 2.6.4 (Win32): Does anyone have an explanation for the
> following?
>
> encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )
>
> Given that other processes may have to use the output of these
> methods, what is the recommended technique?
>
> Demonstration:
>
> >>> import encodings.idna
> >>> name = u'junk\xfc\xfd.txt'
> >>> name
> u'junk\xfc\xfd.txt'
> >>> encodings.idna.ToASCII( name )
> 'xn--junk.txt-95ak'
> >>> name.encode( 'idna' )
> 'xn--junk-3rag.txt'
> >>> encodings.idna.ToUnicode( encodings.idna.ToASCII( name ) )
> u'junk\xfc\xfd.txt'
> >>> name.encode( 'idna' ).decode( 'idna' )
> u'junk\xfc\xfd.txt'

IDNA is *specifically* designed to operate on domain names, not
arbitrary text (IDNA = Internationalizing Domain Names in Applications).
Even the encoding/decoding step alone (Punycode) is specifically tailored
for use in domain names. Do not use it for any other purpose.

That said, it seems that encodings.idna.ToUnicode/ToASCII work on
individual 'labels' only (a domain name consists of several labels
separated by '.'), whereas encode/decode('idna') takes the whole name,
splits it on the dots, and processes each label separately (following
RFC 3490, I presume). That would explain the difference: ToASCII treats
u'junk\xfc\xfd.txt' as one single label, dot included.
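
To make the label-by-label behaviour concrete, here is a minimal sketch. It
assumes Python 3, where these functions take str and return bytes (the
session quoted above is Python 2.6), and to_ascii_per_label is a
hypothetical helper written for this illustration, not part of the
standard library. It only splits on the ASCII dot, which is enough for
this example.

```python
import encodings.idna


def to_ascii_per_label(name):
    """Hypothetical helper: split on '.' and apply ToASCII to each label."""
    labels = name.split(".")
    return b".".join(encodings.idna.ToASCII(label) for label in labels)


name = "junk\xfc\xfd.txt"

# ToASCII on the whole string: the dot is encoded as part of one big label.
whole = encodings.idna.ToASCII(name)

# ToASCII applied to each label separately, then rejoined with dots.
per_label = to_ascii_per_label(name)

# The 'idna' codec, which does the splitting itself.
codec = name.encode("idna")

print(whole)               # whole-string result, like 'xn--junk.txt-95ak' above
print(per_label)           # per-label result, like 'xn--junk-3rag.txt' above
print(per_label == codec)  # the codec agrees with the per-label approach
```

If other processes have to consume the output, the codec form
(name.encode('idna')) is probably the one to prefer, and only for real
domain names.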

--
Gabriel Genellina
