From: Jeff Johnson on 2 Apr 2010 11:40

"Tim Roberts" <timr(a)probo.com> wrote in message
news:d01br5d743h8428470dlsoaj1ngthi7j0b(a)4ax.com...

> So, yes, the Unicode code points from U+0080 to U+00FF always take two
> bytes in UTF-8.

But the "opposite" is not true! That is, just because the UTF-8 encoding
yields 2 bytes does not suggest that the UTF-16 encoding will "likely" have
0 in the MSB. If there are 1920 possible 2-byte UTF-8 sequences and only 128
of them represent U+0080 - U+00FF, then that accounts for only 6.667% of the
possible 2-byte sequences. So back to Tony's question:

>> When UTF-8 encoding is using 2 bytes is it then common that UTF-16 has
>> zeros in the highorder byte as it is in this case where 241 fits in one
>> byte ?

I would say "Don't count on it."

From: Tim Roberts on 4 Apr 2010 00:06

"Jeff Johnson" <i.get(a)enough.spam> wrote:
>
>But the "opposite" is not true! That is, just because the UTF-8 encoding
>yields 2 bytes does not suggest that the UTF-16 encoding will "likely" have
>0 in the MSB. If there are 1920 possible 2-byte UTF-8 sequences and only 128
>of them represent U+0080 - U+00FF, then that accounts for only 6.667% of the
>possible 2-byte sequences. So back to Tony's question:
>
>>> When UTF-8 encoding is using 2 bytes is it then common that UTF-16 has
>>> zeros in the highorder byte as it is in this case where 241 fits in one
>>> byte ?
>
>I would say "Don't count on it."

You're right. The question I read was not the question he really asked.
--
Tim Roberts, timr(a)probo.com
Providenza & Boekelheide, Inc.
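
The arithmetic quoted above (1920 two-byte UTF-8 sequences, 128 of them with a zero high-order byte in UTF-16, about 6.667%) can be checked with a short script. This is only a sketch and not something posted in the thread: it uses Python's built-in codecs, and the helper names utf8_len and utf16_high_byte are made up for illustration.

```python
# Sketch: count the code points that UTF-8 encodes in exactly two bytes,
# then see how many of those have a zero high-order byte in UTF-16.
# Helper names are hypothetical; only built-in codecs are used.

def utf8_len(cp: int) -> int:
    """Number of bytes in the UTF-8 encoding of code point cp."""
    return len(chr(cp).encode("utf-8"))

def utf16_high_byte(cp: int) -> int:
    """High-order byte of the (big-endian) UTF-16 code unit for a BMP code point."""
    return chr(cp).encode("utf-16-be")[0]

# Scanning U+0000..U+0FFF is enough: everything from U+0800 up needs
# three or more bytes in UTF-8, and this range contains no surrogates.
two_byte = [cp for cp in range(0x0000, 0x1000) if utf8_len(cp) == 2]
zero_msb = [cp for cp in two_byte if utf16_high_byte(cp) == 0]

share = len(zero_msb) / len(two_byte)
print(len(two_byte), len(zero_msb), f"{share:.3%}")

# The character from the original question: 241 is U+00F1.
n_tilde = chr(241)
print(n_tilde.encode("utf-8").hex(), n_tilde.encode("utf-16-be").hex())
```

The first line of output gives the two counts and the percentage; the second shows that the character with value 241 is one of the 128 cases where the UTF-16 high-order byte happens to be zero.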