From: Arne Vajhøj on 19 Mar 2010 21:15

On 19-03-2010 14:25, Jeff Johnson wrote:
> "Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
> news:uIZGlx3xKHA.5364(a)TK2MSFTNGP05.phx.gbl...
>>>> I would not consider Unicode an encoding.
>>>
>>> Uh, why? An encoding is simply a means of associating a set of bytes
>>> with the characters they represent. That's what Unicode does.
>>
>> I believe Arne's point is that "Unicode" by itself does not describe a
>> way to encode characters as bytes. There are specific encodings within
>> Unicode (as part of the standard): UTF-8, UTF-16, and UTF-32. But
>> Unicode by itself describes a collection of valid characters, not how
>> they are encoded as bytes.
>
> Ah. I just go with the convention that "Unicode" by itself, at least in
> the .NET world, means UTF-16LE.

It is relatively common in the traditional Win32 C++ context.

I hope that it is not so common in the .NET context. The docs for
String and Char very specifically say that it is Unicode in
UTF-16 encoding.

But it may very well be the original poster's interpretation
as well, because he listed Unicode but not UTF-16.

Arne
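To make Peter's distinction concrete, here is a minimal C# sketch (the
class name and sample string are just illustrative) showing that one and
the same Unicode string comes out as three different byte sequences, one
per encoding form:

using System;
using System.Text;

class EncodingFormsDemo
{
    static void Main()
    {
        // One Unicode string: 'A' (U+0041) followed by the euro sign (U+20AC).
        string s = "A\u20AC";

        // The same two characters, three different encoded forms.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(s)));
        // 41-E2-82-AC               (1 + 3 bytes)

        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(s)));
        // 41-00-AC-20               (2 + 2 bytes, UTF-16 little-endian)

        Console.WriteLine(BitConverter.ToString(Encoding.UTF32.GetBytes(s)));
        // 41-00-00-00-AC-20-00-00   (4 + 4 bytes)
    }
}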
From: Tony Johansson on 20 Mar 2010 06:23

"Arne Vajhøj" <arne(a)vajhoej.dk> wrote in message
news:4ba42194$0$276$14726298(a)news.sunsite.dk...
> On 19-03-2010 14:25, Jeff Johnson wrote:
>> "Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
>> news:uIZGlx3xKHA.5364(a)TK2MSFTNGP05.phx.gbl...
>>>>> I would not consider Unicode an encoding.
>>>>
>>>> Uh, why? An encoding is simply a means of associating a set of bytes
>>>> with the characters they represent. That's what Unicode does.
>>>
>>> I believe Arne's point is that "Unicode" by itself does not describe a
>>> way to encode characters as bytes. There are specific encodings within
>>> Unicode (as part of the standard): UTF-8, UTF-16, and UTF-32. But
>>> Unicode by itself describes a collection of valid characters, not how
>>> they are encoded as bytes.
>>
>> Ah. I just go with the convention that "Unicode" by itself, at least in
>> the .NET world, means UTF-16LE.
>
> It is relatively common in the traditional Win32 C++ context.
>
> I hope that it is not so common in the .NET context. The docs for
> String and Char very specifically say that it is Unicode in
> UTF-16 encoding.
>
> But it may very well be the original poster's interpretation
> as well, because he listed Unicode but not UTF-16.
>
> Arne

I used the different encodings that the Encoding class has. There the
Unicode one was UTF-16.

//Tony
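What Tony describes can be checked directly. A small sketch (a console
app is assumed; the class name is made up) that asks Encoding.Unicode
what it actually is:

using System;
using System.Text;

class UnicodePropertyProbe
{
    static void Main()
    {
        // The IANA-registered name behind the Encoding.Unicode property:
        Console.WriteLine(Encoding.Unicode.WebName);             // utf-16

        // Two bytes per char for BMP text, as expected of UTF-16:
        Console.WriteLine(Encoding.Unicode.GetByteCount("abc")); // 6
    }
}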
From: Jeff Johnson on 20 Mar 2010 10:51
"Arne Vajh�j" <arne(a)vajhoej.dk> wrote in message news:4ba42087$0$276$14726298(a)news.sunsite.dk... >>> I would not consider Unicode an encoding. >> >> Uh, why? An encoding is simply a means of associating a set of bytes with >> the characters they represent. That's what Unicode does. > > No. > > Unicode is a mapping between the various symbols and a number. > > Encoding is the mapping between the number and 1-many bytes. Right, but consider this little gem: =========== Encoding.Unicode Property Gets an encoding for the UTF-16 format using the little-endian byte order. =========== I think people can be forgiven for equating the two, especially in the context of .NET code, since Microsoft plainly made it look that way. |