Why is Encoding.Unicode not called Encoding.UTF16? [CSharp]

From: Tony Johansson on 3 Jun 2010 08:34

Hi!

Here I have some examples of different encodings.
We have, for example, UTF7 and UTF8.

But what seems strange is: why is Encoding.Unicode not called UTF16?

// Requires: using System.IO; using System.Text;
// Each using block closes (and flushes) the writer even if WriteLine throws.
using (StreamWriter swUtf7 = new StreamWriter("utf7.txt", false, Encoding.UTF7))
{
    swUtf7.WriteLine("Hello, World!");
}

using (StreamWriter swUtf8 = new StreamWriter("utf8.txt", false, Encoding.UTF8))
{
    swUtf8.WriteLine("Hello, World!");
}

// Encoding.Unicode is UTF-16 (little-endian).
using (StreamWriter swUtf16 = new StreamWriter("utf16.txt", false, Encoding.Unicode))
{
    swUtf16.WriteLine("Hello, World!");
}

using (StreamWriter swUtf32 = new StreamWriter("utf32.txt", false, Encoding.UTF32))
{
    swUtf32.WriteLine("Hello, World!");
}

//Tony

From: Jeff Johnson on 3 Jun 2010 11:53

"Tony Johansson" <johansson.andersson(a)telia.com> wrote in message
news:uwHWnkxALHA.980(a)TK2MSFTNGP04.phx.gbl...

> Here I have some examples of different encodings.
> We have, for example, UTF7 and UTF8.
>
> But what seems strange is: why is Encoding.Unicode not called UTF16?

That's just what Microsoft chose to call it, unfortunately. They probably
did so because that was the encoding they chose for storing strings
internally in .NET, so they considered it the "baseline" Unicode encoding.
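
To make that concrete, here is a minimal sketch showing that Encoding.Unicode identifies itself as UTF-16 and writes two bytes per char, low byte first. The class name UnicodeNameDemo and its two helper methods are made up for this illustration; only Encoding.Unicode, WebName, GetBytes and BitConverter.ToString come from the framework.

```csharp
using System;
using System.Text;

// Illustrative helper class (hypothetical name, not part of .NET).
public static class UnicodeNameDemo
{
    // Returns the IANA-style name that .NET reports for Encoding.Unicode.
    public static string WebNameOfUnicode()
    {
        return Encoding.Unicode.WebName;
    }

    // Encodes the text with Encoding.Unicode and returns the bytes as hex.
    // GetBytes does not emit a byte order mark.
    public static string HexOf(string text)
    {
        return BitConverter.ToString(Encoding.Unicode.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(WebNameOfUnicode()); // utf-16
        Console.WriteLine(HexOf("Hi"));        // 48-00-69-00
    }
}
```

So the name says "Unicode", but what comes out is UTF-16 in little-endian byte order.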

From: Thomas Scheidegger on 3 Jun 2010 16:13

> why is Encoding.Unicode not called UTF16?

In the beginning there was only one 'Unicode' encoding, called 'UCS-2',
and it was chosen (many years ago) for Windows (NT, Win32).
Later it was redefined/extended as 'UTF-16';
read:
http://en.wikipedia.org/wiki/UTF-16/UCS-2

FAQ: What is the difference between UCS-2 and UTF-16?
http://www.unicode.org/faq/basic_q.html#14
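
The practical difference shows up with characters outside the Basic Multilingual Plane: UCS-2 has no way to represent them, while UTF-16 uses a surrogate pair (two 16-bit units). A small sketch, assuming U+1D11E (MUSICAL SYMBOL G CLEF) as the sample character; the class name SurrogateDemo and its methods are made up for the example:

```csharp
using System;
using System.Text;

// Illustrative helper class (hypothetical name, not part of .NET).
public static class SurrogateDemo
{
    // Number of bytes Encoding.Unicode produces for the text (no BOM).
    public static int ByteCount(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    // Encodes with Encoding.Unicode, decodes again, and reports
    // whether the original string survived unchanged.
    public static bool RoundTrips(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        return Encoding.Unicode.GetString(bytes) == text;
    }

    public static void Main()
    {
        string bmp = "A";                               // U+0041, inside the BMP
        string clef = char.ConvertFromUtf32(0x1D11E);   // outside the BMP

        Console.WriteLine(ByteCount(bmp));   // 2
        Console.WriteLine(clef.Length);      // 2 (high and low surrogate)
        Console.WriteLine(ByteCount(clef));  // 4
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(clef))); // 34-D8-1E-DD
        Console.WriteLine(RoundTrips(clef)); // True
    }
}
```

The four bytes are the surrogate pair D834 DD1E in little-endian order, so Encoding.Unicode behaves as UTF-16 here, not as plain UCS-2.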

--
Thomas Scheidegger - 'NETMaster'
http://dnetmaster.net/

From: Arne Vajhøj on 3 Jun 2010 18:46

On 03-06-2010 08:34, Tony Johansson wrote:
> But what seems strange is: why is Encoding.Unicode not called UTF16?

Bad decision by MS. It is very misleading.

But remember that .NET was not created in a vacuum; it has a history.

In the Win32 API world there is a long history of using:

ANSI = single-byte charset (ISO-8859-x or, more specifically, their
code page equivalents)

Unicode = UCS-2/UTF-16

It is confusing because ANSI is really an American standardization
organization, and Unicode is really the set of code point tables, not an
encoding.

But a lot of the people who worked on creating .NET came from the
Win32 API world.

I believe this is something .NET inherited.

And no matter how much we dislike it, Encoding.Unicode is not likely
ever to be removed, because that would break too much code.

What they could do is:
- add Encoding.UTF16 with the same functionality
- mark Encoding.Unicode as deprecated
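
Until something like that exists in the framework, the first bullet can be approximated in your own code. This is only a sketch under assumptions: EncodingAliases, UTF16 and UTF16BE are hypothetical names of my own, not part of the .NET class library. The second bullet (deprecation, presumably via [Obsolete]) is something only Microsoft could apply to Encoding.Unicode itself.

```csharp
using System;
using System.Text;

// Hypothetical helper class, not part of the .NET class library.
public static class EncodingAliases
{
    // Clearer name for Encoding.Unicode (UTF-16, little-endian).
    public static Encoding UTF16
    {
        get { return Encoding.Unicode; }
    }

    // Clearer name for Encoding.BigEndianUnicode (UTF-16, big-endian).
    public static Encoding UTF16BE
    {
        get { return Encoding.BigEndianUnicode; }
    }

    public static void Main()
    {
        // Both aliases simply forward to the existing framework encodings.
        Console.WriteLine(UTF16.CodePage);   // 1200
        Console.WriteLine(UTF16BE.CodePage); // 1201
        Console.WriteLine(BitConverter.ToString(UTF16.GetBytes("A")));   // 41-00
        Console.WriteLine(BitConverter.ToString(UTF16BE.GetBytes("A"))); // 00-41
    }
}
```

Calling code could then write new StreamWriter("utf16.txt", false, EncodingAliases.UTF16) and get exactly the same behaviour as with Encoding.Unicode.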

Arne
