encoding [CSharp]


From: Tony Johansson on 18 Mar 2010 11:39

Hi!

Here are some encoding standards:
1. ASCII
2. Unicode
3. UTF-7
4. UTF-8
5. UTF-32

Files encoded with Unicode, UTF-8 and UTF-32 begin with code markers,
but files encoded with ASCII and UTF-7 do not contain any code markers
at all.
So why are there no code markers for these two?

//Tony

From: Harlan Messinger on 18 Mar 2010 12:22

From: Peter Duniho on 18 Mar 2010 12:53

Tony Johansson wrote:
> Hi!
>
> Here are some encoding standards:
> 1. ASCII
> 2. Unicode
> 3. UTF-7
> 4. UTF-8
> 5. UTF-32
>
> Files encoded with Unicode, UTF-8 and UTF-32 begin with code markers,
> but files encoded with ASCII and UTF-7 do not contain any code markers
> at all.
> So why are there no code markers for these two?

You are not guaranteed markers for the standard Unicode formats either.

ASCII was "designed" long before anyone was really thinking hard about
portable character encodings, so there was no chance it would support a
marker.

And UTF-7 is used in such specialized situations that there's no need
for a marker: anything that can use it will be doing so in a context
where there's some other way to specify the format.

In general, it's very difficult to identify encoding from the text file
itself. There are some exceptions (XML allows inclusion of the
encoding, for example, as part of the header), but most of the time
encoded text needs some external indicator as to what encoding is used.
Either some convention or some explicit statement to that effect.
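
To illustrate how little the file itself tells you, here is a minimal sketch of marker sniffing. The class and method names (BomSniffer, DetectBom) are made up for this example and are not part of the framework. It only recognizes the common byte order marks and gives up on everything else, which is exactly the case where an external indicator is needed.

```csharp
using System;

static class BomSniffer
{
    // Hypothetical helper: names the encoding whose byte order mark starts
    // the buffer, or reports that no known mark was found.
    public static string DetectBom(byte[] b)
    {
        // UTF-32 LE is tested before UTF-16 LE because both start with FF FE.
        // A UTF-16 LE file whose first character is U+0000 would be misread here.
        if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return "UTF-32 LE";
        if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return "UTF-32 BE";
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return "UTF-8";
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return "UTF-16 LE";
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return "UTF-16 BE";

        // No mark: could be ASCII, UTF-7, UTF-8 without a BOM, or any legacy code page.
        return "unknown";
    }
}
```

For example, BomSniffer.DetectBom(new byte[] { 0x48, 0x69 }) returns "unknown", even though those two bytes are valid ASCII, valid UTF-8 and valid text in most code pages.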

Pete

From: Jeff Johnson on 18 Mar 2010 13:47

"Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
news:uafrPurxKHA.3560(a)TK2MSFTNGP02.phx.gbl...

> In general, it's very difficult to identify encoding from the text file
> itself.

Yup: http://blogs.msdn.com/michkap/archive/2006/07/11/662342.aspx

From: Arne Vajhøj on 18 Mar 2010 21:17

On 18-03-2010 11:39, Tony Johansson wrote:
> Here are some encoding standards:
> 1. ASCII
> 2. Unicode
> 3. UTF-7
> 4. UTF-8
> 5. UTF-32
>
> Files encoded with Unicode, UTF-8 and UTF-32 begin with code markers,
> but files encoded with ASCII and UTF-7 do not contain any code markers
> at all.
> So why are there no code markers for these two?

I would not consider Unicode an encoding.

And the BOM is optional, not required, for UTF-8.

As for why: a BOM only makes sense for certain
encodings, but in the end it is a matter of
choice by whoever designed the encoding.
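
In .NET you can see that choice through Encoding.GetPreamble(). The following is only a small sketch: the PreambleDemo class and its Hex helper are names I made up, while the Encoding members are the real ones. Note that Encoding.UTF7 is marked obsolete in newer .NET versions, hence the pragma.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Formats a preamble as hex, or "(none)" when the encoding defines no marker.
    public static string Hex(byte[] bytes)
    {
        return bytes.Length == 0 ? "(none)" : BitConverter.ToString(bytes);
    }

    public static void Print()
    {
        Console.WriteLine("ASCII:          " + Hex(Encoding.ASCII.GetPreamble()));    // (none)
        Console.WriteLine("Unicode:        " + Hex(Encoding.Unicode.GetPreamble()));  // FF-FE
        Console.WriteLine("UTF-8:          " + Hex(Encoding.UTF8.GetPreamble()));     // EF-BB-BF
        Console.WriteLine("UTF-32:         " + Hex(Encoding.UTF32.GetPreamble()));    // FF-FE-00-00
#pragma warning disable SYSLIB0001 // UTF-7 is obsolete in .NET 5 and later
        Console.WriteLine("UTF-7:          " + Hex(Encoding.UTF7.GetPreamble()));     // (none)
#pragma warning restore SYSLIB0001
        // The same UTF-8 byte mapping, constructed so that it emits no BOM.
        Console.WriteLine("UTF-8 (no BOM): " + Hex(new UTF8Encoding(false).GetPreamble())); // (none)
    }
}
```

Encoding.Unicode here is the framework's name for little-endian UTF-16, and the last line shows that the UTF-8 BOM is a constructor option rather than part of the byte mapping.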

If you define the TonyEncoding to map between Unicode
and bytes, then you can put the headers in front that
you want.
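
A rough sketch of what that could look like, assuming you simply reuse the UTF-8 byte mapping and only change the header. The two marker bytes ("TJ") are invented for the example, and nothing outside your own code would recognize them.

```csharp
using System;
using System.Text;

// Hypothetical encoding: ordinary UTF-8 bytes, but with a made-up
// two-byte marker instead of the usual EF BB BF.
class TonyEncoding : UTF8Encoding
{
    public override byte[] GetPreamble()
    {
        return new byte[] { 0x54, 0x4A }; // ASCII "TJ", chosen arbitrarily
    }
}
```

GetBytes() never includes the preamble, so new TonyEncoding().GetBytes("A") is still the single byte 0x41. It is up to whatever writes the file to put GetPreamble() in front, and up to whatever reads it to know to look for it.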

Arne
