From: Tony Johansson on 18 Mar 2010 11:39

Hi!

Here are some encoding standards:
1. ASCII
2. Unicode
3. UTF-7
4. UTF-8
5. UTF-32

Files encoded with Unicode, UTF-8 and UTF-32 have a code marker at the beginning, but files encoded with ASCII and UTF-7 do not contain any code marker at all. So why are there no code markers for these two?

//Tony
From: Harlan Messinger on 18 Mar 2010 12:22

Tony Johansson wrote:
> Hi!
>
> Here are some encoding standards:
> 1. ASCII
> 2. Unicode
> 3. UTF-7
> 4. UTF-8
> 5. UTF-32
>
> Files encoded with Unicode, UTF-8 and UTF-32 have a code marker at the
> beginning, but files encoded with ASCII and UTF-7 do not contain any code
> marker at all. So why are there no code markers for these two?

The purpose of the marker is to indicate whether the data is stored in "big-endian" or "little-endian" order, that is, whether the bytes of each multibyte value are arranged high-order byte first or low-order byte first. Therefore, the need for this marker only arose when multibyte encodings were introduced.
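The byte-order point can be seen in a small sketch. Python is not used anywhere in this thread; it is chosen here only for illustration, and the variable names are made up. It encodes one character both ways and shows that the marker is simply U+FEFF written in the chosen byte order:

```python
import codecs

text = "A"

# The explicit-endian codecs write no marker; only the byte order differs.
big = text.encode("utf-16-be")      # high-order byte first
little = text.encode("utf-16-le")   # low-order byte first

# The byte order mark is U+FEFF serialized in the chosen byte order.
bom_big = codecs.BOM_UTF16_BE
bom_little = codecs.BOM_UTF16_LE

# The generic "utf-16" decoder reads the marker to pick the order, then drops it.
decoded_big = (bom_big + big).decode("utf-16")
decoded_little = (bom_little + little).decode("utf-16")

# ASCII uses a single byte per character, so there is no order to record.
ascii_bytes = text.encode("ascii")
```

Both decoded values come back as the original string, even though the underlying bytes are in opposite orders.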
From: Peter Duniho on 18 Mar 2010 12:53

Tony Johansson wrote:
> Hi!
>
> Here are some encoding standards:
> 1. ASCII
> 2. Unicode
> 3. UTF-7
> 4. UTF-8
> 5. UTF-32
>
> Files encoded with Unicode, UTF-8 and UTF-32 have a code marker at the
> beginning, but files encoded with ASCII and UTF-7 do not contain any code
> marker at all. So why are there no code markers for these two?

You are not guaranteed markers for the standard Unicode formats either.

ASCII was "designed" long before anyone was really thinking hard about portable character encodings, so there was no chance it would support a marker. And UTF-7 is used in such specialized situations that there's no need for a marker: anything that can use it will be doing so in a context where there's some other way to specify the format.

In general, it's very difficult to identify the encoding from the text file itself. There are some exceptions (XML allows inclusion of the encoding, for example, as part of the header), but most of the time encoded text needs some external indicator of which encoding is used: either some convention or some explicit statement to that effect.

Pete
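To make the "external indicator" point concrete, here is a minimal Python sketch (again, not something from the thread) of the one thing a reader can check in the file itself: a leading marker. The helper names `sniff_bom` and `decode_with_fallback` are hypothetical, and the `fallback` parameter stands in for whatever convention or explicit statement applies when no marker is found:

```python
import codecs

# Longer marks are tested first: the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding, marker_length), or (None, 0) if no marker is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def decode_with_fallback(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the marker if present, otherwise the caller-supplied encoding."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)
```

This is only a heuristic. A file without a marker tells the function nothing, and the bytes FF FE 00 00 could be either a UTF-32 LE marker or a UTF-16 LE marker followed by U+0000, so the caller still has to know the encoding by some other means in the general case.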
From: Jeff Johnson on 18 Mar 2010 13:47

"Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message news:uafrPurxKHA.3560(a)TK2MSFTNGP02.phx.gbl...
> In general, it's very difficult to identify encoding from the text file
> itself.

Yup: http://blogs.msdn.com/michkap/archive/2006/07/11/662342.aspx
From: Arne Vajhøj on 18 Mar 2010 21:17
On 18-03-2010 11:39, Tony Johansson wrote:
> Here are some encoding standards:
> 1. ASCII
> 2. Unicode
> 3. UTF-7
> 4. UTF-8
> 5. UTF-32
>
> Files encoded with Unicode, UTF-8 and UTF-32 have a code marker at the
> beginning, but files encoded with ASCII and UTF-7 do not contain any code
> marker at all. So why are there no code markers for these two?

I would not consider Unicode an encoding.

And the BOM is optional, not required, for UTF-8.

As for why: a BOM only makes sense for certain encodings, but in the end it is a matter of choice by whoever designed the encoding.

If you define the TonyEncoding to map between Unicode and bytes, then you can put whatever headers you want in front.

Arne
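Both points can be sketched in Python (not a language used in this thread). The first half shows UTF-8 with and without the optional marker, using Python's `utf-8` and `utf-8-sig` codecs. The second half is a toy "TonyEncoding"; the `TONY_HEADER` value and the `tony_encode`/`tony_decode` helpers are invented purely for this illustration:

```python
import codecs

text = "hi"

# UTF-8 without and with the optional marker (bytes EF BB BF).
plain = text.encode("utf-8")
signed = text.encode("utf-8-sig")

# "utf-8-sig" accepts both forms when decoding.
from_plain = plain.decode("utf-8-sig")
from_signed = signed.decode("utf-8-sig")

# The plain "utf-8" decoder keeps the marker as a leading U+FEFF character.
with_marker_char = signed.decode("utf-8")

# Hypothetical header for a made-up encoding; the designer picks the bytes.
TONY_HEADER = b"TONY1"

def tony_encode(s: str) -> bytes:
    """Prefix the made-up header to ordinary UTF-8 bytes."""
    return TONY_HEADER + s.encode("utf-8")

def tony_decode(data: bytes) -> str:
    """Require the made-up header, strip it, and decode the rest as UTF-8."""
    if not data.startswith(TONY_HEADER):
        raise ValueError("missing TonyEncoding header")
    return data[len(TONY_HEADER):].decode("utf-8")
```

A reader that does not expect the UTF-8 marker sees an extra U+FEFF at the start of the text, which is one reason many tools leave it out.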