From: Tony Johansson on 25 Mar 2010 11:05
Hi!

UTF-8 can use up to 24 bits for encoding. UTF-8 supports almost all languages, so what is the reason to use any Unicode encoding other than UTF-8?

//Tony
From: Tony Johansson on 25 Mar 2010 11:09
"Tony Johansson" <johansson.andersson(a)telia.com> wrote in message news:OU1ZNyCzKHA.5940(a)TK2MSFTNGP02.phx.gbl...
> Hi!
>
> UTF-8 can use up to 24 bits for encoding. UTF-8 supports almost
> all languages, so what is the reason to use any Unicode encoding
> other than UTF-8?
>
> //Tony

I must correct myself: UTF-8 can use up to 48 bits.

//Tony
From: Maate on 25 Mar 2010 12:33
On 25 Mar., 16:09, "Tony Johansson" <johansson.anders...(a)telia.com> wrote:
> "Tony Johansson" <johansson.anders...(a)telia.com> wrote in message news:OU1ZNyCzKHA.5940(a)TK2MSFTNGP02.phx.gbl...
>
> > Hi!
> >
> > UTF-8 can use up to 24 bits for encoding. UTF-8 supports almost
> > all languages, so what is the reason to use any Unicode encoding
> > other than UTF-8?
> >
> > //Tony
>
> I must correct myself: UTF-8 can use up to 48 bits.
>
> //Tony

Hey,

I'm not sure, but I would guess that UTF-8 is slightly more expensive to parse than other Unicode encodings. For example, when reading UTF-16 encoded text, the parser would know that it has to read exactly two bytes per character. With UTF-8, on the other hand, the number of bytes to read per character depends on information stored in individual bits.

Consider a simple example. This C# code, "my test string".Substring(5, 1), is easy to evaluate in UTF-16, but with UTF-8 the parser would have to decode the individual characters starting from the beginning in order to determine which bytes actually represent character number 5 - perhaps making it at least 5 times as expensive.

This probably also explains why, for example, the .NET CLR stores text as UTF-16 internally - it probably makes it easier (and faster) to manipulate and search text.

Anyway, just some thoughts :-)

Br. Morten
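To make the cost argument above concrete, here is a minimal sketch in Python (not C#, and not how the .NET CLR is implemented). The function names `utf8_offset_of_char`, `utf8_char_at`, and `utf16_unit_at` are hypothetical, made up for this illustration. The UTF-8 lookup has to scan from the start of the buffer, while the UTF-16 lookup is a direct offset calculation that is only valid under the assumption that the text contains no surrogate pairs.

```python
def utf8_offset_of_char(data: bytes, index: int) -> int:
    """Return the byte offset where code point number `index` starts.

    Assumes `data` is valid UTF-8. The scan is linear in the offset,
    because the width of each character is only known after looking at it.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes have the form 10xxxxxx; any other byte
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


def utf8_char_at(data: bytes, index: int) -> str:
    """Return code point number `index` from UTF-8 encoded bytes."""
    start = utf8_offset_of_char(data, index)
    end = start + 1
    # Extend over the continuation bytes that belong to this code point.
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return data[start:end].decode("utf-8")


def utf16_unit_at(data: bytes, index: int) -> str:
    """Direct lookup in UTF-16LE bytes.

    Assumption: the text has no surrogate pairs, so every character
    occupies exactly one 2-byte code unit.
    """
    return data[2 * index:2 * index + 2].decode("utf-16-le")


if __name__ == "__main__":
    text = "my test string"
    print(utf8_char_at(text.encode("utf-8"), 5))
    print(utf16_unit_at(text.encode("utf-16-le"), 5))
```

Both calls return the same character as the Substring(5, 1) example; the difference is only in how much of the buffer has to be inspected to find it.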
From: Chris Dunaway on 25 Mar 2010 12:58
On Mar 25, 10:05 am, "Tony Johansson" <johansson.anders...(a)telia.com> wrote:
> Hi!
>
> UTF-8 can use up to 24 bits for encoding. UTF-8 supports almost
> all languages, so what is the reason to use any Unicode encoding
> other than UTF-8?
>
> //Tony

http://www.joelonsoftware.com/articles/Unicode.html
From: Konrad Neitzel on 25 Mar 2010 15:12
Hi all!

"Maate" <maate(a)retkomma.dk> wrote in message news:cb98f95e-6f15-45c7-bc05-44e0b96f922d(a)e7g2000yqf.googlegroups.com...
> Hey, I'm not sure, but I would guess that UTF-8 is slightly more
> expensive to parse than other Unicode encodings.

Why is that? UTF-16 is not fixed at 2 bytes per character either. It can use more bytes per character if required (4 bytes, as a surrogate pair), which is one reason why UTF-32 also exists.

> For example, when
> reading UTF-16 encoded text the parser would know that it has to read
> exactly two bytes per character. On the other hand, if UTF-8 encoded,
> the number of bytes to read per character will depend on the
> information stored in individual bits.

And yes, that can be the important point. Whenever you want random access to characters without parsing every character up to the one you want to read, you must be careful that you really know how many bytes each character has.

UTF-16 is not fixed at 2 bytes! That is a common mistake. If you want a fixed 2-byte encoding, UCS-2 could be chosen, but then you do not support all the characters that UTF-16 supports!

More details can be found at:
http://en.wikipedia.org/wiki/UTF
http://en.wikipedia.org/wiki/UTF-16
http://en.wikipedia.org/wiki/UTF-32

Konrad
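The point about UTF-16 not being fixed-width can be checked with a short sketch, again in Python and using hypothetical helper names (`encoded_sizes`, `utf16_code_units`, `fits_in_ucs2`) invented for this illustration. It also bears on the bit counts discussed at the top of the thread: UTF-8 as currently standardized (RFC 3629) uses 1 to 4 bytes per code point, UTF-16 uses 2 or 4, and UTF-32 always uses 4.

```python
def encoded_sizes(ch: str) -> dict:
    """Bytes needed to encode one character, without a byte order mark."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units the text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def fits_in_ucs2(text: str) -> bool:
    """True if every code point lies in the Basic Multilingual Plane,
    i.e. could be stored in a fixed 2-byte encoding such as UCS-2."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # U+1D11E (musical symbol G clef) lies outside the BMP.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch), "UCS-2 ok:", fits_in_ucs2(ch))
```

A string containing a character outside the Basic Multilingual Plane occupies more UTF-16 code units than it has characters, which is exactly the case where "character number n is at byte 2n" stops being true.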