From: Alf P. Steinbach on 11 Jun 2006 17:28

* jrm:
> std::wstring might not be a good idea according to the details section
> here from ustring class:
>
> <snip
> src=http://www.gtkmm.org/docs/glibmm-2.4/docs/reference/html/classGlib_1_1ustring.html#_details>

I see nothing there that says std::wstring with UTF-16 or UTF-32 would be a bad choice.

However, if more than 16-bit Unicode (the original Unicode, now the Basic Multilingual Plane of full Unicode) is required, then on a C++ implementation with 16-bit wchar_t -- such as a Windows C++ compiler -- a std::wstring has the same potential problem as a std::string has with UTF-8: it doesn't support the variable-length encoding.

On the third hand, if the platform is exclusively Windows (NT family), then std::wstring corresponds directly to what's required for system calls, so that in most cases no conversion is required, either way.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
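To illustrate the variable-length point for 16-bit code units, here is a minimal sketch that counts code points rather than code units in a UTF-16 string. It uses char16_t and std::u16string, which arrived in C++11 (after this thread), so that the result does not depend on the width of wchar_t. The function names are hypothetical, and counting an unpaired surrogate as one code point is an assumption of the sketch, not a rule from the posts above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if u is a high (leading) surrogate, U+D800..U+DBFF.
inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if u is a low (trailing) surrogate, U+DC00..U+DFFF.
inline bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts code points in a UTF-16 string. A well-formed surrogate pair
// counts once. Assumption: an unpaired surrogate also counts once;
// stricter code might reject it instead.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() &&
            is_low_surrogate(s[i + 1])) {
            ++i;  // skip the trailing half of the pair
        }
        ++n;
    }
    return n;
}
```

For a string holding the single character U+1D11E (code units 0xD834 0xDD1E), size() reports 2 while count_code_points reports 1, which is the mismatch a 16-bit wchar_t std::wstring would show outside the Basic Multilingual Plane.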
From: Bronek Kozicki on 11 Jun 2006 17:34

Jeff Koftinoff wrote:
> But UTF-16 and UTF-32 both are potentially multi-code-point per
> character encodings... See the "Grapheme Boundaries" section of:

They are the best one can get now.

B.
From: Bronek Kozicki on 11 Jun 2006 17:33

jrm wrote:
> std::wstring might not be a good idea according to the details section
> here from ustring class:

Why not? std::wstring is typically implemented on top of the Unicode support of the target platform, and the character type used is typically some fixed-width Unicode encoding, like UTF16 (on Windows) or UTF32 (on Linux; I do not know about other flavours of Unix). UTF8 is not a character type (neither are UTF16 or UTF32, but at least they are fixed width, so they can map to wchar_t) but a fancy encoding. And the typical place for data encoding is not in data processing but in input/output. Anything that can be represented in UTF8 can also be represented in UTF32 and in UTF16 (or almost anything: there are surrogates to compensate for the shorter characters in UTF16, but I'm not sure how much value they provide).

B.
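The idea of keeping the encoding at the input/output boundary can be sketched as a decoder that turns UTF-8 bytes into fixed-width UTF-32 code points on the way in. This is an illustration only: utf8_to_utf32 is a hypothetical name, std::u32string is a C++11 type, and the sketch deliberately does not reject overlong forms, encoded surrogates, or values above U+10FFFF, which a production decoder should.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes into UTF-32 code points.
// Throws std::invalid_argument on a bad lead byte, a bad continuation
// byte, or a truncated sequence. Simplification: overlong forms and
// encoded surrogates are not rejected.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

After decoding, each element of the result is one code point, so indexing and length work per code point inside the program; the variable-length form exists only in the bytes read or written.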
From: Pete Becker on 11 Jun 2006 17:40

Wu Yongwei wrote:
>
> A gotcha under Windows: wchar_t is 2 bytes wide.
>

wchar_t is a type defined by the compiler. For some Windows compilers it's 2 bytes wide, for others it isn't.

--
Pete Becker
Roundhouse Consulting, Ltd.
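Since the width of wchar_t depends on the compiler, code that cares can ask the compiler instead of assuming. The following is a small sketch with hypothetical helper names; it uses constexpr, which is C++11 and so postdates this thread.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <limits>

// Width of wchar_t in bits for the compiler building this code.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True if a single wchar_t can hold every Unicode code point
// (the largest is U+10FFFF), i.e. no surrogate pairs are needed.
constexpr bool wchar_holds_any_code_point() {
    return static_cast<unsigned long>(std::numeric_limits<wchar_t>::max())
           >= 0x10FFFFul;
}
```

A build that requires one wchar_t per code point could then state that requirement with static_assert(wchar_holds_any_code_point(), "...") and fail at compile time on a compiler with a 16-bit wchar_t.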
From: Pete Becker on 11 Jun 2006 17:41

jrm wrote:
> std::wstring might not be a good idea according to the details section
> here from ustring class:
>
> <snip
> src=http://www.gtkmm.org/docs/glibmm-2.4/docs/reference/html/classGlib_1_1ustring.html#_details>
>
> In a perfect world the C++ Standard Library would contain a UTF-8
> string class. Unfortunately, the C++ standard doesn't mention UTF-8 at
> all. Note that std::wstring is not a UTF-8 string class because it
> contains only fixed-width characters (where width could be 32, 16, or
> even 8 bits).
>
> </snip>
>

Back in the olden days, the Japanese tried to work with multi-byte representations of Japanese characters. The result of that experience was that they insisted that C add wide character support so they wouldn't have to.

--
Pete Becker
Roundhouse Consulting, Ltd.