From: Dave on 7 Jun 2006 18:20

A few weeks ago I looked for an implementation of std::string that can handle UTF8 strings. I was thinking that the STL iterator abstraction would be nice for iterating over a variable length encoded string. So far I haven't found anything. Does anybody know of a UTF8 std::string implementation?

I'm really curious how the char_traits template was implemented to handle variable length character encodings.

Thanks,
Dave

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
From: johnchx2@yahoo.com on 8 Jun 2006 18:46

Dave wrote:
> I'm really curious how the char_traits template was implemented to
> handle variable length character encodings.

I don't think it has been. std::basic_string is, AFAIK, intended to work only with fixed-length encodings (i.e. the "internal" representation). Translation to and from variable-length encodings is handled by the locales associated with I/O streams.

There may, however, be std::string-like classes out there that handle variable-length encodings.
From: Niek Sanders on 8 Jun 2006 18:51

Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?

The QString class in Trolltech's Qt library supports UTF-8. The documentation is here:
http://doc.trolltech.com/4.0/qstring.html

- Niek Sanders
http://www.cis.rit.edu/~njs8030/
From: Ulrich Eckhardt on 8 Jun 2006 18:54

Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?
>
> I'm really curious how the char_traits template was implemented to
> handle variable length character encodings.

It isn't. std::basic_string assumes that you have one character per element of the string. That said, there are basically two ways to use std::basic_string when you need UTF-8:

1. Use std::wstring
   This means that you internally use wchar_t as the character type, which, at least on some platforms, can hold any character of the whole Unicode range in a single element. You then convert to UTF-8 where you need it. (Note: iostreams already contain a conversion facility called codecvt, which is perfectly suited to reading and writing UTF-8 files.)

2. Use std::string
   This means that you store the UTF-8 string as-is in a char-based string. The main caveat is that somestring[4] will not give you the fifth character of the string, only the fifth byte. Typically you don't need single-character access very often, though, so that should not be a problem. If you do need it, you could implement an iterator that iterates over a UTF-8 sequence, or simply convert the string to wchar_t if that suffices.

I personally use std::wstring (and wcout, wfstream, etc.) for everything in my programs that is supposed to be presented to a user and that needs the full Unicode range.

Uli
From: Vaclav Haisman on 8 Jun 2006 19:09
Dave wrote, On 8.6.2006 0:20:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?
>
> I'm really curious how the char_traits template was implemented to
> handle variable length character encodings.
>
> Thanks,
> Dave

Try IBM's libICU.

-- 
VH