From: Maxim Yegorushkin on 8 Jun 2006 19:10

Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?

std::string was not designed to handle variable length characters. The idea was that code works with fixed length characters most of the time, only converting to/from variable length characters on output/input.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
From: Tom Widmer on 9 Jun 2006 04:50

Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?
>
> I'm really curious how the char_traits template was implemented to
> handle variable length character encodings.

std::basic_string and std::char_traits only operate on fixed width encodings. The general std approach is to use variable length encodings only in storage, converting them to and from fixed length when performing IO (using a codecvt facet).

OTOH, lots of other string libraries do handle UTF8 strings, just not std::basic_string.

Tom
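To illustrate the "fixed width inside, variable width at the boundary" approach described above, here is a minimal sketch of the output half of that conversion. It is hand-rolled rather than built on a codecvt facet, the names append_utf8 and to_utf8 are made up for this example, and it uses std::u32string and char32_t, which need C++11 and so postdate this thread. It assumes every element is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Append one code point to `out` as UTF-8 (1 to 4 bytes).
// Assumption: cp is a valid scalar value (at most 0x10FFFF, not a surrogate).
void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a fixed-width (UTF-32) string to UTF-8 for output or storage.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        append_utf8(out, cp);
    }
    return out;
}
```

Inside the program, text[i] is then always one whole code point. Only the result of to_utf8 has elements of varying width.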
From: Bronek Kozicki on 9 Jun 2006 05:03

Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction

I suggest that for your normal data processing needs you stick with fixed-width Unicode encodings, like UTF16 or UTF32 - most std::wstring implementations directly support one or another. Use UTF8 only for input/output using IO specific for your platform and/or its support functions. The reason is simple - efficiency.

B.
From: shunsuke on 9 Jun 2006 05:17

Dave wrote:
> A few weeks ago I looked for an implementation of std::string that can
> handle UTF8 strings. I was thinking that the STL iterator abstraction
> would be nice for iterating over a variable length encoded string. So
> far I haven't found anything. Does anybody know of a UTF8 std::string
> implementation?

Boost has (secretly?) such iterators. Check <boost/regex/pending/unicode_iterator.hpp>
They are not string classes but iterator adaptors.

--
Shunsuke Sogame
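For readers wondering what an iterator adaptor over UTF-8 looks like in principle, the following is a rough sketch. It is not the Boost code: the class name utf8_iterator is hypothetical, the code uses C++11 features (char32_t, using aliases), and it assumes the underlying std::string holds well-formed UTF-8 with no error handling at all.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical read-only adaptor: walks UTF-8 bytes, yields code points.
// Assumption: the input is well-formed UTF-8; malformed input is not detected.
class utf8_iterator {
public:
    // Declared as an input iterator because operator* returns by value.
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    explicit utf8_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decode the code point that starts at the current byte.
    char32_t operator*() const {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        const std::size_t len = length(lead);
        // Keep only the payload bits of the lead byte (7, 5, 4 or 3 bits).
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        for (std::size_t i = 1; i < len; ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(pos_[i]) & 0x3F);
        }
        return cp;
    }

    // Advance by one whole code point, i.e. by 1 to 4 bytes.
    utf8_iterator& operator++() {
        pos_ += length(static_cast<unsigned char>(*pos_));
        return *this;
    }
    utf8_iterator operator++(int) {
        utf8_iterator old(*this);
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_iterator& a, const utf8_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_iterator& a, const utf8_iterator& b) {
        return !(a == b);
    }

private:
    // Sequence length implied by the lead byte.
    static std::size_t length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x06) return 2;
        if ((lead >> 4) == 0x0E) return 3;
        return 4;
    }

    std::string::const_iterator pos_;
};
```

A pair of these built from s.begin() and s.end() can be handed to standard algorithms or to a range constructor such as that of std::u32string. The string itself stays a plain std::string of bytes.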
From: kanze on 9 Jun 2006 18:14

Bronek Kozicki wrote:
> Dave wrote:
> > A few weeks ago I looked for an implementation of std::string that
> > can handle UTF8 strings. I was thinking that the STL iterator
> > abstraction

> I suggest that for your normal data processing needs you stick with
> fixed-width Unicode encodings, like UTF16 or UTF32 - most std::wstring
> implementations directly support one or another. Use UTF8 only for
> input/output using IO specific for your platform and/or its support
> functions. The reason is simple - efficiency.

I'm not sure I agree. I think a lot depends on the application. For a large set of applications, I'm pretty sure that UTF-8 strings would be more efficient. With the correct supporting tools (e.g. a regex class which understands them), they probably wouldn't be any harder to use.

The one case where they really lose is with random access based strictly on the character index, e.g. accessing the 132nd character in a string (without accessing any of the intermediate characters). But if my applications are typical, that's something that you never do -- outside of an editor, when would you do something like that?

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
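The random-access cost mentioned above can be made concrete with a small sketch. The function name offset_of_code_point is invented for this example, and "character" is taken to mean a Unicode code point, which is a simplification. It assumes well-formed UTF-8 and finds the n-th code point by scanning from the start, so the cost grows with n instead of being constant as it would be for a fixed-width string.

```cpp
#include <cstddef>
#include <string>

// Byte offset at which the code point with zero-based index `n` starts,
// or std::string::npos if the string holds fewer than n + 1 code points.
// Assumption: `utf8` is well-formed UTF-8.
std::size_t offset_of_code_point(const std::string& utf8, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        // Continuation bytes have the form 10xxxxxx; any other byte
        // begins a new code point.
        const bool starts_code_point =
            (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80;
        if (starts_code_point) {
            if (seen == n) {
                return i;
            }
            ++seen;
        }
    }
    return std::string::npos;
}
```

Sequential processing never pays this price, because each step only has to skip forward from the previous position.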