From: Mathias Gaunard on 23 Jul 2010 06:43

On Jul 23, 12:49 am, "Martin B." <0xCDCDC...(a)gmx.at> wrote:
> * No unicode aware string class

And exactly what would it do that would be of any use?

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Stanley Friesen on 23 Jul 2010 22:55

"Martin B." <0xCDCDCDCD(a)gmx.at> wrote:
>Stanley Friesen wrote:
>> "joe" <jc1996(a)att.net> wrote:
>>
>>> Francis Glassborow wrote:
>>>> joe wrote:
>>> [...]
>>>> Anyway this has got very far from C++ where we certainly do need a way
>>>> to handle text in more than just American English.
>>> Not far at all from C++ given that it has lame support for Unicode,
>>
>> In C++0X there is actually considerable support. It allows many
>> non-punctuation characters in identifiers (e.g. variable names, class
>> names &c.). It provides conversions between the three main
>> representations (UTF-8, UTF-16, and UTF-32). It at least allows for
>> tailorable Unicode collation. The only thing it lacks that I see as a
>> substantial issue is UTF-16 and/or UTF-32 iostreams. This is
>> unfortunate, as both Windows and modern Unix support such files at the
>> OS level.
>
>As I see it, some support is added for better handling of unicode at
>compile time. (Uni character literals, charXX_t, etc.)
>
>We are left with the same mess we always had at runtime. (modulo
>char32_t, maybe):
>* No unicode aware string class

Support for u16string and u32string seems sufficient for low level
purposes, especially combined with conversions between the various
formats, and collation support. I am not sure that the *language*
should mandate much more than this. More complex Unicode processing is
generally task-specific. An editor has different needs than a Web
browser, for instance.

(Also I think the "ctype" functionality in the Unicode character traits
classes has to apply proper Unicode semantics).

>* No way to tell what character set a char* is encoded in (and this will
>get worse with compile-time u8 constants).
>* std::exception works only with char*

Which still allows UTF-8 strings.

--
The peace of God be with you.

Stanley Friesen
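
The remark that a char*-based std::exception "still allows UTF-8 strings" can be sketched as follows. This is only an illustration under stated assumptions: the function names open_menu and message_from_open_menu are made up for the example, and the UTF-8 bytes are written as hex escapes so that the result does not depend on the encoding of the source file. Nothing here tells the receiver that the bytes are UTF-8, which is the other complaint in the quoted list.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical function for illustration: throws an exception whose message
// contains UTF-8 bytes. "\xC3\xA9" is U+00E9 (e with acute accent) in UTF-8.
// The literal is split so the hex escape cannot absorb following characters.
inline void open_menu() {
    throw std::runtime_error("caf\xC3\xA9" " not found");
}

// Hypothetical helper: catches the exception and returns the message exactly
// as what() reports it.
inline std::string message_from_open_menu() {
    try {
        open_menu();
    } catch (const std::exception& e) {
        // what() is a plain const char*; the UTF-8 bytes pass through
        // unchanged, but the type carries no record of their encoding.
        return e.what();
    }
    return std::string();
}
```

The message survives as bytes; interpreting it correctly is left entirely to whoever prints or logs it.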

From: Stanley Friesen on 23 Jul 2010 22:55

Mathias Gaunard <loufoque(a)gmail.com> wrote:
>On Jul 22, 1:26 pm, Stanley Friesen <sar...(a)friesen.net> wrote:
>
>> It provides conversions between the three main
>> representations (UTF-8, UTF-16, and UTF-32).
>
>Not really in a way that is practical to use though.
>

Well, sstreams (string streams) should provide that capability, even if
that is a trifle clumsy.

>
>> The only thing it lacks that I see as a
>> substantial issue is UTF-16 and/or UTF-32 iostreams. This is
>> unfortunate, as both Windows and modern Unix support such files at the
>> OS level.
>
>basic_istream<char16_t> etc. should work just fine.

That will read or write a UTF-8 file, not a UTF-16/UTF-32 file. The
specification is quite clear - it is required to apply the appropriate
codecvt facet.

>
>
>> But any decent development
>> environment will allow actual Unicode source files, and apply the as-if
>> rule to treat valid non-ASCII characters identically to the escape
>> codes.
>
>So GCC, the most widely used C and C++ compiler, is not a decent
>development environment?

It is not a development environment at all, it is just a compiler. A
development environment includes build configuration, syntax-aware
editing, syntax-aware searches and so on.

Still, I think it would be a very useful improvement to allow it to
accept UTF-8 text files, at the very least.

>As was clearly stated in the parent message, GCC only supports
>inputting unicode characters in identifiers as escape codes.

I understand. I also do not consider GCC's C++0X support complete as of
now. Heck, the *standard* isn't even official yet, and there have been
significant changes to it in the last 6 months that GCC cannot possibly
have implemented yet. And last I checked, GCC's documentation made no
claim to implement the entirety even of the draft standard at that time.

So, before we judge it, let us wait until the GNU people claim full
support of the final standard.

--
The peace of God be with you.

Stanley Friesen

From: Mathias Gaunard on 24 Jul 2010 23:28

On Jul 24, 2:55 pm, Stanley Friesen <sar...(a)friesen.net> wrote:
> Mathias Gaunard <loufo...(a)gmail.com> wrote:
> >On Jul 22, 1:26 pm, Stanley Friesen <sar...(a)friesen.net> wrote:
>
> >> It provides conversions between the three main
> >> representations (UTF-8, UTF-16, and UTF-32).
>
> >Not really in a way that is practical to use though.
>
> Well, sstreams (string streams) should provide that capability, even if
> that is a trifle clumsy.

String streams do not invoke codecvt facets, only file streams do.
Also note most current implementations do not allow N to M conversion
with codecvt facets, and only allow one-way 1 to N (in-memory fixed
width, in-file variable-width), so I'd be quite careful about this.

The alternative is applying the codecvt facet directly, which has a
fairly ugly interface and requires static contiguous buffers.

What we truly need is an iterator-based interface that basically
behaves like std::copy, or better yet, iterator adaptors that convert
as you iterate.
But that's not sufficient: you also need ways to segment strings
(graphemes, words, sentences), do normalization, case conversion, etc.
None of which is anywhere near possible in C++0x.

> >> The only thing it lacks that I see as a
> >> substantial issue is UTF-16 and/or UTF-32 iostreams. This is
> >> unfortunate, as both Windows and modern Unix support such files at the
> >> OS level.
>
> >basic_istream<char16_t> etc. should work just fine.
>
> That will read or write a UTF-8 file, not a UTF-16/UTF-32 file. The
> specification is quite clear - it is required to apply the appropriate
> codecvt facet.

That's not a problem at the stream level, but at the filebuf level.
File streams invoke codecvt facets to convert from their type to char
because filebufs are char-based.

> >So GCC, the most widely used C and C++ compiler, is not a decent
> >development environment?
>
> It is not a development environment at all, it is just a compiler. A
> development environment includes build configuration, syntax-aware
> editing, syntax-aware searches and so on.

Looks like you only know the world of software development as you see
it through your Microsoft Visual Studio window.

> >As was clearly stated in the parent message, GCC only supports
> >inputting unicode characters in identifiers as escape codes.
>
> I understand. I also do not consider GCCs C++0X support complete as of
> now.

You said that any decent development environment that exists supports
it NOW. I'm just confronting you with your inaccurate statements.
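
The "iterator-based interface that basically behaves like std::copy" asked for above might look roughly like the sketch below. It is not a standard facility: the names utf8_to_utf32 and decode_utf8 are hypothetical, the choice to throw std::invalid_argument on malformed input is an assumption, and only the UTF-8 to UTF-32 direction is shown. Segmentation, normalization and case conversion are out of scope here, as the post says they would need much more.

```cpp
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a standard function: decodes UTF-8 from
// [first, last) and writes one char32_t per code point to `out`, returning
// the advanced output iterator the way std::copy does.
// Throws std::invalid_argument on malformed input (an assumed policy).
template <class InputIt, class OutputIt>
OutputIt utf8_to_utf32(InputIt first, InputIt last, OutputIt out) {
    while (first != last) {
        const unsigned char lead = static_cast<unsigned char>(*first++);
        char32_t cp;      // code point being assembled
        char32_t lowest;  // smallest code point allowed for this length
        int trailing;     // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        trailing = 0; lowest = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trailing = 1; lowest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trailing = 2; lowest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trailing = 3; lowest = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        for (int i = 0; i < trailing; ++i) {
            if (first == last)
                throw std::invalid_argument("truncated UTF-8 sequence");
            const unsigned char b = static_cast<unsigned char>(*first++);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, values past U+10FFFF, and surrogates.
        if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range UTF-8 sequence");
        *out++ = cp;
    }
    return out;
}

// Convenience wrapper (also hypothetical) for the common string case.
inline std::u32string decode_utf8(const std::string& bytes) {
    std::u32string result;
    utf8_to_utf32(bytes.begin(), bytes.end(), std::back_inserter(result));
    return result;
}
```

Because the algorithm only needs input and output iterators, it works on non-contiguous containers and stream iterators, which is the advantage over applying a codecvt facet directly to static contiguous buffers.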

From: Martin B. on 24 Jul 2010 23:30

On 23.07.2010 23:43, Mathias Gaunard wrote:
> On Jul 23, 12:49 am, "Martin B."<0xCDCDC...(a)gmx.at> wrote:
>
>> * No unicode aware string class
>
> And exactly what would it do that would be of any use?
>

Like, make working with "normal" strings (as opposed to
performance-relevant data-crunching strings) a no-brainer?

Like, provide a clear, easy and efficient interface to work with unicode
strings.

Clear like:
* If I have an object of such a class I *know* it is a valid unicode
string and not some locale-, system-, or implementation-defined
character array mumbo jumbo.
* No way to implicitly convert it to and from any character (array) type
without clearly specifying what encoding to use for this.

Efficient and easy like:
* The internal representation is configurable and it's efficient to
extract a primitive-type-array of the internal representation, but the
normal joe-programmer doesn't have to care about the internal
representation.

cheers,
Martin
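
A minimal sketch of the two "clear like" points above, under loud assumptions: the class name unicode_string and its members are hypothetical and belong to no standard or library, the internal representation is fixed to UTF-8 rather than configurable, and "valid" here means only well-formed UTF-8 (no normalization or grapheme handling).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class for illustration only. An object can be created solely
// through a factory that names the encoding and validates the input, so
// holding one means holding well-formed Unicode text.
class unicode_string {
public:
    // The caller states that `bytes` is UTF-8. Throws std::invalid_argument
    // if it is not well-formed.
    static unicode_string from_utf8(std::string bytes) {
        if (!is_well_formed_utf8(bytes))
            throw std::invalid_argument("not well-formed UTF-8");
        return unicode_string(std::move(bytes));
    }

    // Explicit, named access to the internal representation (UTF-8 here).
    const std::string& to_utf8() const { return utf8_; }

    // Number of code points, not bytes: count every non-continuation byte.
    std::size_t length() const {
        std::size_t n = 0;
        for (unsigned char c : utf8_)
            if ((c & 0xC0) != 0x80) ++n;
        return n;
    }

    // Checks structure, overlong forms, surrogates and the U+10FFFF limit.
    static bool is_well_formed_utf8(const std::string& s) {
        std::size_t i = 0;
        while (i < s.size()) {
            const unsigned char lead = static_cast<unsigned char>(s[i++]);
            unsigned long cp, lowest;
            int trailing;
            if (lead < 0x80)                { cp = lead;        trailing = 0; lowest = 0; }
            else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trailing = 1; lowest = 0x80; }
            else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trailing = 2; lowest = 0x800; }
            else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trailing = 3; lowest = 0x10000; }
            else return false;
            for (int k = 0; k < trailing; ++k) {
                if (i == s.size()) return false;
                const unsigned char b = static_cast<unsigned char>(s[i++]);
                if ((b & 0xC0) != 0x80) return false;
                cp = (cp << 6) | (b & 0x3F);
            }
            if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
                return false;
        }
        return true;
    }

private:
    // Private and explicit: no implicit conversion from char arrays or
    // std::string, and no way to bypass validation.
    explicit unicode_string(std::string bytes) : utf8_(std::move(bytes)) {}
    std::string utf8_;
};
```

With this shape, `unicode_string s = "text";` does not compile, while `unicode_string::from_utf8("text")` does, so every conversion site names its encoding. A real design would also need from_utf16 and from_utf32 factories and the configurable storage mentioned in the post.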