From: Seungbeom Kim on 28 Nov 2006 22:38

Walter Bright wrote:
> Nemanja Trifunovic wrote:
>> But what would you expect? The user simply must take the string encoding
>> into consideration when doing string operations like that. If s1
>> contains a string in some multibyte encoding, the user must be aware of
>> it. This is not specific to utf-8.
>
> If it supported utf-8, I would expect things like encoding and decoding
> of utf-8 to work. std::string right now offers nothing for the utf-8 user.

Is this, then, close to what you have in mind?

    utf8_string s("\xED\x95\x9C\xEA\xB8\x80"); // s has U+D55C U+AE00
    utf8_string::iterator i = s.begin();        // i points to the U+D55C
    wchar_t c = *i;                             // c gets 0xD55C
    *i = 0xA9 /* or L'\xA9' */;                 // s has U+00A9 U+AE00; i.e.
                                                // s is "\xC2\xA9\xEA\xB8\x80" now

In practice it would be nice to have something like this in the standard library (and I'm sure some libraries out there already do this), but I immediately begin to wonder: "Why only UTF-8, when even ASCII is not mandated? Why not other encodings?" and also "Do we really need that many string classes?"

Maybe std::string should be able to take a customizable conversion scheme (a.k.a. "charset"), à la allocators, that allows storage in the encoded form and other processing in the decoded form? (This also makes me wonder about many weird cases, such as what the decoded form would mean for UTF-8 in an EBCDIC-based implementation, but I cannot think it through any further right now.)

I'm not sure which is better either: to convert strings at the I/O boundaries and treat them as wide-character strings internally (what the current C and C++ standards provide), or to keep strings in the same form internally as externally (say, UTF-8) and perform the encoding/decoding on the fly whenever necessary.

-- 
Seungbeom Kim

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
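The "decode on access, store encoded" idea in Seungbeom's example rests on two small routines. Below is a minimal sketch of them over a plain std::string; `decode_utf8` and `encode_utf8` are hypothetical names, not part of any standard library, and the decoder assumes well-formed UTF-8 (it does not reject overlong forms, surrogates, or stray continuation bytes).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes held in a std::string into code points.
// Assumes well-formed input; only a sequence truncated at the end is rejected.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }  // 0xxxxxxx
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }  // 110xxxxx
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }  // 1110xxxx
        else               { cp = b & 0x07; len = 4; }  // 11110xxx
        if (i + len > s.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: encode code points back into UTF-8 bytes.
std::string encode_utf8(const std::vector<char32_t>& cps) {
    std::string out;
    for (char32_t cp : cps) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Replaying the example by hand: decoding "\xED\x95\x9C\xEA\xB8\x80" yields U+D55C U+AE00; setting the first code point to U+00A9 and re-encoding yields "\xC2\xA9\xEA\xB8\x80". Note that the replacement changes the byte length from 3 to 2, which is why an assignable `*i` on an encoded string has to shift the rest of the storage.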
From: Peter Dimov on 28 Nov 2006 22:41

Walter Bright wrote:
> Here's what Digital Mars C++ does, which implements C99 complex numbers:
>
> ------------------ program ------------------
> #include <complex.h>
>
> complex long double f( complex long double c )
> {
>     return c;
> }
> ------------------- asm ---------------------------
> ?f@@YA_W_W@Z:
>     fld tbyte ptr 4[ESP]
>     fld tbyte ptr 0Eh[ESP]
>     ret
> --------------------------------------------------

So your point is that having complex as a built-in type allows you to define an ABI that returns it in ST0:ST1 (though you still pass it on the stack). I admit that it's unlikely for any ABI to return UDTs in registers.

It's hard for me to imagine a situation where this would make a substantial difference. For it to matter, the function must not be inlined, yet must still be short enough for the return convention to dominate its cost, and its result must not be stored to memory (to pass to another such function, perhaps) but used directly in a computation. I could be wrong, though.
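For comparison, here is a sketch of the same function written in C++ with the library class std::complex<long double> instead of a built-in complex type. Whether the result comes back in x87 registers (as in the Digital Mars listing quoted above) or through memory is fixed by the platform ABI, not by this source; an ABI may classify a class type differently from a built-in complex type, which is exactly the distinction this exchange turns on.

```cpp
#include <complex>

// C++ counterpart of the C99 example: pass and return a complex long double.
// std::complex is a class template, so the ABI may treat it as an ordinary
// user-defined type when deciding how to return it.
std::complex<long double> f(std::complex<long double> c) {
    return c;
}
```

Compiling both versions to assembly on the same target is one way to see whether the two conventions actually differ there.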
From: LR on 28 Nov 2006 22:47

Walter Bright wrote:
>
> 1) Extend std::string to support UTF-8.

I'm not sure why you'd want to do that. Wouldn't it make more sense to use std::basic_string?

> 2) Extend std::string to support strings with embedded 0's.

I was under the impression that std::string does this already, albeit perhaps awkwardly.

LR
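LR's point can be checked directly: std::string stores its length explicitly, so '\0' is an ordinary character in it. The awkwardness shows up at the edges, when constructing from a string literal and when handing the result to a C API. A minimal sketch; the helper names are illustrative only.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// The const char* constructor reads only up to the first '\0',
// so the embedded zero and everything after it are lost.
std::string from_literal() { return std::string("ab\0cd"); }

// Supplying the length explicitly keeps all five characters, including '\0'.
std::string from_literal_with_length() { return std::string("ab\0cd", 5); }

// A C API that relies on strlen() still stops at the embedded zero.
std::size_t c_length(const std::string& s) { return std::strlen(s.c_str()); }
```

Here from_literal() has size 2, from_literal_with_length() has size 5, and c_length() of the latter is again 2.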
From: Walter Bright on 28 Nov 2006 22:50

Nemanja Trifunovic wrote:
> That's orthogonal to what we are talking about here. I am saying that
> std::string can be used for storing utf-8 encoded strings and doing
> string operations on them. In fact, I'm doing that all the time and it
> works just fine. The need to recognize new types comes from the fact
> that we are moving away from the "legacy encodings" to Unicode and it
> is a good idea to have specialized types that would be less general
> than std::string, but would offer some Unicode-specific functionality
> (conversions, etc).

I agree, but that just reiterates my point: std::string cannot be extended to support Unicode and do Unicode-specific things with it. A new type has to be created.
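Both positions show up in one small case. std::string holds UTF-8 bytes without complaint, but size() reports bytes, not characters, so even counting code points needs code outside the class. A minimal sketch of a hypothetical helper (not a standard function), assuming well-formed UTF-8 and doing no validation:

```cpp
#include <cstddef>
#include <string>

// Counts code points in UTF-8 text stored in a std::string by counting every
// byte that is not a continuation byte (continuation bytes have the form 10xxxxxx).
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s)
        if ((b & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For the two-character string "\xED\x95\x9C\xEA\xB8\x80", size() is 6 while utf8_length() is 2; for pure ASCII the two agree.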
From: Walter Bright on 28 Nov 2006 22:51
kwikius wrote:
>> Yes, you can do
>> that with D, and there's at least one person doing so (Oskar Linde).
>
> I am always interested in physical quantities libraries. Is there a
> link to some documentation, to compare it with my C++ version?

I have an early version of it that I use for testing, but I don't know its current state, or whether Oskar intended it for general use or just for his own purposes.