From: Seungbeom Kim on
Walter Bright wrote:
> Nemanja Trifunovic wrote:
>> But what would you expect? The user simply must take the string encoding
>> into consideration when doing string operations like that. If s1
>> contains a string in some multibyte encoding the user must be aware of
>> it. This is not specific to utf-8.
>
> If it supported utf-8, I would expect things like encoding and decoding
> of utf-8 to work. std::string right now offers nothing for the utf-8 user.

Then, is this close to something you have in your mind?

utf8_string s("\xED\x95\x9C\xEA\xB8\x80");
// s has U+D55C U+AE00
utf8_string::iterator i = s.begin();
// i points to the U+D55C
wchar_t c = *i; // c gets 0xD55C
*i = 0xA9 /* or L'\xA9' */;
// s has U+00A9 U+AE00; i.e.
// s is "\xC2\xA9\xEA\xB8\x80" now

It would practically be nice to have something like this in the standard
library (and I'm sure there are some libraries out there which do this),
but I immediately begin to wonder "Why only UTF-8, when even ASCII is
not mandated? Why not anything else?", and also "Do we really need that
many string classes?". Maybe std::string should be able to take a
customizable conversion scheme (a.k.a. "charset"), à la allocators,
that allows storage in the encoded form and other processing in the
decoded form?
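
Something roughly like this, perhaps (just a sketch; encoded_string,
utf8_codec and their members are names I am making up here, not an
existing library, and the codec only handles the BMP for brevity):

#include <string>

struct utf8_codec
{
    typedef wchar_t char_type;   // decoded form, as in the example above

    // Append the UTF-8 encoding of c to out (BMP only, no surrogate checks).
    static void encode(char_type c, std::string& out)
    {
        unsigned v = static_cast<unsigned>(c);
        if (v < 0x80)
        {
            out += char(v);
        }
        else if (v < 0x800)
        {
            out += char(0xC0 | (v >> 6));
            out += char(0x80 | (v & 0x3F));
        }
        else
        {
            out += char(0xE0 | (v >> 12));
            out += char(0x80 | ((v >> 6) & 0x3F));
            out += char(0x80 | (v & 0x3F));
        }
    }
};

// Stores the bytes in the encoded form; iterators that decode on the fly,
// as in the utf8_string example above, would be layered on top of this.
template <class Codec>
class encoded_string
{
public:
    void push_back(typename Codec::char_type c) { Codec::encode(c, bytes_); }
    const std::string& encoded() const { return bytes_; }
private:
    std::string bytes_;          // kept in the encoded (e.g. UTF-8) form
};

typedef encoded_string<utf8_codec> utf8_string;
// e.g. utf8_string s; s.push_back(L'\xAC00'); appends "\xEA\xB0\x80"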

(This also makes me wonder about all sorts of weird cases, such as what
the decoded form would mean for UTF-8 on an EBCDIC-based implementation,
but I cannot think it through any further right now...)

I'm not sure either which is better: to convert strings at the I/O
boundaries and treat them as wide-character strings internally (what
the current C and C++ standards provide), or to keep strings internally
in the same form as the external one (say, UTF-8) and perform the
encoding/decoding on the fly whenever necessary.

--
Seungbeom Kim

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Peter Dimov on
Walter Bright wrote:

> Here's what Digital Mars C++ does, which implements C99 complex numbers:
>
> ------------------ program ------------------
> #include <complex.h>
>
> complex long double f( complex long double c )
> {
> return c;
> }
> ------------------- asm ---------------------------
> ?f@@YA_W_W@Z:
> fld tbyte ptr 4[ESP]
> fld tbyte ptr 0Eh[ESP]
> ret
> --------------------------------------------------

So your point is that having complex as a built-in allows you to define
an ABI that returns it in ST0:ST1 (but you still pass it on the stack).
I admit that it's unlikely for any ABI to return UDTs in registers.

It's hard for me to imagine a situation where this could make a
substantial difference. For it to matter, the function must not be
inlined but must still be short enough, and its result must not be
stored into memory but used directly in a computation (to call
another such function, perhaps). I could be wrong.
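
Something like this, I suppose; the names are made up for illustration,
and the comments describe what a typical x86 ABI does with a user-defined
type, not anything DMC++-specific:

struct Cld { long double re, im; };   // UDT stand-in for complex long double

Cld f(Cld c) { return c; }            // typically compiled as f(Cld* result, Cld c),
                                      // i.e. returned through a hidden pointer

Cld g(Cld a, Cld b)
{
    Cld x = f(a);                     // result is written to a stack slot
    Cld y = f(b);                     // likewise
    Cld r = { x.re + y.re, x.im + y.im };  // reloaded from memory to compute
    return r;
}

With the built-in complex, f's result comes back in ST0:ST1 and can feed
the computation without the round trip through memory.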



From: LR on
Walter Bright wrote:


>
> 1) Extend std::string to support UTF-8.

I'm not sure why you'd want to do that. Wouldn't it make more sense to
use std::basic_string?

> 2) Extend std::string to support strings with embedded 0's.

I was under the impression that std::string does this already. Perhaps
awkwardly.
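
For example (just a quick sketch): the constructors that take an explicit
length keep embedded zeros; it's the plain const char* ones that stop at
the first '\0'.

#include <string>
#include <iostream>

int main()
{
    std::string s("ab\0cd", 5);   // length given explicitly: all 5 chars kept
    std::string t("ab\0cd");      // scans for '\0': only "ab" is kept
    std::cout << s.size() << ' ' << t.size() << '\n';   // prints "5 2"
}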

LR



From: Walter Bright on
Nemanja Trifunovic wrote:
> That's orthogonal to what we are talking about here. I am saying that
> std::string can be used for storing utf-8 encoded strings and doing
> string operations on them. In fact, I'm doing that all the time and it
> works just fine. The need to recognize new types comes from the fact
> that we are moving away from the "legacy encodings" to Unicode and it
> is a good idea to have specialized types that would be less general
> than std::string, but would offer some Unicode-specific functionality
> (conversions, etc).

I agree, but it just reiterates my point that std::string cannot be
extended to support Unicode and do Unicode things with it. A new type
has to be created.


From: Walter Bright on
kwikius wrote:
>> Yes, you can do
>> that with D, and there's at least one person doing so (Oskar Linde).
>
> I am always interested in physical quantities libraries. Is there a
> link to some documentation, to compare it with my C++ version?

I have an early version of it that I use for testing, but I don't know
its current state, or whether Oskar intended it for general use or just
for his own purposes.
