From: Eugene Gershnik on 20 Jun 2006 09:53

David J. Littleboy wrote:
>> How
>> the Unicode sequence in its editor is converted to this encoding is
>> up to the application but a reasonable user expectation is that what
>> looks like ?
>> should be transmitted as ?.
>
> One man's "reasonable user expectation" is another's unacceptable
> abomination. Just because you can't see a reason for transmitting as
> two characters doesn't mean there isn't one. In particular, there are
> a lot of combining characters in Unicode, most of which can't be
> encoded in "Western European".

And how is it relevant here? We are talking about a character that *is* present in ISO 8859-1, not any arbitrary character. It has two possible encodings in Unicode, which both map to the same one in ISO 8859-1. The Unicode normalization forms are canonically equivalent, and encoding conversion should not depend on which one was used as the source. To put it in C++ terms, correct encoding conversion should look something like

    basic_string<uchar> source = ...;
    basic_string<uchar> form_c = normalize_unicode(source);
    string result = convert(get_encoding("ISO 8859-1"), form_c);

where uchar is a UTF character type of your favorite size, or even wchar_t if it stores UTF on your platform.

> So there simply isn't any general
> solution to the problem.

There is. See above.

> Again, that's _your_ desire. There are a lot of other users out
> there. Some of us speak an Oriental language or two, and realize that
> all bets are off if you change encodings.

Some of us speak multiple languages too and realize that the above is wrong. If my Unicode text contains _only characters compatible with the target encoding_, all bets are not off.

--
Eugene

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
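The outline in Eugene's post uses `normalize_unicode`, `convert`, and `get_encoding` as placeholders; they are not a standard C++ API. Below is a toy sketch of the same two-step idea under stated assumptions: the function names are hypothetical, the "normalizer" only composes a handful of base letters with U+0301 COMBINING ACUTE ACCENT (real NFC needs the full Unicode composition data, for example from a library such as ICU), and the converter relies on the fact that ISO 8859-1 byte values coincide with code points U+0000 through U+00FF.

```cpp
#include <optional>
#include <string>

// Hypothetical toy composition table: base letter + U+0301 -> precomposed form.
// Covers only a few pairs; a real NFC implementation uses the Unicode data files.
char32_t compose_acute(char32_t base) {
    switch (base) {
        case U'a': return U'\u00E1';
        case U'e': return U'\u00E9';
        case U'i': return U'\u00ED';
        case U'o': return U'\u00F3';
        case U'u': return U'\u00FA';
        case U'E': return U'\u00C9';
        default:   return 0;  // no precomposed form in this toy table
    }
}

// Hypothetical stand-in for normalize_unicode(): composes only base + U+0301.
std::u32string normalize_acute_nfc(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (c == U'\u0301' && !out.empty()) {
            if (char32_t p = compose_acute(out.back())) {
                out.back() = p;  // replace the base letter with the composed one
                continue;
            }
        }
        out.push_back(c);
    }
    return out;
}

// ISO 8859-1 byte values equal the first 256 Unicode code points,
// so conversion fails only for code points above U+00FF.
std::optional<std::string> to_latin1(const std::u32string& in) {
    std::string out;
    for (char32_t c : in) {
        if (c > 0xFF) return std::nullopt;  // not representable in ISO 8859-1
        out.push_back(static_cast<char>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

With these definitions, `to_latin1(U"cafe\u0301")` fails because U+0301 has no ISO 8859-1 equivalent, while `to_latin1(normalize_acute_nfc(U"cafe\u0301"))` yields "café" with the final byte 0xE9. That is why the conversion should be fed the normalized string rather than the raw source.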
From: loufoque on 20 Jun 2006 09:52
Pete Becker wrote:
> WCHAR has to be 2 bytes and store UTF-16 in little-endian format,

I fail to see how the Win32 API can handle UTF-16. It looks like it can only do UCS-2.
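The UCS-2 versus UTF-16 distinction raised here comes down to surrogate pairs: UTF-16 encodes code points above U+FFFF as two 16-bit units, while UCS-2 cannot represent them at all. The sketch below (function names hypothetical; it makes no claim about how any particular Win32 function behaves) shows the encoding step and how a code-point count differs from a count of 16-bit units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode scalar value to a UTF-16 string.
// Code points above U+FFFF need a surrogate pair; UCS-2 covers only U+0000..U+FFFF.
void append_utf16(std::u16string& out, char32_t cp) {
    // Precondition: a Unicode scalar value (not a surrogate, not beyond U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));  // one unit, same as UCS-2
    } else {
        cp -= 0x10000;  // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
}

// Count code points: a well-formed surrogate pair counts once.
// Treating every 16-bit unit as a character (the UCS-2 view) would count it twice.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i;  // skip the low surrogate of the pair
        }
        ++n;
    }
    return n;
}
```

Appending U'A' and U+1D11E (MUSICAL SYMBOL G CLEF) produces three code units (0x0041, 0xD834, 0xDD1E) but only two code points; code that treats every 16-bit unit as a character sees three.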