From: Le Chaud Lapin on 2 Nov 2009 04:26

Hi All,

Do I get any guarantee that the unsigned version of a corresponding signed type is:

1. the same size as the signed version
2. equivalent to the signed version as far as the bit pattern is concerned

I especially would like to know if the standard prevents the compiler from changing the bit pattern for the cast.

Here are some unsigned/signed pairs:

unsigned char/signed char
unsigned int/signed int
unsigned long int/signed long int

TIA,

[I realize this is a silly/trivial question, but my book is not available right now.]

-Le Chaud Lapin-

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
From: SG on 2 Nov 2009 13:57

On 2 Nov., 22:26, Le Chaud Lapin wrote:
>
> Do I get any guarantee that the unsigned version of a corresponding
> signed type is:
>
> 1. the same size as the signed version

In terms of the sizeof operator: yes. I'm not sure if that implies that the value representation uses the same subset of bits (as in "potential padding at the same places"). But I would be surprised if any implementation did otherwise.

> 2. equivalent to the signed version as far as the bit pattern is
> concerned

It is if the signed number is non-negative. In addition, it will also be the same bit pattern for negative numbers, given the system uses 2's complement (as opposed to 1's complement or sign+magnitude). This directly follows from the "modulo rule": the conversion will obey equivalence modulo 2^N, where N is the number of bits in the target *unsigned* type.

The converse (conversion to signed) is not guaranteed by the standard. But you can expect implementations that use 2's complement to make this guarantee. The conversion to signed int where the original value cannot be represented is implementation-defined and thus has to be documented by the implementation.

Cheers,
SG
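The two guarantees discussed above (equal storage size and the modulo rule for conversion to unsigned) can be checked with a small sketch. The function names `to_unsigned` and `same_storage_size` are hypothetical helpers introduced only for illustration; nothing here depends on the signed representation the platform uses.

```cpp
#include <climits>

// Conversion to an unsigned type is defined arithmetically: the result is
// the source value reduced modulo 2^N, where N is the number of value bits
// in unsigned int. (Hypothetical helper name.)
inline unsigned int to_unsigned(int value)
{
    return static_cast<unsigned int>(value);
}

// Corresponding signed/unsigned types occupy the same amount of storage,
// so sizeof yields the same result for each pair. (Hypothetical helper name.)
inline bool same_storage_size()
{
    return sizeof(signed char) == sizeof(unsigned char)
        && sizeof(int) == sizeof(unsigned int)
        && sizeof(long) == sizeof(unsigned long);
}
```

For example, `to_unsigned(-1)` compares equal to `UINT_MAX` and `to_unsigned(-2)` to `UINT_MAX - 1`, while non-negative values such as `7` come through unchanged.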
From: Johannes Schaub (litb) on 2 Nov 2009 13:57

Le Chaud Lapin wrote:
> Hi All,
>
> Do I get any guarantee that the unsigned version of a corresponding
> signed type is:
>
> 1. the same size as the signed version
>

3.9.1/3 in the Standard says yes: they have the same storage size, making "sizeof" yield the same value.

> 2. equivalent to the signed version as far as the bit pattern is
> concerned
>

The Standard says in the same paragraph: "the value representation of each corresponding signed/unsigned type shall be the same.", for all non-negative values of the signed type.

> I especially would like to know if the standard prevents the compiler
> from changing the bit pattern for the cast.
>
> Here are some unsigned/signed pairs:
>
> unsigned char/signed char
> unsigned int/signed int
> unsigned long int/signed long int
>

According to 4.7/2, converting signed -> unsigned is a mathematical operation. The resulting value is the least unsigned value congruent to the source integer (modulo 2**n with n being the number of bits in the representation of the unsigned integer).

So the bit-pattern could change. For example, (unsigned int)-1 yields UINT_MAX, because the difference UINT_MAX - (-1) is divisible by 2**BITS_IN_UINT and is the least positive integer doing so. For two's complement, there is no change in the bit pattern. But for sign-magnitude, you go from "1000...01" to "1111...11", and for one's complement, you go to all-one from "1111...110" and so on.
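The difference between converting the value and keeping the bit pattern can be made concrete with a rough sketch. The helper names below (`convert_value`, `copy_bits`, `conversion_preserves_bits`) are hypothetical and exist only for this illustration. For non-negative values the two results agree on any representation; for negative values they are only expected to agree on two's complement machines.

```cpp
#include <climits>
#include <cstring>

// Value conversion as described in 4.7/2: the result is congruent to the
// source modulo 2^n. (Hypothetical helper name.)
inline unsigned int convert_value(int value)
{
    return static_cast<unsigned int>(value);
}

// Bit-pattern reinterpretation: copy the stored bytes unchanged into an
// unsigned int of the same size. (Hypothetical helper name.)
inline unsigned int copy_bits(int value)
{
    unsigned int result;
    std::memcpy(&result, &value, sizeof result);
    return result;
}

// True when the value conversion happens to leave the bit pattern of
// 'value' untouched. (Hypothetical helper name.)
inline bool conversion_preserves_bits(int value)
{
    return convert_value(value) == copy_bits(value);
}
```

Calling `conversion_preserves_bits(-1)` is a quick way to see which case a given platform falls into; `convert_value(-1)` equals `UINT_MAX` regardless.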
From: Le Chaud Lapin on 3 Nov 2009 06:46

On Nov 3, 12:57 am, "Johannes Schaub (litb)" <schaub-johan...(a)web.de> wrote:
> Le Chaud Lapin wrote:
> > Do I get any guarantee that the unsigned version of a corresponding
> > signed type is:
>
> > 2. equivalent to the signed version as far as the bit pattern is
> > concerned
>
> According to 4.7/2, converting signed -> unsigned is a mathematical
> operation. The resulting value is the least unsigned value congruent to the
> source integer (modulo 2**n with n being the number of bits in the
> representation of the unsigned integer).
>
> So the bit-pattern could change. For example, (unsigned int)-1 yields
> UINT_MAX, because the difference UINT_MAX - (-1) is divisible by
> 2**BITS_IN_UINT and is the least positive integer doing so. For two's
> complement, there is no change in the bit pattern. But for sign-magnitude,
> you go from "1000...01" to "1111...11", and for one's complement, you go to
> all-one from "1111...110" and so on.

Thanks for the clear explanation. So I can see one's complement might be an issue for what I am trying to do.

I am trying to build a UNICODE Buffer object from a string given by const char *. [Please ignore the fact that the UNICODE Buffer is templated on 'C'.]:

    template <typename C> struct Buffer
    {
        unsigned int length_;
        C *pointer;

        Buffer (const char *string)
        {
            length_ = 0;
            while (string[length_])
                ++length_;
            if (length_)
            {
                pointer = new C[length_ + 1];
                // Copy the characters and the terminating null.
                for (unsigned int i = 0; i <= length_; ++i)
                    pointer[i] = static_cast<C>(static_cast<unsigned char>(string[i]));
            }
            else
                pointer = 0;
        }
    } ;

On platforms where type char is inherently unsigned, the static_cast in the code above was not necessary, because the type of 'C' is always unsigned in my design.

But when type char is inherently signed, the cast is necessary because a char value exceeding 127 will be negative, sign extension will occur during bit-width extension for conversion to an unsigned type, and the unsigned value will become a large positive value, per the congruence rule that you and SG mentioned. This value will, of course, not be the UNICODE value that I wanted, even when there is a complete match between the bit pattern for type char and the bit pattern for type 'C' when typeof(C) == wchar_t.

So without the cast,

"plus ça change, plus c'est la même chose"

...becomes

"plus @a change, plus c'est la m@me chose" // @ = value > 255.

I would like to take a char, and ensure that, when converted to wchar_t, it always yields the proper corresponding unsigned value in wchar_t. I am permitted to assume 1-byte chars, but not 2's-complement.

I thought of using reinterpret_cast on the char to force it to unsigned char, but wanted to get different opinions before trying that.

-Le Chaud Lapin-
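One possible way to approach the requirement above is sketched here, with the caveat that it is only an illustration and not a reviewed solution. A reinterpret_cast cannot turn a char value directly into an unsigned char value, but it can bind a reference of type const unsigned char& to the char object, which reads the stored byte as it is, whatever the signed representation. The names `byte_of`, `widen_char` and `widen_string` are hypothetical, and the sketch assumes every unsigned char value fits in wchar_t.

```cpp
#include <cstddef>

// Read the byte stored in a char object as an unsigned value. Accessing an
// object through an lvalue of type unsigned char is permitted and yields the
// stored bits unchanged. (Hypothetical helper name.)
inline unsigned char byte_of(const char &c)
{
    return reinterpret_cast<const unsigned char &>(c);
}

// Widen one narrow character without sign extension: go through the
// unsigned byte first, then convert to wchar_t. (Hypothetical helper name.)
inline wchar_t widen_char(char c)
{
    return static_cast<wchar_t>(byte_of(c));
}

// Widen a null-terminated narrow string into a caller-supplied buffer that
// must hold at least strlen(source) + 1 elements. Returns the number of
// characters written, not counting the terminator. (Hypothetical helper name.)
inline std::size_t widen_string(const char *source, wchar_t *destination)
{
    std::size_t length = 0;
    while (source[length]) {
        destination[length] = widen_char(source[length]);
        ++length;
    }
    destination[length] = L'\0';
    return length;
}
```

On a two's complement machine `byte_of(c)` and `static_cast<unsigned char>(c)` give the same result, so the double static_cast in the Buffer constructor behaves identically there; the reference form only matters when the bit pattern and the converted value can differ.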
From: Le Chaud Lapin on 7 Nov 2009 07:24
On Nov 3, 5:46 pm, Le Chaud Lapin <jaibudu...(a)gmail.com> wrote:
> I would like to take a char, and ensure that, when converted to
> wchar_t, it always yields the proper corresponding unsigned value in
> wchar_t. I am permitted to assume 1-byte chars, but not 2's-complement.
>
> I thought of using reinterpret_cast on the char to force it to
> unsigned char, but wanted to get different opinions before trying
> that.

It seems that these questions and so many others can be answered by the C++0x draft document:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf

A trivial fact for those who wonder, as I did, what the 'x' in 'C++0x' means: it is a placeholder for a single decimal digit 0-9, indicating the year of release of the "specification", 2000-2009, respectively.

Thanks to Seungbeom Kim and others for passively referring to the standard in my previous posts, which finally induced me to take a look. It is worth the effort and not as painful as I imagined it would be. :)

-Le Chaud Lapin-