Endianness of padded scalar objects [Visual C]


From: David Lowndes on 25 Feb 2010 13:27

>You had one previously .-)

What exactly is it supposed to show?

Are the asserts supposed to be true or false?

As I mentioned, the results from that were exactly what I'd assume
them to be.

Dave

From: Ray Mitchell on 25 Feb 2010 13:49

Hi Igor,

I appreciate all of the time you've taken to converse with me regarding this
issue. I have commented on some of your comments again below. Although I
don't necessarily agree with your interpretations, it's probably due to my
own lack of knowledge on some of the topics.

Thanks,
Ray

"Igor Tandetnik" wrote:

> Ray wrote:
> > "Igor Tandetnik" wrote:
> >> Assigning to one member of the union and then reading another exhibits undefined behavior.
> >
> > Where did you get this information? Could you please refer me to the
> > appropriate section of the C standard that states this is the case?
>
> 6.7.2.1p14 The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time.

I don't agree that this makes reading a member that was not most recently
written undefined as long as the member being read shares all of its bytes
with the recently written member. Concerning the "older" members of a union,
6.2.6.1p7 of the standard says, "When a value is stored in a member of an
object of union type, the bytes of the object representation that do not
correspond to that member but do correspond to other members take unspecified
values, but the value of the union object shall not thereby become a trap
representation." When the compiler generates code to access the various
union members, that code merely accesses the appropriate bytes in the common
object and interprets them in the way appropriate to that member's data type.
The code to do this is "permanent" and does not change just because another
member was recently written. Instead, the access is made without any memory
of what might have happened to the object previously. As a result the values
of the bytes being read are exactly the values that were written. Of course,
if a 4-byte float member were written and a 4-byte int member were then read,
the value of the int would be implementation dependent, not because the
individual bytes differ from those that were written, but merely because a
float and an int are represented differently.

>
> 6.2.4p2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined.

I agree totally, and the object in this case is the underlying memory common
to all members. But this is unrelated to the issue we're discussing since
the lifetime of the object does not end between the write and the read.

>
> The above should be sufficient, but, as an independent evidence of the intent:
>
> 6.5.2.3p5 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible.
>
> By implication, if two members of a union are _not_ structures that share a common initial sequence, then you can't store one and then inspect the other, or the special guarantee wouldn't be needed.

I'm sorry, but that's not how I interpret this implication.

>
> > Logically, to me at least, since
> > all union members start at the same address, examining the bytes of only the
> > most recently written member via a character pointer should yield perfectly
> > valid results, and that is what I am doing.
>
> It would yield unspecified results:
>
> 6.2.6.1p1 The representations of all types are unspecified except as stated in this subclause.
>
> > And even if what you state is
> > true I could simply set a separate character pointer equal to the address of
> > the entire union and examine the individual bytes that way, thereby not
> > reading using another member.
>
> That you can do, and you don't need a union:
>
> long l = 1;
> char* p = (char*)&l;

Agreed. The union example was arbitrary and self-contained.

>
> There's a special dispensation for this:
>
> 6.5p7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
> ....
> - a character type.
>
> 6.3.2.3p7 A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type... When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
>
> However, this doesn't give you much, in view of aforementioned 6.2.6.1p1 - in general, you have no idea what to expect when looking at individual bytes of an object.

But my original example was not the general case. It merely set the value
of an integral type to a value of 1, and I believe that guarantees that only
the least significant bit will be a 1.
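
Whether an integer type has padding bits on a given implementation can be
probed at run time. The following is only a minimal sketch and is not taken
from the standard or from the posts above; both function names are made up for
illustration. It counts the value bits of unsigned long by shifting ULONG_MAX
and compares the count with the storage size in bits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of value bits in unsigned long, found by
   counting how many right shifts it takes to reduce ULONG_MAX to zero. */
unsigned value_bits_unsigned_long(void)
{
    unsigned long max = ULONG_MAX;
    unsigned bits = 0;

    while (max != 0) {
        ++bits;
        max >>= 1;
    }
    return bits;
}

/* Hypothetical helper: storage bits that are not value bits. A result of 0
   means unsigned long has no padding bits on this implementation. */
unsigned padding_bits_unsigned_long(void)
{
    unsigned storage_bits = (unsigned)(sizeof(unsigned long) * CHAR_BIT);
    unsigned value_bits = value_bits_unsigned_long();

    assert(value_bits <= storage_bits);
    return storage_bits - value_bits;
}
```

A result of 0 says nothing about which byte holds the least significant bit;
it only rules out padding bits for this one type on this one implementation.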

>
>
> I would concede the following: if you limit yourself to architectures with "unsurprising" representations (in particular, no padding bits as defined by 6.2.6.2p1), and the only uncertainty is whether the architecture is little- or big-endian, then you can detect this by inspecting an integer via a char* pointer as shown above. But you are relying on a lot of a-priori knowledge.

Yes, but I believe that we've diverged from the original endian swapping
issue into the issue of inspecting bytes. So, the endian swapping issue
still remains unresolved in my mind. For the time being at least I'll just
consider it an implementation dependent issue.
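
For reference, the byte inspection conceded above might look roughly like the
sketch below. It assumes an "unsurprising" representation with no padding bits
in unsigned long, and the function names are hypothetical. The bytes are
copied out with memcpy and examined as unsigned char, which is the access that
6.5p7 permits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: index of the single nonzero byte in the object
   representation of 1UL, or -1 if there is no such byte or more than one
   (a "surprising" representation). */
int nonzero_byte_index_of_one(void)
{
    unsigned long value = 1UL;
    unsigned char bytes[sizeof value];
    int found = -1;
    size_t i;

    memcpy(bytes, &value, sizeof value);
    for (i = 0; i < sizeof value; ++i) {
        if (bytes[i] != 0) {
            if (found != -1)
                return -1;      /* more than one nonzero byte */
            found = (int)i;
        }
    }
    return found;
}

/* Hypothetical helper: 1 if the lowest-addressed byte holds the least
   significant bit, 0 if the highest-addressed byte does, -1 otherwise. */
int looks_little_endian(void)
{
    int index = nonzero_byte_index_of_one();

    /* Byte order is meaningless for a one-byte type. */
    assert(sizeof(unsigned long) > 1);

    if (index == 0)
        return 1;
    if (index == (int)sizeof(unsigned long) - 1)
        return 0;
    return -1;
}
```

The -1 result covers mixed-endian layouts and anything else that does not fit
the two common cases, so the sketch reports "don't know" rather than guessing.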

>
> However, you seem to be specifically concerned with machines that have "surprising" representation (such as padding bits within the value). I'm not even sure such beasts exist in nature. In any case, if you are so uncertain of the details of the architecture that you need to ask the question in the first place, I don't quite see how looking at individual bytes of the representation may enlighten you.
> --
> With best wishes,
> Igor Tandetnik
>
> With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925

From: Ray Mitchell on 25 Feb 2010 14:21

"David Lowndes" wrote:

> >Let's assume a fictional long double type that is 12 bytes large. However,
> >it doesn't need a 12-byte alignment but actually a 16-byte alignment
> >because the FPU says so.
>
> I could argue that in essence that makes the type really 16 bytes
> though.
>
> I'm still not convinced by fictional things. Does anyone have a real
> example that would truly illustrate this?

sizeof must report all bytes including padding since sizeof is defined to
report the number of bytes of storage used for an object. Try gcc 4 on a Mac
OS X implementation. sizeof reports 16 bytes but 4 of those bytes are used
for padding (and the padding bytes are not necessarily 0s either). As an
item of curiosity, I've found that if you simply assign one type long double
variable to another, the padding bytes don't get copied in the process. As
an example of why sizeof must include padding bytes, consider this: The
typical way to read/write a file involves the fread/fwrite functions, and
sizeof is often used to determine argument values:

long double x[DIM] = {...};
FILE *fp = ...open some file...
fwrite((void *)x, sizeof(*x), sizeof(x)/sizeof(*x), fp);
fread((void *)x, sizeof(*x), sizeof(x)/sizeof(*x), fp);

Since fread/fwrite have no idea of the actual data types they are
transferring, they would have no idea of whether or not to insert/discard
padding from the objects in the array for some data types but not for others.
Thus, the padding must be included in the sizeof report so it can be written
to the file by fwrite for later retrieval and insertion into the array by
fread.
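
A self-contained version of that round trip could look like the sketch below.
The value of DIM, the array contents, the function name, and the use of
tmpfile() are all assumptions made for illustration; the fragment above leaves
them open. Whole objects are written and read back, padding included, and the
results are compared as values.

```c
#include <assert.h>
#include <stdio.h>

#define DIM 4   /* assumed length; the fragment above leaves DIM unspecified */

/* Hypothetical helper: writes an array of long double to a temporary file,
   reads it back, and compares values. Returns 1 if every element compares
   equal, 0 on a mismatch, -1 if the file could not be created or accessed. */
int round_trip_long_doubles(void)
{
    long double out[DIM] = { 0.0L, 1.0L, -2.5L, 1.0e30L };
    long double in[DIM];
    size_t count = sizeof out / sizeof *out;
    size_t i;
    int status = 1;
    FILE *fp;

    /* sizeof counts every byte of storage, so the array is exactly DIM
       elements of sizeof(long double) bytes each, padding included. */
    assert(sizeof out == DIM * sizeof(long double));

    fp = tmpfile();
    if (fp == NULL)
        return -1;

    if (fwrite(out, sizeof *out, count, fp) != count)
        status = -1;
    rewind(fp);
    if (status == 1 && fread(in, sizeof *in, count, fp) != count)
        status = -1;
    fclose(fp);

    if (status != 1)
        return status;

    /* Compare values with ==, not memcmp, so padding bytes play no part. */
    for (i = 0; i < count; ++i) {
        if (in[i] != out[i])
            return 0;
    }
    return 1;
}
```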

Ray

>
> Dave
> .
>

From: Bo Persson on 25 Feb 2010 14:46

Ray Mitchell wrote:
> Hi Igor,
>
> I appreciate all of the time you've taken to converse with me
> regarding this issue. I have commented on some of your comments
> again below. Although I don't necessarily agree with your
> interpretations, it's probably due to my own lack of knowledge on
> some of the topics.
>
> Thanks,
> Ray
>
> "Igor Tandetnik" wrote:
>
>> Ray wrote:
>>> "Igor Tandetnik" wrote:
>>>> Assigning to one member of the union and then reading another
>>>> exhibits undefined behavior.
>>>
>>> Where did you get this information? Could you please refer me to
>>> the appropriate section of the C standard that states this is the
>>> case?
>>
>> 6.7.2.1p14 The size of a union is sufficient to contain the
>> largest of its members. The value of at most one of the members
>> can be stored in a union object at any time.
>
> I don't agree that this makes reading a member that was not most
> recently written undefined as long as the member being read shares
> all of its bytes with the recently written member.

But it does. There is only one member present in the union. You can't
read what is not there.

It is true that many (most?) compilers will allow you to do it, just
because it is common to do so. The language standard doesn't require
it though, so portability is not optimal.

Bo Persson

From: Igor Tandetnik on 25 Feb 2010 14:51

Ray Mitchell <RayMitchell(a)discussions.microsoft.com> wrote:
> "Igor Tandetnik" wrote:
>> 6.7.2.1p14 The size of a union is sufficient to contain the largest
>> of its members. The value of at most one of the members can be
>> stored in a union object at any time.
>
> I don't agree that this makes reading a member that was not most
> recently written undefined as long as the member being read shares
> all of its bytes with the recently written member. Concerning the
> "older" members of a union,
> 6.2.6.1p7 of the standard says, "When a value is stored in a member
> of an object of union type, the bytes of the object representation
> that do not correspond to that member but do correspond to other
> members take unspecified values, but the value of the union object
> shall not thereby become a trap representation."

This just says that the union shouldn't turn into something that the CPU would throw a hardware exception on (some architectures have bit patterns that cause the CPU to do so - known as "trap representations").

> When the compiler
> generates code to access the various union members, that code merely
> accesses the appropriate bytes in the common object and interprets
> them in the way appropriate to that member's data type. The code to
> do this is "permanent" and does not change just because another
> member was recently written.

Of course not. But the program that necessitates running this code exhibits undefined behavior. Consider:

int* p = malloc(sizeof(int));
*p = 1;
if (rand() % 2) {
    free(p);
}
*p = 42;

Code that assigns 42 to *p doesn't change just because memory is freed. Nevertheless, if it was indeed freed, that line exhibits undefined behavior - it accesses an object whose lifetime has ended.

> Instead, the access is made without any
> memory of what might have happened to the object previously.

That doesn't make such access any more valid.

> As a
> result the values of the bytes being read are exactly the values that
> were written.

Not necessarily. The compiler can legally optimize away the assignment to one member of the union, seeing that the member is never read afterwards. See also

http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Optimize-Options.html#index-fstrict_002daliasing-542

(note that GCC doesn't perform this optimization in the simple case - only because there's too much invalid code in existence that would be broken by it). If the compiler does that, then no value is written, and the value read is random garbage.
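
One way to avoid the question altogether is to copy the bytes with memcpy
instead of reading through a different union member or pointer type. The
sketch below is only an illustration with made-up function names, and it
assumes that float and uint32_t have the same size on the implementation. The
resulting bit pattern is still implementation dependent; only the access
itself is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the object representation of a float as a
   32-bit unsigned integer by copying bytes, with no union and no pointer
   cast to an incompatible type. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    assert(sizeof bits == sizeof f);   /* assumption: 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Hypothetical helper: the inverse copy, from bit pattern back to float. */
float bits_to_float(uint32_t bits)
{
    float f;

    assert(sizeof f == sizeof bits);   /* assumption: 32-bit float */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because both copies go through memcpy, a compiler that applies type-based
alias analysis has no grounds to discard the store or reorder it past the read.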

>> 6.2.4p2 The lifetime of an object is the portion of program
>> execution during which storage is guaranteed to be reserved for it.
>> An object exists, has a constant address, and retains its
>> last-stored value throughout its lifetime. If an object is referred
>> to outside of its lifetime, the behavior is undefined.
>
> I agree totally, and the object in this case is the underlying memory
> common to all members.

Not quite. The union as a whole is an object, and each union member is itself an object:

6.2.5p20 A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.

Remember also 6.7.2.1p14: "The value of at most one of the members can be stored in a union object at any time." Thus, one member object cannot possibly "retain its last-stored value" when another member is assigned to - the union can only hold one value at a time.

C++ standard states this more explicitly:

3.8p1 ...The lifetime of an object of type T ends when: ... the storage which the object occupies is reused...

> But this is unrelated to the issue we're
> discussing since the lifetime of the object does not end between the
> write and the read.

Lifetime of the union doesn't, but lifetime of the member object whose storage has been hijacked does.

>> However, this doesn't give you much, in view of aforementioned
>> 6.2.6.1p1 - in general, you have no idea what to expect when looking
>> at individual bytes of an object.
>
> But my original example was not the general case. It merely set the
> value
> of an integral type to a value of 1, and I believe that guarantees
> that only the least significant bit will be a 1.

What is the basis for this belief? It is my turn now to demand chapter and verse.
--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925
