Interpreting the bits of union fields as different data types in C/C++

Author: Internet

I am trying to access the bits of a union as different data types. For example:

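    #include <inttypes.h>
    #include <stdio.h>

    typedef union {
        uint64_t x;
        uint32_t y[2];
    } test;

    int main(void) {
        test testdata;
        testdata.x = 0xa;
        /* PRIx64/PRIx32 match the fixed-width types; %p expects a void * argument */
        printf("uint64_t: %016" PRIx64 "\nuint32_t: %08" PRIx32 " %08" PRIx32 "\n",
               testdata.x, testdata.y[0], testdata.y[1]);
        printf("Addresses:\nuint64_t: %016" PRIxPTR "\nuint32_t: %p %p\n",
               (uintptr_t)&testdata.x, (void *)&testdata.y[0], (void *)&testdata.y[1]);
        return 0;
    }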

The output is

    uint64_t: 000000000000000a
    uint32_t: 0000000a 00000000
    Addresses:
    uint64_t: 00007ffe09d594e0
    uint32_t: 0x7ffe09d594e0 0x7ffe09d594e4

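The starting address of y is the same as the starting address of x. Since both fields use the same location, shouldn't y print as 00000000 0000000a, in the same order as x?

Why does that not happen? How does the internal conversion work in a union whose fields have different data types?

What needs to be done to retrieve the exact raw bits as uint32_t, in the same order as in the uint64_t, using a union?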

Edit:
As mentioned in the comments, this is undefined behaviour in C++.
How does it work in C? Can we actually do it?

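Answer:

I will first explain what happens on your implementation.

You are type punning between a uint64_t value and an array of 2 uint32_t values. Judging by the result, your system is little endian and happily accepts the type punning by simply reinterpreting the byte representation. The byte representation of 0x0a as a little-endian uint64_t is: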

    Byte number  0    1    2    3    4    5    6    7
    Value        0x0a 0x00 0x00 0x00 0x00 0x00 0x00 0x00

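In little endian, the least significant byte has the lowest address. It is now evident why the uint32_t[2] representation is { 0x0a, 0x00 }: y[0] covers bytes 0 to 3 and y[1] covers bytes 4 to 7.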
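The byte layout above can be checked directly, because any object may be inspected through `unsigned char` in both C and C++. The following is only a minimal sketch; the helper names `byte_at` and `is_little_endian` are illustrative and do not come from the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the byte stored at offset i of v's
   object representation (offset 0 is the lowest address). */
unsigned char byte_at(uint64_t v, size_t i) {
    assert(i < sizeof v);
    const unsigned char *p = (const unsigned char *)&v;
    return p[i];
}

/* Hypothetical helper: nonzero when the least significant byte
   of a uint64_t sits at the lowest address. */
int is_little_endian(void) {
    return byte_at(UINT64_C(1), 0) == 1;
}
```

On a little-endian system such as the one in the question, `byte_at(0x0a, 0)` would be expected to return `0x0a` and every other offset `0x00`, matching the table.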

But what you are doing is only legal in the C language.

C language:

C11 says in 6.5.2.3 Structure and union members:

3 A postfix expression followed by the . operator and an identifier designates a member of
a structure or union object. The value is that of the named member,95) and is an lvalue if
the first expression is an lvalue.

And note 95) explicitly says:

If the member used to read the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called "type
punning"). This might be a trap representation.

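So even though notes are not normative, their intent is to make clear how the standard should be interpreted. Your code is therefore valid C, and it has defined behaviour on a little-endian system that defines the uint64_t and uint32_t types.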
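If the goal is to get the two 32-bit words in the same order in which they appear in the 64-bit hexadecimal value, whatever the byte order of the machine, shifting is one option, since shifts operate on the value rather than on its storage. A sketch, where `word_msb_first` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 32-bit word i of v, counting from the
   most significant end (i == 0 is the high word, i == 1 the low word).
   Shifts act on the value, so the result does not depend on byte order. */
uint32_t word_msb_first(uint64_t v, unsigned i) {
    assert(i < 2);
    return (uint32_t)(v >> (32u * (1u - i)));
}
```

With `v = 0xa`, `word_msb_first(v, 0)` is `0x00000000` and `word_msb_first(v, 1)` is `0x0000000a`, which is the order the question expected.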

C++ language:

C++ is stricter on this point. The draft n4659 for C++17 says in [basic.lval]:

8 If a program attempts to access the stored value of an object through a glvalue of other than one of the
following types the behavior is undefined: 56)
(8.1) — the dynamic type of the object,
(8.2) — a cv-qualified version of the dynamic type of the object,
(8.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
(8.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,
(8.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type
of the object,
(8.6) — an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic
data members (including, recursively, an element or non-static data member of a subaggregate or
contained union),
(8.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
(8.8) — a char, unsigned char, or std::byte type.

And note 56 explicitly says:

The intent of this list is to specify those circumstances in which an object may or may not be aliased.

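Because type punning is never mentioned in the C++ standard, and because the struct/union part contains no equivalent of C's reinterpretation rule, reading in C++ the value of a member that is not the last one written invokes undefined behaviour.

Of course, common compiler implementations compile both C and C++, and most of them accept C idioms even in C++ source, for the same reason that the gcc C++ compiler happily accepts VLAs in C++ source files. After all, undefined behaviour includes the expected result... But you should not rely on it for portable code.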
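One commonly used alternative that is well defined in both C and C++ is to copy the bytes with `memcpy` instead of reading an inactive union member. The following is a minimal sketch; `split_bytes` is a hypothetical name, and the order of the two words still follows the machine's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of in into
   out[0] and out[1]. uint64_t and uint32_t have no padding bits, so
   the 8 source bytes fill the two 4-byte words exactly. */
void split_bytes(uint64_t in, uint32_t out[2]) {
    assert(out != NULL);
    memcpy(out, &in, sizeof in);
}
```

On the little-endian system from the question this would give `out[0] == 0x0000000a` and `out[1] == 0`, the same values the union produced.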

Source: https://codeday.me/bug/20190731/1587024.html