struct nested in a union nested in a struct

I'm reading about the OVERLAPPED struct:

https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped

I know that a struct is essentially a class whose members default to 'public' access. A 'union' is a type that takes on the byte size of the largest type declared in its definition (a region of memory, shared by all declared members, whose compile-time type depends on which member is used).
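
As a rough illustration of that size rule (a sketch only; `Sample` and `members_share_address` are hypothetical names, not anything from the Windows headers), a union is at least as large as its largest member, and every member starts at the same address:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical union used only for illustration.
union Sample {
    std::uint8_t  small;   // 1 byte
    std::uint64_t large;   // 8 bytes
};

// The union must be big enough to hold its largest member.
static_assert(sizeof(Sample) >= sizeof(std::uint64_t),
              "a union is at least as large as its largest member");

// Every member of a union begins at the address of the union itself.
inline bool members_share_address() {
    Sample s{};
    const void* a = &s.small;
    const void* b = &s.large;
    return a == b && a == static_cast<const void*>(&s);
}
```

Calling `members_share_address()` should return true on any conforming compiler.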

I wanted to check that I understand this piece of code correctly (I used to code in C++, but I've been dabbling with Python and C# so much lately that this looks a little foreign to me):

typedef struct _OVERLAPPED {
  ULONG_PTR Internal;
  ULONG_PTR InternalHigh;
  union {
    struct {
      DWORD Offset;
      DWORD OffsetHigh;
    } DUMMYSTRUCTNAME;
    PVOID Pointer;
  } DUMMYUNIONNAME;
  HANDLE    hEvent;
} OVERLAPPED, *LPOVERLAPPED;


1. _OVERLAPPED is an alias for a `struct` that's been created by typedef.
2. The declaration initialized one _OVERLAPPED struct called "OVERLAPPED" and one pointer to an _OVERLAPPED struct called "LPOVERLAPPED".
3. _OVERLAPPED contains 4 members: "Internal", "InternalHigh", "DUMMYUNIONNAME", and "hEvent".
4. DUMMYUNIONNAME defines two members: "DUMMYSTRUCTNAME" (an anonymous struct) and "Pointer".
5. DUMMYSTRUCTNAME defines two members: "Offset", "OffsetHigh".
6. DUMMYUNIONNAME, being a union, can assume either the type 'PVOID' or the type of 'DUMMYSTRUCTNAME', so you can use it like this:

PVOID val = OVERLAPPED.DUMMYUNIONNAME.Pointer; // Should be ok
struct val = OVERLAPPED.DUMMYUNIONNAME.DUMMYSTRUCTNAME; // Is this valid?
DWORD val = OVERLAPPED.DUMMYUNIONNAME.DUMMYSTRUCTNAME.Offset; // Should be ok
DWORD val = OVERLAPPED.DUMMYUNIONNAME.DUMMYSTRUCTNAME.OffsetHigh; //Should be ok 


struct _OVERLAPPED foo; //an object of class _OVERLAPPED
OVERLAPPED bar; //another object, same class as foo
LPOVERLAPPED ptr; //a pointer
ptr = &bar; //it may point to _OVERLAPPED objects 
That's C syntax.
Although valid in C++, it may be simplified to
struct OVERLAPPED{/**/};
OVERLAPPED foo, *ptr; //note, no struct at the start here 

so your point 2 is incorrect, and point 6 may be something like
DWORD val = foo.DUMMYUNIONNAME.DUMMYSTRUCTNAME.Offset;
//struct val = OVERLAPPED.DUMMYUNIONNAME.DUMMYSTRUCTNAME; //afaik, can't be done 
also, remember that you can access just one of the members of the union
I see. So the typedef creates an alias for struct _OVERLAPPED, and OVERLAPPED is the name of that alias. Similarly, LPOVERLAPPED is an alias for a pointer to a struct _OVERLAPPED.
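
To check that reading, here is a minimal sketch using a hypothetical stand-in (`tagRecord`, `Record`, `LPRecord` are made-up names with the same typedef shape as the Windows declaration). The `static_assert`s confirm that both names are type aliases, and that objects have to be declared separately:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the Windows declaration; same typedef shape.
typedef struct tagRecord {
    int id;
} Record, *LPRecord;

// Record names the struct type itself; LPRecord names a pointer-to-struct type.
static_assert(std::is_same<Record, tagRecord>::value,
              "Record is an alias for struct tagRecord");
static_assert(std::is_same<LPRecord, tagRecord*>::value,
              "LPRecord is an alias for tagRecord*");

// Neither name is an object: objects are declared separately.
inline int read_id_through_pointer() {
    Record bar = {7};     // an object of the struct type
    LPRecord ptr = &bar;  // a pointer object declared through the alias
    assert(ptr->id == bar.id);
    return ptr->id;
}
```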

also, remember that you can access just one of the members of the union

I don't quite understand this advice. You should be able to access (read a value from or assign a value to) the union through any of its members. I can see how, if write access isn't controlled, race conditions or bugs can occur.
What @ne555 is referring to is a common rule of thumb, but it does not always hold.

Perhaps a union is created like:

union 
{
 long l;
 double d;
} s;


The union could hold either a long or a double. If it holds a double, it makes little sense to reference the long member, and if it held a long it makes little sense to reference the double. You can, but there's no utility in doing so.
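
One common way to handle this "either/or" case is to store a tag next to the union that records which member was written last. The following is only a sketch; `Number`, `make_long`, `make_double` and `to_double` are hypothetical names introduced for illustration:

```cpp
#include <cassert>

// Hypothetical tagged union: `kind` records which member was written last.
struct Number {
    enum class Kind { Long, Double };
    Kind kind;
    union {          // anonymous unions are standard C++
        long   l;
        double d;
    };
};

inline Number make_long(long v) {
    Number n;
    n.kind = Number::Kind::Long;
    n.l = v;
    return n;
}

inline Number make_double(double v) {
    Number n;
    n.kind = Number::Kind::Double;
    n.d = v;
    return n;
}

// Only the member named by the tag is ever read.
inline double to_double(const Number& n) {
    return n.kind == Number::Kind::Double ? n.d : static_cast<double>(n.l);
}
```

For example, `to_double(make_long(3))` converts the stored long, while `to_double(make_double(2.5))` reads the double directly.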

On the other hand, one might try

union
{
 struct { uint32_t low;
          uint32_t high;
        };

 uint64_t v;
} s;


In this situation, the real purpose of the union is v, the 64 bit integer. It may be important on some machines to be able to access the low and high 32 bit words of this 64 bit value, and so in this case it makes some sense to use either form, depending on the situation.
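
For comparison, the same low/high access can be written with shifts and masks, which does not depend on union layout. This is just a sketch with hypothetical helper names (`low_word`, `high_word`, `join_words`):

```cpp
#include <cassert>
#include <cstdint>

// Lower 32 bits of a 64-bit value.
inline std::uint32_t low_word(std::uint64_t v) {
    return static_cast<std::uint32_t>(v & 0xFFFFFFFFu);
}

// Upper 32 bits of a 64-bit value.
inline std::uint32_t high_word(std::uint64_t v) {
    return static_cast<std::uint32_t>(v >> 32);
}

// Rebuild the 64-bit value from its two halves.
inline std::uint64_t join_words(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

Because the arithmetic works on values rather than on memory layout, the result is the same regardless of the machine's byte order.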

I was under the impression that doing
s.v = 42;
foo = s.low;
is undefined behaviour, as you are not accessing the last set member, but another one.
@ne555,

Not at all.

One of the "classic" uses of this kind of union was for some of the Windows API interfaces dealing with 64 bit numbers for the filesystem when building targets on 32 bit machines using:


typedef union _ULARGE_INTEGER {
  struct {
    DWORD LowPart;
    DWORD HighPart;
  };
  struct {
    DWORD LowPart;
    DWORD HighPart;
  } u;
  ULONGLONG QuadPart;
} ULARGE_INTEGER, *PULARGE_INTEGER;


In all use cases, the "LowPart" and "HighPart" were merely 32 bit compatible "views" into the "QuadPart", which is a 64 bit unsigned integer. It was used to allow 32 bit targets to "build" the 64 bit unsigned integer that the Windows API would ultimately use.

What would represent undefined usage of the type we're discussing is something like:

union s
{
 int v;
 double r;
};


Yet, this is really no different than the undefined behavior of something like:

float r;
int v;

v = *(int *)&r;


In other words, the members of the union are like pre-cast versions of a piece of memory. If the cast would result in something peculiar, it may be of no value to reference more than one pertinent "version" in the union (like a "variant" union). There's no defined result of casting a float pointer to an int * and dereferencing it. The bits of the float would copy to the integer, but the value in that integer would be meaningless. The best one might try to do, more logically with an unsigned integer, is to use bit testing to see if the float is a NAN or INF value (which is better done by standard library functions anyway, but is otherwise a reasonable way to test a float).
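
For the NAN/INF check specifically, the standard library route mentioned above looks roughly like this (a sketch; `is_nan_or_inf` is a hypothetical helper name, while `std::isnan` and `std::isinf` come from `<cmath>`):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True when x is a NaN or an infinity, without inspecting any bits directly.
inline bool is_nan_or_inf(double x) {
    return std::isnan(x) || std::isinf(x);
}
```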

So, the union of an int and a double would really be an "either/or" kind of scenario like you're thinking.

So, what happens if one casts something like:

uint64_t b = 6400;
uint32_t a = *(int *)&b;


This doesn't mean that "a" gets a meaningful value in most contexts, but gets one half of the bits of the 64 bit integer, as in the Microsoft example. It isn't exactly undefined, but it isn't the way to cast, say, a long long to an int.

The reverse, however, may be a way to build a 64 bit integer out of two 32 bit components, as in the Microsoft example above.

Another "classic", which isn't much used these days, is for a typical 8 bit per channel pixel with alpha

union Pixel
{
 struct { unsigned char r;
          unsigned char g;
          unsigned char b;
          unsigned char a;
        } Channels;

 uint32_t v;
}; 


In this style, v is a 32 bit unsigned integer representing the entire package of rgba. Accessing the channels is hardly different than using "bit fiddling" to access those parts, but without having to fiddle with bits (masking and shifting).

The primary use case I recall was so that moving pixels was moving unsigned 32 bit integers (faster than moving 4 chars), while still offering rgba breakdown of the package (which is to say, both are used, not an either/or scenario).
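
For reference, the masking and shifting that the union avoids would look something like the sketch below. The bit layout (r in the lowest byte, a in the highest) and the helper names `pack_rgba`, `red_of` and `alpha_of` are assumptions made for this example only:

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout for this sketch: r in bits 0-7, g in 8-15, b in 16-23, a in 24-31.
inline std::uint32_t pack_rgba(std::uint8_t r, std::uint8_t g,
                               std::uint8_t b, std::uint8_t a) {
    return  static_cast<std::uint32_t>(r)
         | (static_cast<std::uint32_t>(g) << 8)
         | (static_cast<std::uint32_t>(b) << 16)
         | (static_cast<std::uint32_t>(a) << 24);
}

// Extract single channels from the packed value.
inline std::uint8_t red_of(std::uint32_t p) {
    return static_cast<std::uint8_t>(p & 0xFFu);
}

inline std::uint8_t alpha_of(std::uint32_t p) {
    return static_cast<std::uint8_t>(p >> 24);
}
```

Unlike the union, the integer produced here has the same value on every platform, because the layout is fixed by the shifts rather than by byte order in memory.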
> I was under the impression that doing
> s.v = 42;
> foo = s.low;
> is undefined behaviour, as you are not accessing the last set member, but another one.

Yes. As per the standard, it is undefined behaviour.
It does not fall under one of the well defined use cases (standard-layout structs that share a common initial sequence / examining an object as a sequence of char, signed char or unsigned char).

Though many (most common) compilers permit type-punning (of standard layout types) through unions as a non-standard extension to the language.
The problem with v = *(int *)&r is distinct: it's not that the result is meaningless, but that it violates strict aliasing and therefore exhibits undefined behavior. If the "expected" behavior is required in this case, ask for non-conformance at a performance penalty by passing -fno-strict-aliasing to your compiler.
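
If the goal is only to look at the bits of a float, copying them with `std::memcpy` avoids the aliasing problem altogether. A minimal sketch, assuming a 32-bit IEEE 754 float (checked by the `static_assert`s); `bits_of` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 floats");

// Copy the object representation of a float into an integer of the same size.
inline std::uint32_t bits_of(float r) {
    std::uint32_t v;
    std::memcpy(&v, &r, sizeof v);
    return v;
}
```

Mainstream compilers typically optimize the `memcpy` into a plain register move, so this usually costs nothing compared with the cast.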
The union could hold either a long or a double. If it holds a double, it makes little sense to reference the long member, and if it held a long it makes little sense to reference the double. You can, but there's no utility in doing so.


I absolutely disagree with this, 100%.
The 'union hack' replaces casting pointers to get at raw bytes: if you put an unsigned char array in the union, you can touch any byte of any other member without pointer casting or other approaches. There are times when that is exactly what you want to do. There are also some useful tricks, like comparing small strings as 64 bit ints instead of comparing 8 letters one by one, or morphing various types of data into integers for hashing type functions. For example, taking that double to a 64 bit int and doing some sort of manipulation on it to turn it into an array index in a hash table.

None of that is 'the only way to do it' or even always the 'best way to do it' but they are also small, simple examples. The point is, sometimes, morphing is exactly what you want to do.
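
As a rough sketch of the hashing idea (not the only way to do it): the double's bits are copied into a 64 bit integer with `std::memcpy` rather than through a union, then reduced to a table index. `bucket_for` and its mixing step are hypothetical choices made for illustration, and `bucket_count` is assumed to be non-zero:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Map a double to a bucket index by treating its bits as an integer.
// Precondition (assumed): bucket_count > 0.
inline std::size_t bucket_for(double key, std::size_t bucket_count) {
    std::uint64_t bits;
    std::memcpy(&bits, &key, sizeof bits);  // well-defined way to read the bits
    bits ^= bits >> 32;                     // arbitrary mixing step: fold the high half in
    return static_cast<std::size_t>(bits % bucket_count);
}
```

Note that values which compare equal can still have different bits (for example 0.0 and -0.0), so a real hash would need to decide how to treat those.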
@jonnin,

I think you misunderstood the context of that quote. I say that because in your discussion, you don't cover a union of doubles and longs.

Did you, for example, see the union Pixel example? It does what you're describing here.

What I was pointing out is that examining the bits of a long viewed as a double, or the reverse, has little utility. I do go on to note one exception: viewing the bits of a double as a 64 bit integer (not a long, since longs are not universally the same size) lets you use "bit fiddling" to determine whether the double is a NAN or INF, which is actually the kind of thing you're saying is done.

I'm not actually clear that we disagree at all.

Unions are used, as I put it earlier, to "pre-cast" multiple "views". I'm not sure how that disagrees.

My point is that something like the old variant union Microsoft provided for Visual Basic interfacing offered several useless "views" given certain contexts. Variant, for those who don't remember, was a huge union with every basic type included. VB used a type of variable storage that could be just about anything - a single variable that didn't really have a type per se, but could be any type. The variant union had "views" for everything from integers to doubles to string pointers. The programmer merely had to pick the right view.

So, if the variant offered a "double" view, but the underlying value was actually an int, looking at the variable as an "int" made no sense.

As @JLBorges points out, technically some unions exhibit undefined behavior. The Pixel union above, for example, works as expected, because there is no expectation that the 32 bit value has any meaning as an integer, only as a package of 4 color channels which can be moved as a unit. C++ does nothing to guarantee what the integer would look like, so there is "undefined" behavior where that integer is used as an integer and not as 4 color channels. Indeed, the value of that integer for a given pixel would change across platforms because of endianness.

The problem with many unions is exactly that point. There are some few arrangements, as @JLBorges points out, that exhibit fully defined behavior. Many (perhaps most) unions exhibit undefined behaviors, like those I point out where the union provides incompatible or unstable views of the same bits.

@ne555's point, which started this segment of the thread, was to suggest the reason for undefined behavior was merely the order of access. From that post

is undefined behaviour, as you are not accessing the last set member, but another one.


So, this suggests that unions can only be used, with defined behavior, if the last set "view" is the one read later.

This isn't accurate because that isn't the reason for the undefined behavior. The undefined behavior must be understood in the context of what those two different views are, not merely that they are not the same view.

My counterpoint (not really yet posted after @JLBorges) is that while there are technical merits to calling some union usage "undefined" behavior, it isn't in that class of "undefined" where you expect serious crashes, like deleting a pointer twice or running off the end of an array.

Like you said, sometimes the union expresses an intent, which ignores the well defined behaviors of C/C++, precisely because of the nature of the goal, like touching various bits that are otherwise even more work (with casting and pointer 'magic') to accomplish.

The OVERLAPPED structure at the top of this discussion is a Microsoft specific notion, as is the "large integer" union they used for years. Both are examples of the two points at once: the utility of this kind of thing and the undefined behavior associated with it.

The large integer union I posted above is a somewhat clearer example of the point. Windows has been implemented mostly on x86 platforms, but there was a PowerPC version for a time, which has opposing endianness. Some code using the large integer union would likely have different behavior on those PowerPC versions, because C/C++ does not define the byte order of 64 vs 32 vs 16 vs 8 bit components. Code which built up a 64 bit integer using that union would work fine on any x86 platform, but when switching to a target of opposing endianness, the code wouldn't make much sense.
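
To see which byte order a given target actually uses, one can look at the first byte of a known value in memory. This is only a sketch; `first_byte_in_memory` and `is_little_endian` are hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the byte stored at the lowest address of a 32-bit value.
inline std::uint8_t first_byte_in_memory(std::uint32_t v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    return bytes[0];
}

// Little-endian targets store the least significant byte first.
inline bool is_little_endian() {
    return first_byte_in_memory(0x11223344u) == 0x44;
}
```

On x86 the first byte is 0x44; on a big-endian target it would be 0x11, which is exactly the difference that changes what a layout-dependent union view reports.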



I think you misunderstood the context of that quote. I say that because in your discussion, you don't cover a union of doubles and longs.


I did in a roundabout way: "or morphing various types of data into integers for hashing type functions". E.g. taking a double's bits as if they were an integer and using that to make a table index or something...

That said, maybe we do agree :) I wish C++ had embraced the de facto union access. It's prettier than pointers.
I wish c++ had embraced the de-facto union access. Its prettier than pointers.


Yep, we do agree more than not!