Unions!

From my book:


The use of Unions


➤ You can use it so that a variable, A, occupies a block of memory at one point in a program, which is later occupied by another variable, B, of a different type, because A is no longer required. I recommend that you don’t do this. It’s not worth the risk of error that is implicit in such an arrangement. You can achieve the same effect by allocating memory dynamically.

➤ Alternatively, you could have a situation in a program where a large array of data is required, but you don’t know in advance of execution what the data type will be — it will be determined by the input data. I also recommend that you don’t use unions in this case, since you can achieve the same result using a couple of pointers of different types and, again, allocating the memory dynamically.

➤ A third possible use for a union is one that you may need now and again — when you want to interpret the same data in two or more different ways. This could happen when you have a variable that is of type long, and you want to treat it as two values of type short. Windows will sometimes package two short values in a single parameter of type long passed to a function. Another instance arises when you want to treat a block of memory containing numeric data as a string of bytes, just to move it around.

➤ You can use a union as a means of passing an object or a data value around where you don’t know in advance what its type is going to be. The union can provide for storing any one of the possible range of types that you might have.



I don't exactly understand why unions aren't recommended in the first and second cases. Is it just a matter of preference, or is there a good reason behind it?

I don't even understand number 3, although I do understand 4.

I tried to do some research, but I could only find threads of people complaining that a union is a remnant of C and asking why it even exists, and many of their reasons were very different from what I read. Some of it was also related to polymorphism, which I haven't learned about yet.

Hope someone can explain, thanks guys!
#4 is a practical use (CORBA is implemented this way, for example), although you need a way to identify which type is active in the transmitted union (look up "tagged union"). This usage has declined over the past decade, in favor of more robust protocols.
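A "tagged union" as mentioned above can be sketched roughly as follows. This is only an illustration: the names Kind, Value, make_int, make_real, make_flag and describe are made up for the example and are not taken from CORBA or from the book.

```cpp
#include <cstdint>
#include <string>

// Hypothetical tag listing the types this value may hold.
enum class Kind { Int, Real, Flag };

struct Value {
    Kind kind;              // the "tag": records which member is active
    union {                 // all three members share the same storage
        std::int32_t i;
        double       d;
        bool         b;
    };
};

// Each helper writes one member and sets the matching tag.
Value make_int(std::int32_t v) { Value x; x.kind = Kind::Int;  x.i = v; return x; }
Value make_real(double v)      { Value x; x.kind = Kind::Real; x.d = v; return x; }
Value make_flag(bool v)        { Value x; x.kind = Kind::Flag; x.b = v; return x; }

// Only reads the member that the tag says was written.
std::string describe(const Value& v) {
    switch (v.kind) {
        case Kind::Int:  return "int: " + std::to_string(v.i);
        case Kind::Real: return "real: " + std::to_string(v.d);
        case Kind::Flag: return std::string("flag: ") + (v.b ? "true" : "false");
    }
    return "unknown";
}
```

The receiver calls describe(make_int(42)) or similar and never has to guess the type, because the tag travels with the data.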

#3 is undefined behavior in C++ (C has permitted reading a union member other than the one last written since C99), but the majority of compilers allow this as a non-standard language extension. For example, there exist clever algorithms that operate on floating-point numbers by modifying their underlying bit representations. Such code might use union { double d; uint64_t n; }, write the double into d, modify n, and then read d again.
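The same kind of bit manipulation can be written without a union by copying the bytes with std::memcpy, which is well defined in C++. A minimal sketch, assuming a 64-bit IEEE 754 double (the function name flip_sign is made up for the example):

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Assumption: IEEE 754 layout, where the most significant bit is the sign.
double flip_sign(double d) {
    std::uint64_t n;
    std::memcpy(&n, &d, sizeof n);    // copy the double's bytes into an integer
    n ^= (std::uint64_t{1} << 63);    // toggle the sign bit
    std::memcpy(&d, &n, sizeof d);    // copy the modified bytes back
    return d;
}
```

Compilers typically optimize the two copies away, so this usually costs nothing compared with the union version.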

You might save some space with #1 and #2, in a very low-memory application where every byte is precious, which is, of course, an uncommon situation.
3) An IPv4 address is a 4-byte integer and is usually passed as such, but you are probably accustomed to seeing it as four octets (bytes). Using a union, you can interpret the same value as a uint32_t (for passing to and from functions) and as a struct of four uint8_t (for output to the user). Because such a struct is a POD type, which has strict rules about memory layout, it will work.
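A rough sketch of the IP address union described above. The names IPv4, Octets, from_octets and to_dotted are made up for the example. Note that writing one member and then reading the other relies on the compiler extension discussed earlier in the thread, and that the numeric value seen through the integer member depends on the machine's byte order.

```cpp
#include <cstdint>
#include <string>

// Hypothetical layout: four bytes in the order they are printed.
struct Octets { std::uint8_t a, b, c, d; };

union IPv4 {
    std::uint32_t value;   // compact form, for passing to and from functions
    Octets        octets;  // per-byte form, for showing to the user
};

static_assert(sizeof(Octets) == sizeof(std::uint32_t),
              "this sketch assumes the struct has no padding");

IPv4 from_octets(std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d) {
    IPv4 ip;
    ip.octets = Octets{a, b, c, d};
    return ip;
}

// Formats the address in the usual dotted form.
std::string to_dotted(const IPv4& ip) {
    return std::to_string(ip.octets.a) + "." + std::to_string(ip.octets.b) + "." +
           std::to_string(ip.octets.c) + "." + std::to_string(ip.octets.d);
}
```

A function can then take or return ip.value as a plain integer, while to_dotted(ip) produces the human-readable form from the same four bytes.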

2) A union cannot hold information about which data type was actually stored in it. And it doesn't work well with template metaprogramming.

1) This breaks type safety too, as in (2), and can (and by Murphy's law will) lead to errors. If you really need a value to occupy a specific piece of storage, use placement new.
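For reference, reusing one block of storage for two different types with placement new might look roughly like this. It is a sketch only; Point and reuse_storage are names invented for the example.

```cpp
#include <new>

// Hypothetical second type that will reuse the storage.
struct Point { double x, y; };

static_assert(sizeof(int) <= sizeof(Point) && alignof(int) <= alignof(Point),
              "the buffer below must be large and aligned enough for both types");

double reuse_storage() {
    alignas(Point) unsigned char buffer[sizeof(Point)];

    int* a = new (buffer) int(7);            // variable A lives in the buffer
    int seen = *a;                           // use A while it is alive
    // int is trivially destructible, so nothing to clean up before reuse.

    Point* b = new (buffer) Point{1.5, 2.5}; // B now occupies the same bytes
    double sum = seen + b->x + b->y;
    b->~Point();                             // end B's lifetime explicitly
    return sum;                              // 7 + 1.5 + 2.5
}
```

Unlike a union, each object here has a clear beginning and end of life, which is the type-safety point being made above.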

Unions are rarely needed. You will probably never use them. But just as there are some cases where goto is useful, there are areas where unions are helpful.
@MiiNiPaa

1. What do you mean by "breaks type safety"?
2. And why won't it work well with template metaprogramming?
3. Also, I still don't exactly understand #3. Should I just ignore it for now?

@Cubbi
4. What exactly is COBRA? What robust protocols have replaced this use of unions?
3. I still don't understand exactly what the use of #3 is.

Thanks guys for these explanations!


closed account (zb0S216C)
Anmol444 wrote:
"I don't exactly understand why unions aren't recommended in the first and second cases. Is it just a matter of preference, or is there a good reason behind it?"

For the first case, I think the author may have been referring to the "variant" concept. If this is the case, then it's perfectly OK -- in fact, it's the only way to implement a variant efficiently with memory constraints.

For the second case, if you had a large array, you'd allocate it on the heap anyway, but the implementation would need to know how big each element would be to ensure a sufficient amount of memory is allocated. On the flip-side, a programmer with common sense would parse the user's input to detect the type of memory you would set aside. There's no need for a union in this case.

Anmol444 wrote:
"I don't even understand number 3"

For the third case:

Book wrote:
"This could happen when you have a variable that is of type long, and you want to treat it as two values of type short."

Mother of God! What if "sizeof( short )" has a length that is less than half of "sizeof( long )"? If you're going to write this sort of information in a book, specify the compiler so that the reader can lookup information regarding the size of intrinsic data-types.

Book wrote:
"Another instance arises when you want to treat a block of memory containing numeric data as a string of bytes, just to move it around."

I don't know if the other users of this forum would agree, but that's an abuse of unions, in my opinion.

Book wrote:
"You can use a union as a means of passing an object or a data value around where you don’t know in advance what its type is going to be."

...but how would one know how much memory is enough to store a piece of data when the type is unknown? Surely if you're going to use a union in this case, you'd have some inkling of the possible types of data the user would pass?

Wazzak
@Framework
Just for confirmation: by type of memory, you mean the type of variable, right?

This book is written for Visual C++, so he assumes you are using VC++ as your compiler.

Also, in order to treat a long as two short variables, you would put half the bits in one short and half the bits in the other, right?

What does it mean, treat a block of memory containing numeric data as a string of bytes, just to move it around?


In a union you can only have the types you put in it, so you would know the size, I guess. And why would you need to know the size? Isn't that the compiler's job?
Anmol444 wrote:
4. What exactly is COBRA?

CORBA, not COBRA. See Wikipedia, etc.; it's a long story.

Wazzak wrote:
the author may have been referring to the "variant" concept

I'd say a variant is an example of #4 ("storing any one of the possible range of types"), but it is indeed hard to tell.
closed account (zb0S216C)
Anmol444 wrote:
"This book is written for Visual C++, so he assumes you are using VC++ as your compiler."

Which version?

Anmol444 wrote:
"Also, in order to treat a long as two short variables, you would put half the bits in one short and half the bits in the other, right?"

Yes, you could, but as I said, what if "sizeof( short )" has a length that is less than half of "sizeof( long )"?
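The size concern can be made explicit in code. The sketch below uses fixed-width types so that "two halves" is always true, and records whether the built-in types happen to satisfy the book's assumption on the current compiler. The names Halves, split, pack and long_is_two_shorts are made up for the example. On VC++ long is 32 bits and short is 16 bits, so the assumption holds there, but on many 64-bit Unix compilers long is 64 bits.

```cpp
#include <cstdint>

// True only when a long is exactly two shorts wide on this compiler.
constexpr bool long_is_two_shorts = (sizeof(long) == 2 * sizeof(short));

// Fixed-width halves, independent of what long and short happen to be.
struct Halves {
    std::uint16_t low;
    std::uint16_t high;
};

// Splits a 32-bit value with shifts and masks instead of a union.
Halves split(std::uint32_t packed) {
    return Halves{ static_cast<std::uint16_t>(packed & 0xFFFFu),
                   static_cast<std::uint16_t>(packed >> 16) };
}

// Recombines two 16-bit values into one 32-bit value.
std::uint32_t pack(std::uint16_t low, std::uint16_t high) {
    return (static_cast<std::uint32_t>(high) << 16) | low;
}
```

Shifts and masks give the same halves on every platform, whereas a union of a long and two shorts depends on both the type sizes and the byte order.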

Anmol444 wrote:
"What does it mean, treat a block of memory containing numeric data as a string of bytes, just to move it around?"

It's kinda hard for me to tell, but I'm interpreting it as this:

union Bad
{
  int Value_; // He did not express signed-ness, so I assumed signed.
  char String_[sizeof(int)]; // Again, signed-ness has not been expressed.
};

If what I assumed is correct, then s/he meant that "Bad::String_" could be used to manipulate the string-equivalent of "Bad::Value_" by arranging each byte as you see fit. I don't see the use of this, as the character equivalent of each byte of "Bad::Value_" may not be a numeric character. Even so, rearranging the bytes of "Bad::Value_" through "Bad::String_" will result in a seemingly arbitrary value.
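One possible reading of "a string of bytes, just to move it around" is that the bytes are copied somewhere unchanged and turned back into a number later, not edited. A sketch of that reading without a union, using std::memcpy (IntBytes, to_bytes and from_bytes are names invented for the example):

```cpp
#include <array>
#include <cstring>

// A raw byte view of one int; unsigned char is the usual type for raw bytes.
using IntBytes = std::array<unsigned char, sizeof(int)>;

// Copies the object representation of an int into a byte array.
IntBytes to_bytes(int value) {
    IntBytes bytes;
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Rebuilds the int from bytes produced by to_bytes on the same platform.
int from_bytes(const IntBytes& bytes) {
    int value;
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}
```

As long as the bytes are not reordered in between, from_bytes(to_bytes(x)) gives back x, which is all that "moving it around" needs.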

Anmol444 wrote:
"And why would you need to know the size? Isn't that the compiler's job?"

Yes it is, but not in this case. Sometimes, the identity of some piece of storage may not be known during run-time, just like a piece of data pointed-to by a void pointer.

Wazzak
1. VC++ 11, the latest

2. Yeah, but he wrote the book specifically for VC++ 11.

3. Should I just ignore this one for now? I'm very confused, lol.

4. True but most of the time it would.
closed account (zb0S216C)
Anmol444 wrote:
"Yeah, but he wrote the book specifically for VC++ 11."

It doesn't matter. The sizes of intrinsic data-types are decided by the compiler, not the standard; the standard only guarantees a minimum width for each intrinsic data-type.

Anmol444 wrote:
"True but most of the time it would."

You could assume that a piece of unknown storage has some known size and/or type, but you have to also consider the possibility of unknown types and/or sizes.

Wazzak
VC++ is the compiler, lol; Visual Studio is the IDE.

Ok

So for now should I just ignore 3?
@Framework
"...but how would one know how much memory is enough to store a piece of data when the type is unknown? Surely if you're going to use a union in this case, you'd have some inkling of the possible types of data the user would pass?"


But doesn't a compiler reserve enough memory for the largest sized type in the union? It does this for alignment reasons AFAIK. So if the union is declared as holding a char or a double, enough space is always set aside for a double.
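The sizing rule described above can be checked directly. A small sketch (the type and constant names are made up for the example) that compares a union of a char and a double with a struct holding both:

```cpp
#include <cstddef>

// All members share one block of storage.
union CharOrDouble {
    char   c;
    double d;
};

// Each member gets its own storage, plus any padding.
struct CharAndDouble {
    char   c;
    double d;
};

constexpr std::size_t union_size  = sizeof(CharOrDouble);
constexpr std::size_t struct_size = sizeof(CharAndDouble);

// The union must be able to hold its largest member.
static_assert(union_size >= sizeof(double),
              "a union is at least as large as its largest member");
```

On common compilers union_size equals sizeof(double), while struct_size is larger because both members are stored at once.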

Obviously, using a union doesn't save any memory space.

Unions seem to be a bit old-fashioned these days, as Cubbi was saying.
@TheIdeasMan

Anything done with unions is also possible by other means, right? There is no performance gain or loss from using unions, right?

So using them is more or less a matter of preference?
closed account (zb0S216C)
TheIdeasMan wrote:
"But doesn't a compiler reserve enough memory for the largest sized type in the union? It does this for alignment reasons AFAIK. So if the union is declared as holding a char or a double, enough space is always set aside for a double."

I know how unions work and that's not the point I was making. I went on to say that:

Myself wrote:
"...the identity of some piece of storage may not be known during run-time, just like a piece of data pointed-to by a void pointer."

In some scenario(s), the type and/or size of some piece of storage may not be known. Therefore, we cannot assume the type and/or size of the storage or else we'll have issues trying to store that information. Of course, we'd also have to assume the storage is actually valid.

With unions, however, lack of information about a piece of storage could mean that our union may not support the type of storage.

TheIdeasMan wrote:
"Obviously, using a union doesn't save any memory space."

I'd say that whether a union saves space or not depends on how you use it.

Wazzak
@Anmol

As Cubbi & MiiNiPaa were saying there are better ways of doing the same thing these days. Templates spring to mind.

So no, I would not say it is a matter of preference at all.
Oh OK, so it's better to just use the more modern methods?

Also, is there any point where I will HAVE to use a union?
closed account (zb0S216C)
You don't have to use a union. Unions are a unique construct, so their use may be warranted in some cases. Whether or not you need a union depends on your program's design and needs.

Wazzak