I Have NO IDEA how this Line of Code Works

Hello, I am currently reading an article online about how virtual is implemented in C++ using a vtable. The author of the article explains it by creating an "implementation" of virtual in C, not C++. Since I have very little experience with C, some of the syntax here confuses me. Here is the article:

http://blog.httrack.com/blog/2014/05/09/a-basic-glance-at-the-virtual-table/

My main problem lies in the following function:

void CB_print(struct B *this) {
  // we know that this points after the struct A layout
  // so let's fix the pointer by ourselves!
  size_t offset = (size_t) &((struct C*) NULL)->b;
  C_print((struct C*) ( ( (char*) this ) - offset ) );
}


I'm guessing to C programmers this looks like a piece of cake but I have no idea what I am looking at. I have never had to manipulate raw pointers like this in C++ so I need some help. Namely, what exactly does this line do?

size_t offset = (size_t) &((struct C*) NULL)->b;

Looking at this, I am guessing it does the following. First it takes a NULL (0) value and casts it into a struct C*. In C++, I guess this would be the same as a struct C nullptr? Anyways, it then uses the arrow (->) operator to reach a data member of struct C, which is b.

How is this possible? I thought it is not possible to dereference null pointers? And then it takes the address of all that garbage and then finally converts it to (size_t). I still have no idea what this accomplishes or what value offset will hold at the end of this statement. The following line also has me completely stumped:

C_print((struct C*) ( ( (char*) this ) - offset ) );

Looking back at the definition of C_Print, the function has a single parameter, a struct C*. This line does some crazy casts to char* for some reason and subtracts offset (?) and then converts it to struct C* to pass as the argument for C_Print? What exactly does all of this mean?
Yes, this is a tad confusing. I'm not convinced an early student of virtual functions needs to see this "under the hood" stuff, but...I'm of two minds here. My early C++ study was in the 80's, and I knew C for years at the time. I would have enjoyed this, but for students learning C++ without that background, it's a bit much.

This:


size_t offset = (size_t) &((struct C*) NULL)->b;

Forms an offset (hence the name). It "pretends" that a struct c * is at location zero, but then gives you the address of the member "b". That works on an actual object like:

& ( c->b )

Which might be a little more obvious: this is the address of "b" within the struct "c".

Using a zero for the address of "c", therefore, gives us the offset, or you might say distance from the beginning of the struct to the member "b".

This is not a dereference. This would be:

a = c->b;

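If "c" were zero, this would crash. That happens because the assignment needs the value stored at "b", so the compiler generates code that reaches into the structure and reads it.

However, "& ( c->b )" is merely calculating an address. No attempt is made to read data from that address, so nothing is actually fetched through "c". (Strictly speaking, the language standards do not guarantee this trick on a null pointer, which is why the offsetof macro exists, but in practice compilers simply compute the address.)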

There is C++ syntax to do that more cleanly, but this is a "C" example.


C_print((struct C*) ( ( (char*) this ) - offset ) );

Looking back at the definition of C_Print, the function has a single parameter, a struct C*. This line does some crazy casts to char* for some reason


Crazy casts are why we become C++ programmers ;)

Well, here what is happening is a calculation is being performed using offset found previously. We now know where "b" begins. I have no idea what type "this" is, but I assume it's a base to struct C?

If so, I assume "b" is the first variable in struct C.

There is some distance (an offset) between the data in a base to struct C, and the beginning of struct C - that offset is likely that distance (I'm assuming, I think more material from your source is required to know).

That said, the "char*" cast is designed to make "this" look like an array of bytes. This way address calculations look like simple integer subtraction. Once that new address is calculated, it is the position of the struct C (presumably), and so the result is cast to a struct C * for use.
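
To make that pointer adjustment concrete, here is a minimal C++ sketch of the same idea. The struct layout and the helper name c_from_b are assumptions made up for illustration (the article's real structs carry more members), and the standard offsetof macro stands in for the hand-written null-pointer expression.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Hypothetical layout, assumed for illustration only.
struct A { int a_value; };
struct B { int b_value; };
struct C {
    A a;  // comes first, so b does not start at offset 0
    B b;
};

// Given a pointer to the b member of some C, step back to the start of that C.
// Only valid when b_ptr really points at the b member of a live C object.
C* c_from_b(B* b_ptr) {
    std::size_t offset = offsetof(C, b);           // bytes from start of C to b
    char* bytes = reinterpret_cast<char*>(b_ptr);  // view the address byte by byte
    return reinterpret_cast<C*>(bytes - offset);   // move back to the start of C
}

// Small self-check: the recovered pointer is the original object.
void demo_c_from_b() {
    C obj{};
    assert(c_from_b(&obj.b) == &obj);
}
```

If the B pointer handed in does not actually live inside a C, the subtraction still "works" but the result points at unrelated memory, which is the danger with this kind of cast.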

This kind of thing is known to be "dangerous", in that most bugs from "C" are related to this "crazy casting".

Like I said, I think it is for a particular audience, like I was in the 80's - a C programmer interested in these internals about the vtable.


Forms an offset (hence the name). It "pretends" that a struct c * is at location zero, but then gives you the address of the member "b".


I don't follow. How can it "pretend" that a struct c* is at location 0? So after this line there actually isn't anything at location 0? How then does it access "b" if "b" doesn't actually exist?

Well, here what is happening is a calculation is being performed using offset found previously. We now know where "b" begins. I have no idea what type "this" is, but I assume it's a base to struct C?


this is actually a struct B*

There is C++ syntax to do that more cleanly, but this is a "C" example.


Just curious, what would be a cleaner way to do it in C++?
I don't follow. How can it "pretend" that a struct c* is at location 0?


Casting. There is nothing "sacred" about 0. It is invalid to use zero as a memory location for reading, but as was pointed out earlier, the expressions being used do not read from the structure, so the location zero is never used.

Think of it this way. Let's say you have a C;

C c;

auto cptr = &c;
auto bptr = &(c.b);

auto vect_to_b = (char *)bptr - (char *)cptr;


What do you get with vect_to_b? It is the distance in bytes from the beginning of c (by its address) to the member b (by its address).

Now, in any of that, did the compiler generate code which reads from "c" or "b"? No, it didn't. It does, however, happen that c is a valid instance in this example, not located at zero.

But what would it have mattered? Since it did not read from the instance, all it calculated was the distance between two positions in memory.

It would not matter where "c" is located. Its memory location has no effect on the resulting calculation. vect_to_b is only calculating the distance between the two positions.

That is why the code you're asking about works, because the location of "c" would not matter, and so "zero" is as good as any choice to use for code that performs the subtraction you see in my example.

However, the example you posted does not perform a subtraction from two positions. It only shows the distance to the member "b". Consider what you get from the two addresses in my example, cptr and bptr. What are they? The first is where "c" was formed, and the second - this is the key part here - the second (which is "bptr"), will be at some distance away from "cptr". The subtraction gives you how far.

Now, what if "c" had formed at zero? Just suppose. If it had, then cptr would be zero. What would bptr be? Zero plus the distance to "b". What would the operation producing vect_to_b do? It would subtract zero from bptr, a useless step that can be skipped because "c" started at zero.

That's what your code expression does. It skips the useless subtraction because it formed a "fake" or "supposed" instance at zero, which that code intends never to dereference or access. It is only using the address calculation to get an offset, a distance. It uses the same idea as the example I posted, knowing it can skip the subtraction because it would be a subtraction of zero from the member address.

look up "offsetof"..it can be used like this:

size_t x = offsetof( C, b );

It gives you the same answer as the code you've been asking about.

Oh, and it originally comes from "C" (declared in <stddef.h>), but C++ provides it too through <cstddef>...there are various ways to express this concept.
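
As a rough check of the argument above, the sketch below measures the distance to "b" on two real objects and compares it with what offsetof reports. The struct layout and the function names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t, std::ptrdiff_t

// Hypothetical layout, assumed for illustration only.
struct B { int b_value; };
struct C {
    double something_before_b;  // any members placed ahead of b
    B b;
};

// Byte distance from the start of a real C object to its b member.
std::ptrdiff_t distance_to_b(const C& c) {
    const char* start  = reinterpret_cast<const char*>(&c);
    const char* member = reinterpret_cast<const char*>(&c.b);
    return member - start;
}

// Small self-check: two objects at different addresses give the same distance,
// and it matches what offsetof reports without any object at all.
void demo_offsets_agree() {
    C first{};
    C second{};
    assert(distance_to_b(first) == distance_to_b(second));
    assert(static_cast<std::size_t>(distance_to_b(first)) == offsetof(C, b));
}
```

The point is that where the object lives never enters the result, which is why the original code could get away with "placing" it at zero.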


There is also this curious option called a pointer-to-member operator:

auto mptr = &C::b; // note CAP C - the class type, not "c", the instance

C c;

c.*mptr = 0;


If "b" is an int, that might look like:

int C::*mptr = &C::b;

C c;

c.*mptr = 0;


Which, of course, could give us the address:

auto bptr = &( c.*mptr );

or maybe something like

C *c = nullptr; // for clarity that c is zero

auto mptr = &C::b;

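// yields an int * whose numeric value is the offset of b; this is the same
// null-pointer trick as the C code, so the standard does not guarantee it
auto offset = &(c->*mptr);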


Which you may, again, find maddening :)

Note, too, that "mptr" is in a class of "special" pointers that may not actually be the size of "common" pointers. They are implementation specific, which means they work on various compilers, but may look quite different on each compiler.

Also, to be clear, "mptr" is a pointer type known as a pointer to member. It is from a "class" perspective, not the perspective of an instance (hence the way it is assigned above using "C", not "c").

Its use in "c.*mptr" and "c->*mptr" above is what's known as the "pointer to member operator", where the .* and ->* has a "special" meaning, an operator of a specific kind applicable only to such pointers. What appears before the ".*" and "->*" provides an instance upon which the operator can be applied.
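
For a version that can be compiled as-is, here is a small sketch of both operators in use. The struct and the helper names set_member and read_member are hypothetical, and "b" is assumed to be a plain int as in the snippets above.

```cpp
#include <cassert>

// Hypothetical struct, assuming b is a plain int.
struct C {
    int a;
    int b;
};

// Write a value into whichever int member the pointer-to-member selects.
void set_member(C& obj, int C::*member, int value) {
    obj.*member = value;   // .* applies the member pointer to an object
}

// Same idea through a pointer to the object, using ->*.
int read_member(const C* obj, int C::*member) {
    return obj->*member;
}

void demo_member_pointer() {
    C c{1, 2};
    int C::*mptr = &C::b;  // names "the b of any C", not a specific address
    set_member(c, mptr, 42);
    assert(c.b == 42);
    assert(read_member(&c, &C::a) == 1);
}
```

Note that mptr on its own never refers to memory; it only becomes an actual int once an instance is supplied on the left of .* or ->*.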

Thank you for the response! I understand it now.
Just for practice, I made a version that should do the same thing but written differently. Please let me know if this would achieve the same result:

 
 size_t offset = ( (size_t)&( (struct C*)this )->b ) - (size_t)this;


where this is a struct B* not a struct C*.
Well, close, but not quite.

Pointer subtraction takes into account the type of the pointer.

 const char *p1{ "abcde" }; // string literals are const in C++
 const char *p2 = p1 + 3; // point to 'd'

 size_t offset_chars = p2 - p1;


 int i1[]{ 1, 2, 3, 4, 5 };

 int *ip1 = &i1[0];
 int *ip2 = ip1 + 3; // points to entry of '4'

 size_t offset_ints = ip2 - ip1;
 size_t offset_intbytes = (char *) ip2 - (char *) ip1;


Both "offset_chars" and "offset_ints" will be 3.

However, ints are larger than chars. The distance given is not the distance in bytes, but of integers in the case of offset_ints.

For offset_intbytes, the pointers have been cast to char *. A char is one byte by definition (sizeof(char) is always 1), so the subtraction now yields the distance in BYTES (12 on my machine, where an int is 4 bytes).

This line:

size_t offset_nonsense = ip2 - p1;

would refuse to compile if added to the snippet above, because the pointers are not of the same type, so the subtraction is meaningless to the compiler.
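
The element-versus-byte difference can be checked with a short sketch like the one below. The function names are hypothetical, and the byte count is written in terms of sizeof(int) rather than a fixed number because the size of an int varies between platforms.

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t

// Distance between two int pointers, counted in ints.
std::ptrdiff_t distance_in_elements(const int* first, const int* last) {
    return last - first;
}

// The same distance counted in bytes, by viewing the addresses as char pointers.
std::ptrdiff_t distance_in_bytes(const int* first, const int* last) {
    return reinterpret_cast<const char*>(last) -
           reinterpret_cast<const char*>(first);
}

void demo_distances() {
    int values[]{1, 2, 3, 4, 5};
    const int* ip1 = &values[0];
    const int* ip2 = ip1 + 3;  // points at the entry holding 4
    assert(distance_in_elements(ip1, ip2) == 3);
    assert(distance_in_bytes(ip1, ip2) ==
           3 * static_cast<std::ptrdiff_t>(sizeof(int)));
}
```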

So, without handing you the answer, what casts do you think are required to get an offset in BYTES for the code you posted?

char* right? But I still don't see why the code I wrote is wrong. For example, in the example you gave me, you could also do the following:

size_t offset_intbytes = (size_t) ip2 - (size_t) ip1;

If you replace the char*'s with a size_t, you are basically taking the raw memory addresses that the pointers hold and subtracting them. Thus, you are working with individual bytes, not "numbers of ints." Running this in the compiler you get, just like you said, 12 as well. If that's the case, isn't the way I wrote the offset legal since it gets you the same value since I am basically casting the pointers to size_t's and subtracting the memory addresses? I get the same exact value:

size_t offset = ( (size_t)&( (struct C*)this )->b ) - (size_t)this;
Yes, you've cast the numeric value of the pointers into a type of integer, and performed integer math on them.

There's merit to that case.
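
A quick way to compare the two approaches is sketched below. The struct layout and function names are hypothetical. It uses std::uintptr_t rather than size_t for the integer version, since that is the type intended to hold a pointer value; size_t tends to work on mainstream platforms but is not guaranteed to be wide enough everywhere.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uintptr_t

// Hypothetical layout, assumed for illustration only.
struct B { int b_value; };
struct C {
    int a_value;
    B b;
};

// Byte offset of b computed by converting both addresses to integers first.
std::size_t offset_via_integers(const C& c) {
    auto start  = reinterpret_cast<std::uintptr_t>(&c);
    auto member = reinterpret_cast<std::uintptr_t>(&c.b);
    return static_cast<std::size_t>(member - start);
}

// Byte offset of b computed by subtracting char pointers.
std::size_t offset_via_char_pointers(const C& c) {
    auto start  = reinterpret_cast<const char*>(&c);
    auto member = reinterpret_cast<const char*>(&c.b);
    return static_cast<std::size_t>(member - start);
}

void demo_same_offset() {
    C c{};
    assert(offset_via_integers(c) == offset_via_char_pointers(c));
}
```

On common platforms with a flat address space the two functions agree, though how a pointer maps to an integer is left to the implementation.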

I misread what you posted (I'm actually replacing the timing belts on my car...was a bit distracted)
