Question regarding custom "dynamic_cast" implementation

I'm not sure if this question should go here or in the beginner forum, but here goes.

I was watching a CppCon video that explains a custom "dynamic_cast" implementation. The speaker goes over every point, from basic polymorphism to how virtual is implemented under the hood (here is the video for those interested: https://www.youtube.com/watch?v=QzJL-8WbpuU&t=2331s )

Near the end of the video, he gives an implementation of a custom dynamic_cast which he calls "dynamicast." Here is the function from that implementation that is called when dynamicast has to convert to a void pointer to the most derived object:

#include <cstdint>

void* dynamicast_to_mdo(void* p)
{
    uint64_t* vptr = *reinterpret_cast<uint64_t** >(p);
    uint64_t mdoffset = vptr[-2];
    void* adjusted_this = static_cast<char* >(p) + mdoffset;
    return adjusted_this;
}


So basically this code takes p, which is a pointer to the subobject, reads "mdoffset", which is the offset to the most derived object, uses it to compute "adjusted_this", and returns that as a void pointer. However, I am very lost on the syntax of it all. Namely this line:

uint64_t* vptr = *reinterpret_cast<uint64_t** >(p);

Firstly, why is the type "uint64_t"? Secondly, why are there so many '*'s in the reinterpret_cast line? Why is it casting to a uint64_t** instead of a uint64_t*?

I'll preface this by saying that such "custom" casting has some value for study, but highly dubious real-world application. It would be applicable if you're writing compilers, decompilers, debuggers... that kind of stuff.

Setting that aside now....

uint64_t is one of the fixed-width integer types from <cstdint>, intended to avoid ambiguity about the size of the old fashioned built in types like int and long. The size of those types varies by platform and compiler. Even char, whose sizeof is 1 by definition, is not guaranteed to be 8 bits wide (I've always seen it to be 8 bits, from the 70's to the present, but there are limited, perhaps exotic, examples where it is not).

This particular choice, though, may not be the right one. It assumes that pointers are 64 bits wide in the target application/operating system. Where that is valid, it will work, but there is a type "uintptr_t" which adapts to the platform and target: it is an unsigned integer type wide enough to store a common pointer. I say common pointer because some pointers have various, implementation defined sizes (like pointers to members).
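To make the uintptr_t point a bit more concrete, here is a minimal sketch that round-trips a pointer through std::uintptr_t. The helper names (to_integer, to_pointer, round_trips) are made up for illustration, and std::uintptr_t is technically an optional type, though mainstream platforms provide it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: store a pointer's value in an integer sized for the platform.
std::uintptr_t to_integer(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

// Hypothetical helper: recover the pointer from that integer.
const void* to_pointer(std::uintptr_t bits)
{
    return reinterpret_cast<const void*>(bits);
}

// True when a round trip through std::uintptr_t gives back the same pointer.
bool round_trips(const void* p)
{
    return to_pointer(to_integer(p)) == p;
}
```

The same code compiles unchanged on 32 bit and 64 bit targets, which is the advantage over hard-coding uint64_t.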

All pointer addition with integers takes the type of the pointer into account. If expressed as purely integer math, such that "p" is an unsigned integer (like uint64_t or uintptr_t), and p holds some memory address:

uintptr_t r = p + n * sizeof( type );

This is pseudo code. For any integer n, standard pointer math will imply a "sizeof" stride, or simply a multiplication of that integer by the size of the type. For example:

int * r = iptr + 4;

If iptr were 1000, the result in r will not be 1004, because an int is not 1 byte in size. If an int is 4 bytes in size, the result would be 1000 + 4 * 4 = 1016.
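If you want to see the stride for yourself, a small sketch along these lines should do. The function names and the static array are only for illustration; the byte distance is measured by viewing both pointers as char pointers.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes between two int pointers, measured through char* views.
std::ptrdiff_t byte_distance(const int* from, const int* to)
{
    return reinterpret_cast<const char*>(to) - reinterpret_cast<const char*>(from);
}

// Advance an int pointer by n elements and report how many bytes it moved.
std::ptrdiff_t bytes_moved(std::ptrdiff_t n)
{
    static int values[8] = {};      // illustration only; n must stay within 0..8
    assert(n >= 0 && n <= 8);
    const int* iptr = values;
    const int* r = iptr + n;        // element stride, not byte stride
    return byte_distance(iptr, r);
}
```

bytes_moved(4) comes out as 4 * sizeof(int), whatever sizeof(int) happens to be on your platform.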

This:

uint64_t mdoffset = vptr[-2];

Treats vptr as a pointer into an array of uint64_t, where each entry happens to be the size of a pointer. The author wants the value of one entry in this array, but vptr doesn't point at the array's "0" element; it points 2 entries past the one that's wanted, so the negative index works backwards into the array to get it. This could also have been done by decrementing vptr by 2.

uint64_t mdoffset = *( vptr - 2 );
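A quick sketch of that equivalence, using an ordinary array in place of a vtable. The array demo_table and its values are arbitrary stand-ins, not anything from the video.

```cpp
#include <cassert>
#include <cstdint>

// A stand-in table; the values are arbitrary and only for illustration.
std::uint64_t demo_table[4] = {10, 20, 30, 40};

// Read the entry two slots before `mid` using subscript syntax.
std::uint64_t read_by_subscript(const std::uint64_t* mid)
{
    return mid[-2];
}

// The same read written as explicit pointer arithmetic.
std::uint64_t read_by_arithmetic(const std::uint64_t* mid)
{
    return *(mid - 2);
}
```

Passing &demo_table[2] to either function reads demo_table[0]. A negative subscript is fine as long as the resulting element is still inside the array.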

So mdoffset is a value in bytes (we're assuming the author knows the layout of the vtable under these circumstances). It expresses the offset, presumably, from the subobject p points to over to the most derived object.
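One detail that may be worth a sketch: in Itanium-style layouts this offset is stored as a signed value and is typically zero or negative, since a base subobject sits at or after the start of the complete object. The uint64_t version then relies on wraparound, so a signed type such as std::ptrdiff_t expresses the intent more directly. The structs below are a hand-built imitation, not a real vtable, and every name in it (FakeSubobject, FakeComplete, to_most_derived) is an assumption of mine for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hand-built imitation of a base subobject: its first member plays the vptr role.
struct FakeSubobject {
    std::ptrdiff_t* vptr;
    int base_data;
};

// Imitation of the complete object, with the subobject placed after other data.
struct FakeComplete {
    long long derived_data;
    FakeSubobject sub;
};

// Same steps as dynamicast_to_mdo, but reading the offset as a signed value.
void* to_most_derived(void* p)
{
    std::ptrdiff_t* vptr = *reinterpret_cast<std::ptrdiff_t**>(p);
    std::ptrdiff_t mdoffset = vptr[-2];          // byte offset back to the complete object
    return static_cast<char*>(p) + mdoffset;     // char* so the offset is applied in bytes
}
```

If the slot two entries before the fake vptr holds -offsetof(FakeComplete, sub), then to_most_derived(&obj.sub) gives back the address of obj.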

So now, on to:

uint64_t* vptr = *reinterpret_cast<uint64_t** >(p);

"p" is a void *, which means it has no type applicable to pointer math. It is presumably a pointer to the object in question. There is a pointer at this location which points to the object's vtable. One could just read that pointer with a single indirection, as you've no doubt considered, with something like:

auto vpvalue = * ( reinterpret_cast< uint64_t *>(p) );

This would read the pointer stored at p. What type would vpvalue be? It would be a uint64_t, an unsigned 64 bit integer. It isn't a pointer in that context.

It is, however, the numeric value of that pointer, pointing to the vtable where the author knows an offset is held 2 entries back from that pointer.

The author could have, then, prepared a pointer so that pointer math would apply:

uint64_t * vptr = reinterpret_cast< uint64_t * >( vpvalue );

This would read the value of vpvalue, which contains the bits of a pointer, and store it in vptr which is now a pointer type, from which the vptr[ -2 ] makes some sense (though, perhaps, it looks odd).

In order to avoid two lines of code, the author combined these ideas into one line.

uint64_t* vptr = *reinterpret_cast<uint64_t** >(p);

So "p" is cast to a "pointer to a pointer", specifically a pointer to a pointer to uint64_t. Dereferencing that double pointer yields a uint64_t *: it reads, from the location p points to, the value stored there, which is the pointer to the vtable. The result is stored as a uint64_t * without further casting (no two-step conversion involved).
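Here is a rough sketch comparing the two-step and one-step reads on a stand-in object. FakeObject and both function names are hypothetical. Note one substitution: the two-step version copies the bits with std::memcpy into a std::uintptr_t rather than dereferencing a uint64_t *, which sidesteps aliasing concerns, and it assumes a pointer fits exactly in std::uintptr_t.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a polymorphic object: the first member plays the vptr role.
struct FakeObject {
    std::uint64_t* vptr;
    int payload;
};

static_assert(sizeof(std::uintptr_t) == sizeof(std::uint64_t*),
              "this sketch assumes a pointer fits exactly in std::uintptr_t");

// Two steps: copy the stored pointer's bits into an integer, then cast back to a pointer.
std::uint64_t* read_vptr_two_steps(void* p)
{
    std::uintptr_t vpvalue;
    std::memcpy(&vpvalue, p, sizeof vpvalue);           // raw bits of the stored pointer
    return reinterpret_cast<std::uint64_t*>(vpvalue);   // make it a pointer again
}

// One step: treat p as "pointer to pointer" and dereference once.
std::uint64_t* read_vptr_one_step(void* p)
{
    return *reinterpret_cast<std::uint64_t**>(p);
}
```

On mainstream platforms both functions return the same pointer, namely whatever was stored in the object's first member.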

You may still be asking, why not:

auto vpvalue = ( reinterpret_cast< uint64_t *>(p) );

...without that dereference in front of the reinterpret_cast.

That form would store p into vpvalue as a uint64_t *. What is required is to read the pointer stored there, not p itself. In this example, vpvalue would be the location of the instance, not the location of the vtable of that instance. The pointer to the vtable is what's stored at p, or *p...a dereference of p.
I see. So if I understand correctly, the value that vpvalue holds if using a single indirection pointer cast would be the address of the first entry of the vtable (the address of vptr[0]), but since the type is just a uint64_t, we can only see that as a value and we cannot dereference it because it is not a pointer. If we add a second level of indirection, we can use this same value and interpret it as a pointer so we can dereference it to the vtable and get the mdoffset value by using a negative index (vptr[-2]). Does that sound correct?
You have all the central points.