An implementation issue: Polymorphic types and separate compilation

Dear forum members,

I know that none of you is paid for helping other C++ users, that most of you do not have much time to spend on this forum, that it is very hard to concentrate after a full day's work, especially when the topic is as tricky as this one and the posts are as long as my first post in this topic, and that all of you have other duties as well. But please be concentrated when you read other people's posts and, most of all, be precise if you decide to answer. I am not always right when I make assumptions, but I always do my best to be clear and precise. Reading your answers, I have a very strong feeling that none of you has carefully read my posts (and thus did not understand the problem or what I said), and above all you give contradictory answers. So if you decide to participate in a topic, please be concentrated and precise.


Both statements are true: There is one vtable for T in each translation unit where T is odr-used. Since they are emitted as defaulted weak symbols (you can see that with nm on Linux), there is one vtable for T in the linked program. Translation units do not exist after linking.

First, these are not statements but questions, so they cannot be true. Second, my questions asked whether the VFT information for a type is used in a program as multiple copies in many object modules, in which case the program would not rely on assistance from the linker, or as one global data structure, in which case the program depends on assistance from the linker. These obviously exclude each other, so they cannot both be true.

Since Cubbi actually checked the object files generated by the compiler, I will trust his answer. His answer implies that the assumptions I made in Solution_2, in the section POSSIBLE SOLUTIONS of my first post, are true. I will not repeat myself, but I will point to the most important implications I wrote there:

- assistance from the linker is needed, since there would be one global VFT for each polymorphic type that would be accessed by the code in different modules, but as I said there is no need to extend the linker with a special feature just to support polymorphism. However, as I said there, the compiler would create special "hidden" global variables that would represent VFT's; these would not be visible in the source code but would be visible to the linker as symbols in the import/export sections of the object files.

- I asked there where (in which module) the compiler would decide to define a VFT for a type (put the VFT symbol in the export section) and where to declare it (put the VFT symbol in the import section), and I proposed a possible answer. Any comments on that?


To the virtual table of C, since the object is a C, and C is the last class in C's inheritance tree that contains virtual functions.


It's pointing to the C object. Within the C object there is a pointer to C's VFT. The main function doesn't know any of that, it treats it exactly as it would treat any T object.

Since both helios and Cubbi have the same answer I will proceed from that. This returns us to the very beginning of the problem and the question I asked on Jan 23, 2016 at 9:01pm:

Can you explain to me how the code in Source2 can "parse" VFT of a dynamic child type that it is not aware of and that is defined only in Source1? The code in Source2 has to find a virtual function's position in the table. The generated code in Source2 has to somehow extract part of the dynamic child type's VFT that corresponds to type T of the function's F return type. But how can it extract that part if it does not know anything about the structure it is extracting from?


In other words: The code generated from Source2 does not know anything about C since it is not defined there. Yet, according to both helios and Cubbi it will receive a pointer to the C object that has a pointer to a global VFT of C. But the code in Source2 does not understand (does not know the structure of) VFT of C. So how can it find where in the table is fT?
Second, my questions asked whether the VFT information for a type is used in a program as multiple copies in many object modules, in which case the program would not rely on assistance from the linker, or as one global data structure, in which case the program depends on assistance from the linker. These obviously exclude each other, so they cannot both be true.


The program requires assistance from the linker for, well, linking. It does not require any additional help from the linker for linking to a particular table. It's just another emitted symbol to be resolved in the latter case. And, in the former case, the linker handles it in the same way as identical expanded templates.

There is no special handling required by the linker in either case.

Yet, according to both helios and Cubbi it will receive a pointer to the C object that has a pointer to a global VFT of C. But the code in Source2 does not understand (does not know the structure of) VFT of C.

It knows:
The location of the table.
And where in that table the pointer to the (virtual) function for an object of type base will occur.

By extension, it also knows where in that table the pointer to the function for any type that derives from base will occur, because they will be in the same place. So, if all a function is doing is invoking a virtual function from a pointer to type base, the function just needs to know about the base type (and, indeed, one can't do that without including the header file for the base class.)
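To make the "same place in the table" point concrete, here is a rough hand-written model of the mechanism. It is only a sketch: the names BaseVtbl, BaseObj, DerivedVtbl, DerivedObj, call_name and make_derived are hypothetical, and the table layout a real compiler emits is ABI-specific. The part labelled "Source2" sees only the base declarations, yet it can call through a table that belongs to a type it has never seen, because the slot it needs sits at the same position.

```cpp
#include <cassert>
#include <string>

// Hypothetical hand-written model of a vtable. Real compilers generate
// something comparable, but the exact layout depends on the ABI.
struct BaseObj;

struct BaseVtbl {
    std::string (*name)(const BaseObj*);   // slot 0, known to every user of BaseObj
};

struct BaseObj {
    const BaseVtbl* vptr;                  // the object carries a pointer to its table
};

// --- "Source2": sees only BaseObj and BaseVtbl ---
std::string call_name(const BaseObj* p) {
    return p->vptr->name(p);               // load vptr, load slot 0, call
}

// --- "Source1": defines a derived type that "Source2" never sees ---
struct DerivedVtbl {
    BaseVtbl base;                         // same prefix as the base table
    int (*extra)(const BaseObj*);          // extra slot appended after the prefix
};

struct DerivedObj {
    BaseObj base;                          // base part comes first
    int value;
};

static std::string derived_name(const BaseObj*) { return "Derived"; }

static int derived_extra(const BaseObj* p) {
    // Valid here because BaseObj is the first member of DerivedObj.
    return reinterpret_cast<const DerivedObj*>(p)->value;
}

static const DerivedVtbl derived_vtbl = { { derived_name }, derived_extra };

// Fills in the storage and hands back only a pointer to the base part.
BaseObj* make_derived(DerivedObj& storage) {
    storage.base.vptr = &derived_vtbl.base;
    storage.value = 42;
    return &storage.base;
}
```

Calling call_name on the pointer returned by make_derived reaches derived_name, although call_name was written against BaseVtbl alone.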
The program requires assistance from the linker for, well, linking. It does not require any additional help from the linker for linking to a particular table. It's just another emitted symbol to be resolved in the latter case. And, in the former case, the linker handles it in the same way as identical expanded templates.

There is no special handling required by the linker in either case.

You repeat what I said in Solution2 in the section POSSIBLE SOLUTIONS in my first post. But this dependence on the linker is for handling polymorphism as I stated there. So there is dependence on the linker for handling polymorphism since there are special variables generated by the compiler specially for handling polymorphism that are hidden in the source code but visible to the linker with all the important implications I pointed to in my Solution2. In case VFT's weren't global (and some of you suggested that in your answers) there wouldn't be such dependence on the linker.

It knows:
The location of the table.
And where in that table the pointer to the (virtual) function for an object of type base will occur.

Please be concentrated. You said that the code in Source2 will receive the address of C object. You also said that in the object's data structure the code will find the pointer to global VFT of C. So the code in Source2 will receive the address of a global variable it knows nothing about. And it knows nothing about global VFT of C since C is not declared in Source2. So it does not understand the global structure at the pointed address, and therefore it cannot find the address of fT in the structure.

Somewhere in the process you and helios both made a wrong assumption (you actually repeated the wrong assumption from my first post). I think I know where the fault is and what the final answer to my question is, but I would prefer that, if possible, we all come to the same conclusion independently, to make the answer more certain/reliable.


P.S. I have just realized that I thought I was talking to Cubbi not cire. My apologies.
But this dependence on the linker is for handling polymorphism as I stated there. So there is dependence on the linker for handling polymorphism since there are special variables generated by the compiler specially for handling polymorphism that are hidden in the source code but visible to the linker with all the important implications I pointed to in my Solution2. In case VFT's weren't global (and some of you suggested that in your answers) there wouldn't be such dependence on the linker.


I'm sorry, but your own personal definitions and beliefs aren't adequate.

The tool chain is dependent on the linker to link. The linker knows nothing about the concept of polymorphism. Whether or not the object code it is dealing with uses virtual member functions, it does nothing differently. As you mention yourself, the code that handles this (including the instantiation of tables) is generated by the compiler.


Please be concentrated.

Please don't leave out the part of the quote that addresses the issue you bring up. I'm sure if you were "concentrated" you wouldn't have done so.

I have a strong suspicion you're trolling, so I will not be revisiting this thread.
You also said that in the object's data structure the code will find the pointer to global VFT of C. So the code in Source2 will receive the address of a global variable it knows nothing about. And it knows nothing about global VFT of C since C is not declared in Source2.
Who said Source2 knows nothing about the virtual table? Source2 doesn't know the entire contents of the virtual table, but it does know that the first element is a pointer to an implementation of T::fT(). It knows this because it can see the declarations of T and of T::fT(), and it knows that T::fT() is virtual. Since compilers have a defined order to put functions into the virtual table, Source2 can know where in the virtual table it can find the pointers to the functions it knows about even if the virtual table contains pointers to functions it doesn't know about.
Source2 doesn't know the entire contents of the virtual table, but it does know that the first element is a pointer to an implementation of T::fT().


Why first? What if child C has another parent T2:

Source1:
struct T  { virtual void fT(); };
struct T2 { virtual void fT2(); };
struct C : T2, T { virtual void fC(); void fT(); };

T * F(){ return new C; }



Source2 is the same as in Quark (21) - Jan 24, 2016 at 2:49pm
Again, Source 2 does not know the structure of C since it is not declared there.
F() returns a pointer to the T subobject within C. Did you think derived-to-base conversion was value-preserving?
F() returns a pointer to the T subobject within C. Did you think derived-to-base conversion was value-preserving?


Now you change your opinion!

on Jan 24, 2016 at 2:49pm I asked:
Question2: When F returns T * in main what kind of data structure does the pointer point to?


and your reply on Jan 24, 2016 at 5:37pm was:
It's pointing to the C object. Within the C object there is a pointer to C's VFT.


Pay attention to the bold part!

Now you say that the kind of data structure is different, namely that of T. But how can it find "a pointer to C's VFT" "within the C object" there?
Why first?
There's no reason to put it anywhere else.

Now you change your opinion!
Your question and our answers were regarding a particular example.

PS: Why are you such an asshole?
Now you say that the kind of data structure is different, namely that of T

T* points at the C object, but, since this T in your new example is not the first base, it's not pointing at the first byte of the C object (the first byte is where the T2 subobject begins). Everything else remains the same: main() will follow the pointer, find a vptr, follow that, and find itself in C's VFT looking at the address of the final overrider of fT().
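A small sketch of this, assuming a mainstream compiler (the helper names call_through_T, bases_have_distinct_addresses and recovers_complete_object are made up for illustration, and function bodies are added so the example links on its own). It checks that the T and T2 subobjects of one C object live at different addresses, that a call through T* still reaches C's override, and that dynamic_cast<void*> recovers the address of the complete object.

```cpp
#include <cassert>

struct T  { virtual int  fT()  { return 1; } virtual ~T()  = default; };
struct T2 { virtual void fT2() {}            virtual ~T2() = default; };
struct C : T2, T { virtual void fC() {} int fT() override { return 2; } };

// Written against T only, like the code in Source2.
int call_through_T(T* p) {
    return p->fT();                       // dispatches through the vptr found at *p
}

// The T and T2 subobjects of one C object occupy different addresses,
// so at least one derived-to-base conversion must adjust the pointer.
bool bases_have_distinct_addresses(C& c) {
    T*  pt  = &c;
    T2* pt2 = &c;
    return static_cast<void*>(pt) != static_cast<void*>(pt2);
}

// dynamic_cast<void*> yields the address of the most derived object,
// whichever base subobject the pointer currently refers to.
bool recovers_complete_object(C& c) {
    T* pt = &c;
    return dynamic_cast<void*>(pt) == static_cast<void*>(&c);
}
```

Which base ends up at offset zero is left to the implementation; compilers I am aware of place the first listed base (T2 here) first, which matches the description above.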

If you're finding this hard to follow, I strongly recommend reading a book on C++ and/or getting hold of a C++ compiler; there are multiple freely downloadable compilers available.
Everything else remains the same: main() will follow the pointer, find a vptr...


But where to "find a vptr"?

This is what I asked in my last post. Cubbi, can you answer that?
Nice. Very mature.
$ gdb ./test
...
8           T* p = F();
...
(gdb) print p
$2 = (T *) 0x601018
(gdb) print *p
$3 = {_vptr.T = 0x400870 <vtable for C+56>}

Just for the sake of understanding of other people who might read this thread, what I am asking in my last two posts is:
How can code in Source2 find a pointer to global VFT of type C in a data structure pointed to with a pointer of type T returned from F if Source2 does not know anything about type C?

I am pointing to contradictory answers we are receiving here. There is a very wrong assumption here.
Does anyone else have an opinion about the problem here?
There is a very wrong assumption here.
That being...?
For the sake of the community I will share my thoughts about what I think is the answer to the problem here.

I see two possible answers, one being a better choice than the other, but the choice between the two is externally visible i.e. it is visible on the interface between the independent modules having polymorphic types involved. That is to say if we want interoperability between the independently compiled modules having polymorphic types involved on the interface the different compilers must follow the same standard.

There are two issues here:

1) monolithic or composite pointers to polymorphic types.
Monolithic pointers point to only one data structure while composite pointers point to two data structures (a pointer to instance data + a pointer to VFT)

2) local or global VFT of polymorphic types.
Local VFT means that for each polymorphic type T there is a local copy of the VFT in each module where T is defined. Global VFT means that there is only one copy of VFT for each polymorphic type T that is accessible to all modules where T is defined.


Monolithic vs. Composite pointers
1) First solution: monolithic pointers to polymorphic types
It is a very wrong assumption that, if we want to have monolithic pointers for polymorphic types, there is always one and only one pointer to the VFT for type C in an object of type C. Such an assumption cannot answer the questions I asked on Jan 24, 2016 at 8:43pm. For example, in an object of a polymorphic type C (see Jan 24, 2016 at 2:49pm) there are as many pointers pointing to the VFT for type C as there are polymorphic base classes of type C, plus one. In our case that is one for T and one for C, thus two pointers pointing to the VFT for type C in an object of type C. One pointer points to the part of the VFT for C that corresponds to type C, and the other pointer points to the part of the VFT for type C that corresponds to type T. Where are these pointers located? At the beginning of the instance data of the relevant sub-objects (in our case one pointer is at the beginning of C and another at the beginning of the type T sub-object of the C object). It is important to see that in an object of type C the T sub-object's pointer does not point to the VFT for type T but to the part of the VFT for type C that corresponds to type T (it is analogous in structure, but not in actual data, to the VFT for type T).

This model works but is somewhat less efficient, since it requires possibly many (as many as there are base classes) pointers to (different parts of) the VFT for the dynamic type of a polymorphic object as part of the object's instance data.
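The per-object cost can be observed indirectly with sizeof. This is only a sketch under the assumption that the compiler stores one table pointer per polymorphic base subobject, as the compilers I know of do; the type names A, B, One and Two and the helper two_bases_need_two_slots are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct A { virtual void fa() {} };
struct B { virtual void fb() {} };

struct One : A { };                       // one polymorphic base
struct Two : A, B {                       // two polymorphic bases
    void fa() override {}
    void fb() override {}
};

// Assumption: each polymorphic base subobject holds its own table pointer,
// so an object with two such bases needs room for at least two pointers.
bool two_bases_need_two_slots() {
    const std::size_t ptr_size = sizeof(void*);
    return sizeof(One) >= ptr_size && sizeof(Two) >= 2 * ptr_size;
}
```

Neither type declares any data members, so whatever size is reported comes from the bookkeeping the compiler adds for polymorphism.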


2) Second solution: composite pointers to polymorphic types
There is no actual need for pointers to VFT to be part of object's instance data in order to have polymorphism related features working. Why? Because there is a fixed relation between all the VFT pointers in one particular object, that relation is determined at compile time thus there is no need to "save" these calculations in each object instance data since these can be calculated at run time, just like other non-polymorphic pointers are converted at run time. So these conversions would have to be performed at run time for each of the two parts of a composite pointer pointing to a polymorphic type. In other words it is possible to determine/calculate at compile time all the other pointers having only one pointer as input. And that input can be part of the pointer's data. For example having a pointer of type C * pointing to a C object (see Jan 24, 2016 at 2:49pm) the composite pointer will point to instance data with one part and to VFT for type C with another part. It is easy then to perform conversion to T * pointer in Source 1 which would then point to T sub-object's instance data with one part and to part of VFT for type C that corresponds to type T with another part. What if one wanted to convert T * back to C * with dynamic cast in Source2? First, type C would have to be defined in Source2. Then the compiler would easily generate code that would check if T * with its VFT part is really pointing to VFT for type C. And that check can only be done if VFT's are global not local. The code would realize at run time that T * is indeed pointing to the part of VFT for type C that is corresponding to type T thus the conversion would succeed. Otherwise not.

This solution is a more memory-efficient approach, but it requires two conversions at run time for the two parts of the composite pointer type, unlike monolithic pointers for non-polymorphic types, which require only one conversion.
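A rough sketch of what such a composite pointer could look like, written by hand rather than taken from any compiler. All names (ShapeVtbl, ShapeRef, Square, Rect, make_ref, area, is_square) are hypothetical. The instance data holds no table pointer; the table address travels inside the pointer value, and comparing that address gives a simple run-time type check that only works if each type has a single table.

```cpp
#include <cassert>

// Hypothetical table of function pointers for an interface with one operation.
struct ShapeVtbl {
    double (*area)(const void* self);
};

// Composite pointer: one part addresses the instance data, the other the table.
struct ShapeRef {
    const void*      data;
    const ShapeVtbl* vtbl;
};

// The instance data itself carries no table pointer.
struct Square { double side; };
struct Rect   { double w, h; };

static double square_area(const void* s) {
    const Square* q = static_cast<const Square*>(s);
    return q->side * q->side;
}

static double rect_area(const void* s) {
    const Rect* r = static_cast<const Rect*>(s);
    return r->w * r->h;
}

// One table per type for the whole program.
static const ShapeVtbl square_vtbl = { square_area };
static const ShapeVtbl rect_vtbl   = { rect_area };

// Conversion to the composite pointer fills in both parts.
ShapeRef make_ref(const Square& s) { return { &s, &square_vtbl }; }
ShapeRef make_ref(const Rect& r)   { return { &r, &rect_vtbl }; }

// Late binding: the call goes through the table part of the pointer.
double area(ShapeRef ref) { return ref.vtbl->area(ref.data); }

// Type check by table identity; meaningful only with one table per type.
bool is_square(ShapeRef ref) { return ref.vtbl == &square_vtbl; }
```

Rust trait objects and Go interface values use a representation along these lines, whereas C++ compilers in common use store the table pointer inside the object.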


Global vs. Local VFT
It is possible to support the most important feature of polymorphism, late binding, without insisting on a global VFT, thus getting rid of the dependence on the linker for handling polymorphism (with all the implications of that). Late binding can be supported by both solutions with local VFT's. But a local VFT cannot support another polymorphism feature: dynamic cast. Thus both solutions require a global VFT for each polymorphic type, as explained in Solution 2 in the section POSSIBLE SOLUTIONS of my first post.


Conclusion:

In order to have interoperability between modules compiled with different compilers (possibly even from different languages) that have polymorphic types on the interface between the modules, the compilers (supporting polymorphism) would have to document:
1) how VFT tables are generated/structured having types defined within an inheritance hierarchy
2) how the symbol names for global VFT's are generated and if they can be controlled with compiler parameters
3) which of the solutions above is used (are the pointers to polymorphic types monolithic or composite)


I think the issue is closed.

br
There is no interoperability between binaries compiled by different compilers, or even by different versions of the same compiler.

I think the issue is closed.
Alright, then.