Weird errors with pure virtual functions

I've got a strange problem here. Platform is Ubuntu Oneiric 64-bit, g++ 4.6.1-9.

I have a bunch of utility classes that I've developed over my career that I compile into a static library. Part of this is a class structure for working with satellite data. It has an abstract base class called Satellite and several derived classes for working with particular types of satellite data.

The Satellite class has two initialization functions to build objects directly from a MySQL database. These are pure virtual functions, prototypes as follows:

virtual int init(MYSQL *db, unsigned dbid, int size_file = -1, int cvid = -1) = 0;

virtual int init(MYSQL *db, unsigned catnum, const JulianDate &start, const JulianDate &stop, int size_file = -1, int cvid = -1, int dbid = -1) = 0;


This code, and the implementation of these functions in the derived classes, has remained unchanged for several years (I checked the Subversion logs), and it has worked fine. The MySQL functions (there are quite a few more in the library) are wrapped in a preprocessor directive because some of the stuff I work on doesn't need MySQL or runs on platforms where it isn't available. So I haven't had the preprocessor directive turned on for a while.

So now I'm reworking an old project that needs mysql. So I turned it on ... and got bizarre errors. Again, the project worked fine previously, and subversion confirms that there were no code changes.

Now when I run the application, it's failing - typically on an assert statement in a function that it has no way to get to!?! Literally - you go down the stack trace and it jumps into a function that it couldn't have gotten to from the current function!?

Another user of the library experienced similar problems, but doesn't need the MySQL functionality. So when he turned it off & recompiled, it works fine again!

I took this other app and started selectively commenting out the various blocks controlled by the MySQL preprocessor directive, and I've narrowed it down to the two lines I posted above. If I comment those out, all is good. If I uncomment them, it fails. If I uncomment either one of them and leave the other out, I get a different error, but similar in nature to the one described above. The error is the same regardless of which line is uncommented.

Any idea what's going on here? It seems like at some point it just jumps to the wrong function call, from which it eventually bombs. Neither of the apps in question is actually using these init() functions.

Here's an example of the compile command for a file in the library. NetBeans is used for the build process.

g++ -Wall -O0 -fno-inline -c -g -Werror -DUSE_MYSQL -I/usr/include/mysql -I../Headers -MMD -MP -MF build/Debug_no_pvm/GNU-Linux-x86/_ext/2136743839/point.o.d -o build/Debug_no_pvm/GNU-Linux-x86/_ext/2136743839/point.o ../Point/point.cc

Here's the prototype and implementation of one of these functions from one of the derived classes:
//in class definition
int init(MYSQL *db, unsigned dbid, int size_file = -1, int cvid = -1);

//implementation edited for brevity
int Interp_Sat::init(MYSQL *db, unsigned dbid, int size_file, int cvid)
{
 assert(dbid > 0);
 
 ephem.clear();
 
 ostringstream qstr;

 MYSQL_RES *result;
 MYSQL_ROW row;
 
 qstr << "SELECT ephem_file.path, ephem_file.file_name, catalog_num, ephem_file.elset_epoch FROM ephem_file, elset_file WHERE  elset_file_id=elset_file.id AND ephem_file.id=" << dbid << ends;

 if( mysql_query(db, qstr.str().c_str()) )
    {
     cout << "ERROR: Database query failed in Interp_Sat::init(MYSQL*, unsigned, int, int)\n";
     return -1;
    }

 result = mysql_store_result(db);

 if(result == NULL)
    {
     cout << "ERROR: Failed retrieving database results in Interp_Sat::init(MYSQL*, unsigned, int, int)\n";
     return -2;
    }

 if( (row = mysql_fetch_row(result)) == NULL)
    {
     cout << "ERROR: No data found by database query in Interp_Sat::init(MYSQL*, unsigned, int, int)\n";
     mysql_free_result(result); //Don't leak the result set on the early return
     return -3;
    }

 string filename = make_filename(row[0], row[1]);
 
 int err = ephem.init(filename, false);
 
 //If we succesfully read data from this file, record its database ID
 if(!err && ephem.size())
    set_database_id(dbid);
    
 DataEpoch.init("%Y-%m-%d %H:%M:%S", row[3]);

 //Read the catalog number before freeing the result: row points into result's memory
 CatNum = atoi(row[2]);

 mysql_free_result(result);
 
 if(ephem.size() == 0)
    {
     GoodElset = false;
     
     return -1;
    }

 GoodElset  = true;

 State      = ephem.state(0);
 Epoch      = ephem.first_epoch();

 return 0;
}
This does look weird. However, I'm afraid we won't be able to help you: the code that you showed looks fine, and it's definitely not enough to diagnose the problem. The only thing I can think of now is that some part of your code uses the Interp_Sat class incorrectly (in an unsafe way, perhaps through some C-style or reinterpret casts, maybe something persistence-related, etc.), and when you add virtual functions (by the way, are there any other virtual functions?), the layout of the class changes because of the virtual table pointer, which eventually breaks the whole thing. Maybe you could make a minimal example of this problem and post it here, so we could try to compile it and see the problem ourselves. Perhaps it's a compiler issue (although I doubt it).

Good luck with your struggle!
As KRAkatau said, it's difficult to diagnose with so little info. However, I have a theory. If you have a class like so (where Satellite would most likely be B1):

class Derived : public B1, public B2
{

};


and somewhere in client code:

 
B2 * pB2 = (B2*)(void*)myDerived;


I believe methods called on pB2 will have some really weird results, because it's calling them with lookups to the vtable expecting the vtable to be that of a B2, when the vtable is actually that of a Derived. So if vtable for B1 looks like

[0] b1foo
[1] b1bar

and vtable for B2 is

[0] b2foo
[1] b2bar

then the vtable for Derived would be something like

[0] b1foo
[1] b1bar
[2] b2foo
[3] b2bar

From the previous example, if you call pB2->b2foo(), it will say "b2foo? Okay, that's the first function in B2's vtable, so I'll just get the address of the first function in pB2's vtable". The only problem is that this function is b1foo!

Note, I haven't brushed up on this kind of stuff in a long while, so I could be 100% wrong, but I think you can get erratic results like the ones you are experiencing with this type of error, and I think it makes sense since it occurs depending on the presence of a particular virtual function. To fix it, you would want to change all your casts to 'dynamic_cast'.
Thanks for the comments guys. I understand that it's not much to go on ... was hoping for some sort of "aha" moment :). I honestly haven't done much with inheritance before. I have been trying to winnow it down to a small enough example to post, but these are big classes dependent on lots of other code and external resources.

rollie - I'm not using multiple inheritance at all. I'm also not sure what the point of your 2nd code block is, so I doubt I'm doing something like that! In one of the apps that is causing problems I'm only using one of the derived classes, and I'm only using the objects directly, i.e. not via a base class pointer.

I'll see if I can come up with a more complete example to post.
Another note - error is apparently not due to the two functions being pure virtual. Changing them to just virtual functions still gives the same error behavior. Not sure why I hadn't tried that before.
When you crash inside a function you think you cannot possibly be in, go back up the call stack and look how the object ('this') is being created, assigned, etc.
Dammit - wrote this whole thing out only to lose it because my login had timed out!

Ok, here we go again...

I've stripped down the code substantially to help this along. I removed the other derived classes of Satellite, leaving only the SGP4_Sat derived class which is used in the app, and stripped out a bunch of other data members and associated functions which aren't used for this app. The code is now crashing in a different manner.

What's happening is that I'm making the following call:

target->propagate_to(Tx, false);

target is a Satellite* object pointing to an SGP4_Sat object.

propagate_to() is a virtual function.

When the call to propagate_to() is made the call actually goes to SGP4_Sat::check_object(), which is a pure virtual function within Satellite.

The call to check_object() completes normally and returns, but the fact that the target object wasn't propagated triggers an assert a few lines later. The earlier error was also an erroneous jump to check_object(), but it manifested differently because it was actually crashing within a function called from check_object(). That functionality was removed in stripping things down.

When within the check_object() call I looked at *this and everything looked normal.

The SGP4_Sat object is allocated in the main() function as such:

SGP4_Sat sat(line1, line2);

This is a constructor unique to SGP4_Sat.

It becomes a Satellite* when passed to the following function:

find_viz(&sensor, &sat, jd0, jd1, min_elev, max_elev, optical_vizfunc, dbl_args, int_args, vw);

Which has the following prototype:

int find_viz(Point *site, Satellite *target, 
                     const JulianDate &start, const JulianDate &end, 
                    double min_elev, double max_elev,
                    bool (viz_func)(const Point *site, const Satellite *target_ecr, 
                                                const vector<double> &dbl_args, const vector<int> &int_args),
                    const vector<double> &dbl_args, const vector<int> &int_args,
                    vector<VizWindow> &final_windows)


I've also tried declaring the SGP4_Sat object on the heap as such:

SGP4_Sat *sat;
 
sat = new SGP4_Sat;
 
sat->init(line1, line2);


The error behavior is identical.

I stepped through some of the code and found that calls to non-virtual Satellite member functions are being executed properly. Earlier calls to propagate_to() are also being misdirected to check_object(), but aren't being caught.

I'm about to try two things:

- add a call to another Satellite virtual function and see if it is also mis-directed
- add a call to an SGP4_Sat function which is unique to SGP4_Sat and see if it is resolved correctly.

Any help, observations, suggestions for other things to look at is appreciated!
Ok, forget about all the stuff above about find_viz() etc. Here is the entirety of the driver needed to reproduce the problem:

#include <stdlib.h>
#include <iostream>
#include "sgp4_sat.h"

using namespace std;

int main(int argc, char *argv[])
{
 JulianDate jd;
 
// SGP4_Sat sat(line1, line2);

 SGP4_Sat *sat;
 
 sat = new SGP4_Sat;

 sat->epoch(); //Satellite function - works properly 

 sat->element_number();  // SGP4_Sat only function - works properly

 sat->propagate(100.0, false); // Satellite pure virtual function - call goes to SGP4_Sat::close() - Satellite pure virtual func

 sat->propagate_to(jd, false); // Satellite pure virtual function - call goes to SGP4_Sat::check_object() - Satellite pure virtual function

 return 0; 
}


So...
- calls to functions defined in Satellite (Satellite::epoch()) work properly
- calls to function only defined in SGP4_Sat (SGP4_Sat::element_number()) work properly
- Calls to Satellite pure virtual functions go to an incorrect Satellite pure virtual function?!?

BUT...
If I modify the above driver to allocate sat on the stack, it works properly.
I've created a very stripped down model of these classes:

satellite_.h
#ifndef SATELLITEh
#define SATELLITEh

//An abstract base class for the various type of satellite objects

#include <iostream>

using namespace std;

class Satellite
{
 private:
    unsigned DBID; //Database ID

 protected:
    double Epoch;
    
 public:
    virtual ~Satellite(void) 
        {
         cout << "Satellite::destructor" << endl;
         return;
        }
         
    virtual int init(unsigned dbid, int size_file = -1, int cvid = -1) = 0;
    virtual int init(unsigned catnum, double start, double stop,
                     int size_file = -1, int cvid = -1, int dbid = -1) = 0;

    Satellite(void)
       {
        Epoch     = 0.0;
       }
    
    virtual Satellite* new_satellite() const = 0;
    virtual Satellite* clone() const = 0;
    
    virtual bool check_object(void) = 0;
    
    virtual int propagate(double dt, bool check_energy=false) 
        {
         cout << "Satellite::propagate()" << endl; 
         return 0;
        }
        
    virtual int propagate_to(double t, bool check_energy=false) 
        {
         cout << "Satellite::propagate_to()" << endl; 
         return 0;
        }
        
    double epoch(void)                  const {return Epoch;}

    Satellite &operator=(const Satellite &x)
        {
         Epoch = x.Epoch;
        
         return *this;
        } 
};

#endif 


sgp4_sat_.h
#ifndef SGP4CLASSh
#define SGP4CLASSh

#include "satellite_.h"

class SGP4_Sat : public Satellite
{
 private:
     int catnum;

 public:
    SGP4_Sat(void)
        : Satellite()
        {
         catnum = 0;
         
         return;
        }

    SGP4_Sat(unsigned dbid, int size_file = -1, int cvid = -1)
        : Satellite()
        {
         init(dbid, size_file, cvid);
         
         return;
        }

    int init(unsigned dbid, int size_file = -1, int covgen_id = -1)
        {
         catnum = 0; 
         return 0;
        }

    SGP4_Sat(unsigned Catnum, double Start, double Stop, 
             int size_file = -1, int cvid = -1, int dbid = -1)
        : Satellite()
        {
         init(Catnum, Start, Stop, size_file, cvid, dbid);
         
         return;
        }

    int init(unsigned Catnum, double Start, double Stop, 
             int size_file = -1, int cvid = -1, int dbid = -1)
        {
         catnum = Catnum;
         return 0;
        }

    Satellite* new_satellite() const {return new SGP4_Sat();}
    Satellite* clone() const {return new SGP4_Sat(*this);}

    int propagate_to(double jd, bool check_energy)
        {
         cout << "SGP4_Sat::propagate_to()" << endl;
         Epoch = jd;
         return 0;
        }
    
    int propagate(double dt, bool check_energy)
        {
         cout << "SGP4_Sat::propagate()" << endl;
         return propagate_to(Epoch + dt, check_energy);
        }

        
    bool check_object(void)
        {
         cout << "SGP4_Sat::check_object()" << endl;
         return true;
        }
    
    int catnumber(void)
        {
         return catnum;
        }

    ~SGP4_Sat(void) 
        {
         cout << "SGP4_Sat::destructor" << endl;
         return;
        }

    SGP4_Sat &operator=(const SGP4_Sat &x)
        {
         Satellite::operator=(x);
         
         catnum = x.catnum;
         
         return *this;
        } 
};

#endif 


And the driver:
#include <stdlib.h>
#include <iostream>
#include "sgp4_sat_.h"

using namespace std;

int main(int argc, char *argv[])
{
 SGP4_Sat *sat;
 
 sat = new SGP4_Sat;
 
 sat->epoch();
 sat->catnumber();
 sat->propagate(100.0, false);
 sat->propagate_to(0.0, false);

 return 0; 
}


Note that this isn't meant to do anything even remotely useful - just a small enough inheritance implementation to wrap my head around, post, etc. And make sure the function calls go where they're supposed to!

And, of course, they do. So my simple model is too simple ... grrrr

Does this code yield incorrect results for you?
In my case it seems to be working fine:
SGP4_Sat::propagate()
SGP4_Sat::propagate_to()
SGP4_Sat::propagate_to()
Yea, the stripped down code works fine. Calls go to the correct functions. I'm at a loss as to how to proceed here ... I guess another less stripped-down version.

How the &*%$ can a function call go to the wrong function??
Ok, some more information...

Found a statement on StackOverflow saying that calling virtual functions from constructors or destructors is a bad idea. So I rewrote some stuff in SGP4_Sat where that was happening. No change in behavior.

Then I compiled the entire thing on a single command line - driver and all the classes. When I do this, it works! So maybe it's something to do with the build process in creating the library or linking it? For what it's worth, I'm doing a clean build every time I test this.
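Oh, right. Didn't notice the virtual function call in the constructor. It really is not a good idea, simply because the virtual dispatch will not reach a derived-class override: while a constructor is running, the call resolves to the version belonging to the class being constructed, so it effectively behaves like a normal member function call.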

So in the end it does look like some compiler/linker issue. Unfortunately I am of little help here as I usually work under windows with visual studio, and that is a totally different story. It seems very weird though - g++ is one of the most stable and predictable compilers out there.

When you solve the problem (I'm not saying "if", I say "when" :)) please post the solution - I am curious! And if you manage to strip down the code to some version that can be published - please do so, so I could try and see if there is a similar problem on different platform/compiler.

Again, good luck!
What is the function that IS being called? What is the function you expect? Please include the inheritance hierarchy for both classes, and info on those functions (virtual? not?). I still feel this is some sort of casting problem...something like

void f(Shape * pShape)
{
    cout << ((Circle *)pShape)->radius() << endl;
}

...

Shape * pShape = new Square();
f(pShape);
Ok, here's where I'm at...

Inheritance hierarchy: Satellite is an abstract base class.
SGP4_Sat inherits solely from Satellite:


class SGP4_Sat : public Satellite

Within Satellite, the following functions are pure virtual:

virtual int init(MYSQL *db, unsigned dbid, int size_file = -1, int cvid = -1) = 0;
virtual int init(MYSQL *db, unsigned catnum, const JulianDate &start, const JulianDate &stop,
                         int size_file = -1, int cvid = -1, int dbid = -1) = 0;
virtual Satellite* new_satellite() const = 0;
virtual Satellite* clone() const = 0;
virtual bool check_object(void) = 0;
virtual void print(ostream &out) = 0;


and the following functions are virtual:
virtual ~Satellite(void)  //inline
virtual int propagate(double dt, bool check_energy=false) //inline
virtual int propagate_to(const JulianDate &t, bool check_energy=false); //inline
virtual int update_cov(void) //inline
virtual unsigned ephemeris_size(void) const  //inline
virtual JulianDate first_epoch(void) //inline
virtual JulianDate last_epoch(void) //inline 


All of the virtual functions are implemented in SGP4_Sat, many of them inline.

These classes, plus about two dozen more are compiled into a static library. Compile command looks like this:

 
g++ -Wall -O0 -fno-inline -Woverloaded-virtual -Wnon-virtual-dtor   -c -g -Werror -DUSE_MYSQL -I/usr/include/mysql -I../Headers -MMD -MP -MF build/Debug_no_pvm/GNU-Linux-x86/_ext/366117772/sgp4_sat.o.d -o build/Debug_no_pvm/GNU-Linux-x86/_ext/366117772/sgp4_sat.o ../Satellite/sgp4_sat.cc


The command that builds the library looks like this:
 
ar -rv dist/libtoollibg_nopvm.a build/Debug_no_pvm/GNU-Linux-x86/_ext/761510127/mdistance.o build/Debug_no_pvm/GNU-Linux-x86/_ext/1931116410/novascon.o build/Debug_no_pvm/GNU-Linux-x86/_ext/2066346179/sun_vector.o <snip long list of object files>  build/Debug_no_pvm/GNU-Linux-x86/_ext/1931116410/novas.o build/Debug_no_pvm/GNU-Linux-x86/_ext/761510127/split.o


The driver to exercise the error is as follows (posted previously)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#include <stdlib.h>
#include <iostream>
#include "sgp4_sat.h"

using namespace std;

int main(int argc, char *argv[])
{
 JulianDate jd;
 
// SGP4_Sat sat(line1, line2);

 SGP4_Sat *sat;
 
 sat = new SGP4_Sat;

 sat->propagate(100.0, false); // Satellite pure virtual function - call goes to SGP4_Sat::clone() - Satellite pure virtual func

 sat->propagate_to(jd, false); // Satellite pure virtual function - call goes to SGP4_Sat::check_object() - Satellite pure virtual function

 return 0; 
}


It is compiled with the following command:

g++ -g -O0 -fno-inline -Wall lost_drvX.cc -o lost_drvX -I ~/Tools/Headers/ -L ~/lib/ -ltoollibg_nopvm -I /usr/include/mysql -lmysqlclient

When you execute the driver and inspect it with the debugger you find that the call to SGP4_Sat::propagate() on line 17 actually goes to SGP4_Sat::clone()

And the call to SGP4_Sat::propagate_to() on line 19 actually goes to SGP4_Sat::check_object().

As you can see, there aren't any casts.

One note. Virtual functions called through a base pointer generally cannot be inlined (the compiler doesn't know the exact type of the object at compile time, unless it is smart enough to analyze the whole program and see that only a single subclass is ever passed to the function that makes the virtual calls, and I doubt that many compilers do that). Maybe that's the problem? Then again, "inline" is just a recommendation to the compiler, which can ignore it (and should ignore it in this case). Try removing inline; maybe it will help.
This is a really interesting problem :) Can you try removing all of your #ifdef MYSQL like statements and rebuilding both the library and driver? My new theory is that your library is being compiled without that pre-processor directive, but your client is being compiled with it, resulting in an effectively different header file.