file sizes

I was raised on Java, but recently I've been coding more and more in C++ for work, and there are some things I don't understand and can't seem to get good unbiased answers to. I have a feeling I won't get them here either, but I'd like to give it a shot.

The main one is file size. When we add new methods to the .hpp of a class, we're being required to put the body of each function in a separate file. So if I have a class called Person which is defined in Person.hpp, and I put two methods called jump() and run() on the class, I'm being asked to define each of those methods in its own .cpp, named jump.cpp and run.cpp. Supposedly this is so the compiler only has to bring in the body of a method if it's actually being used in the calling program. What exactly are the benefits of this? Is it even really necessary? I wouldn't expect it to make that much difference, at least not enough to justify the extra hassle during coding.
You don't.

The only time you need to create a new .cpp file is if you are deriving from a class whose .cpp file you cannot modify.

Did I understand your question correctly?
I know I don't have to, but according to my manager it's absolutely necessary in order to reduce the size of the final compiled program and prevent unnecessary files from being included.
Sounds like a very confused manager.
Part of it has to do with the fact that our database supposedly only allows us to have 200 open tables at one time, and we're concerned that if we add things unnecessarily we could go over that. But it seems like there are other concerns that I'm not understanding. There are very few programs we have where we'd even come close to that limit, and in those programs we'd hit it regardless.


Could this have been a bigger problem in the past, before memory and CPU power increased?
I don't see how the problems relate.

The number of source code files has nothing to do with the database you are manipulating with the compiled program.

And, for the record, adding a new file for every single update will not reduce compile time. (And will, at some point, actually increase it.)
Supposedly he believes that the way our compiler works is that if I'm accessing data file A in jump.cpp and data file B in run.cpp, but my main program only calls jump, then the code from run.cpp will not be brought into the final executable, and therefore data file B will not be opened when the program runs.

Whereas if I put the body of both methods into Person.cpp the code for both methods would make their way into the final executable even if I only called one of them and as a result both files would open and the size of the final executable would be bigger.
Wait, so you're saying that he believes that in the code below:
// Person.hpp
struct Person
{
    void jump();
    void run();
};
// jump.cpp
#include "Person.hpp"
void Person::jump()
{
    open_my_data_file("SomeDataA");
}
// run.cpp
#include "Person.hpp"
void Person::run()
{
    open_my_data_file("SomeDataB");
}
...that if both jump.cpp and run.cpp are compiled into the final executable, then even if Person::run() never gets called, the "SomeDataB" file will still be opened?

That doesn't make any sense....

Also, I don't see what this has to do with executable file sizes. Sure, it might be a little bigger (depending on whether the linker will actually stick the function in even if it's not referenced anywhere -- I don't recall what the linker does in that case), but that shouldn't have any effect on database accesses...(unless I'm misunderstanding your question).
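For what it's worth, here's a rough sketch of how you could see what the linker actually does, assuming GCC/GNU ld on Linux (the -ffunction-sections and -Wl,--gc-sections flags ask the linker to discard unreferenced functions; the file and function names are made up for the example):

```shell
# Two free functions in one .cpp; main() only references used().
cat > lib.cpp <<'EOF'
int used()   { return 1; }
int unused() { return 2; }
EOF
cat > main.cpp <<'EOF'
int used();
int main() { return used(); }
EOF

# Put each function in its own section so the linker can drop dead ones.
g++ -O2 -ffunction-sections main.cpp lib.cpp -Wl,--gc-sections -o demo

# Inspect which symbols survived into the executable.
nm -C demo | grep 'used'
```

On my understanding, unused() should be gone from the final binary even though both functions live in the same .cpp file -- which is the point: the split into files isn't what decides it.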
Your manager is mistaken.

It doesn't matter whether you put Person::jump() and Person::run() into the same .cpp file or into two different ones. When you build and link the code, the object code will all get linked into the same library or executable. Arbitrarily splitting the method implementations into different files makes no difference at all.

With free functions, a linker will almost certainly figure out for itself which functions are called and which aren't, and will only include the appropriate ones in the final executable. With methods of a class, I'm not sure if that's true. But in any case, the behaviour should be the same regardless of whether the code is split between multiple .cpp files or not.

And the files "SomeDataA" and "SomeDataB" won't be opened unless those methods are actually called during the execution of the program.

Unfortunately, it sounds like you don't have the experience and confidence with C++ to authoritatively communicate this to your manager. Is there an experienced C++ developer in your organisation whom you could enlist to help persuade him?
Any ideas on how I would go about proving this?

Both of the most experienced developers in my group seem to be in agreement about this. There must be something special about the type of data file getting opened that I'm not able to explain.
I might see where they are coming from...

1) If you make a change to only Person::run(), you only have to recompile run.cpp, not jump.cpp. It's better for compilation time.
2) If you need to exclusively check out a file in order to edit it, smaller files mean that several people can work on the same class at one time. This depends on your revision control (e.g. if you use email or SharePoint for revision control). Tools like Git or SVN will merge automatically.
3) If you have function bodies defined in headers, that is when you run into very large files. You often don't want to put the entire class in a header file, since that will duplicate the code into each source file that includes it. It also significantly increases compilation time, and any change to the header will force recompilation of every dependent object.

I assume that when you say "database" you are referring to the revision control repository.

The main difference between Java and C++ here is that C++ has headers, while Java doesn't. It really doesn't make sense to put each method in its own source file, but it does make sense to separate bodies from headers, so maybe it was just a misunderstanding.
I was also wondering if they are talking about the VCS -- in which case, any decent VCS can handle edits to non-overlapping pieces of the same file. (If you have more than one person working on the same piece of code, then you have an organizational problem with your humans there.)

Also, you should prefer to think that you have misunderstood something before you assume that everyone else is in error.

Maybe you should ask some questions to better understand what they mean.
Supposedly he believes that the way our compiler works is that if I'm accessing data file A in jump.cpp and data file B in run.cpp, but in my main program I only call jump, then the code for run.cpp will not be brought into the final executable and therefore data file B will not be opened when the program runs.


Regardless of what you do from main, if your code doesn't call the function that opens data file 'X', then data file 'X' will not be opened. C/C++ is very good at doing exactly what you tell it to. So good, in fact, that it will not make even the slightest extra effort to do anything beyond exactly what you have written. You can prove this with the "handles" program from Sysinternals if you are on Windows; it's probably even easier on *nix, but I can't recall the name of the tool you should use.

No matter what compiler you are using, if you define a member function for an object, then that function's definition will be brought into the final compiled binary even if it is never called. Member functions are not optimized out, and this is something you can prove with a debugger and some patience.
Are these object files loaded at run-time?
Some kind of delay-load-and-link-object-files-on-demand environment?

In that scenario, if we have
jump.cc compiled to give jump.o
and run.cc compiled to give run.o,
If run() is called, but jump() is never called, jump.o would not be loaded.

On the other hand, if we have a single larger run_plus_jump.o which contains code for both jump() and run(), it would be loaded if either jump() or run() is called.
Loading object files at runtime? Now we're really starting to sound like Java (OP's first language). That leads me further to think that OP just has a misunderstanding of what is required.
Lazy loading (or delay loading) of object files (loading a module automatically at runtime, triggered by the first call to a function in that module) is directly supported on many platforms.

Microsoft: http://msdn.microsoft.com/en-us/library/yx9zd12s.aspx
AIX: http://aerodyne.technion.ac.il/InfoPages/faq/programming/aix_linking.pdf
Solaris: http://docs.oracle.com/cd/E26502_01/html/E26507/chapter3-7.html

Even on platforms where lazy loading is not natively supported, frameworks can provide the functionality. For instance, Dynamic Shared Objects (.dso) in the Apache web server. http://httpd.apache.org/docs/current/dso.html

This is useful only if the size of a delay-loaded module matters in an environment where memory is at a premium, and there is a reasonably high probability that, at runtime, no function in the module will ever get called. Or if the module must be replaceable by a modified version without having to shut down and restart the server process. Or if the server process is to be assembled dynamically at run time, module by module, based on a configuration file.
Topic archived. No new replies allowed.