Code objects alignment, memory and execution by CPU in C++

I think it should be noted that C and C++ are high-level languages, and as such the translation from source code to machine code is very loose. It can be useful to inspect the compiler's output for a piece of code, but always keep in mind that all you're investigating is what the compiler did with the code in that particular instance, not what every compiler will do in every instance. The same compiler, in the same translation unit, given two identical blocks of source code, can produce two very different sequences of machine code because of the context in which they appear. Let alone if you start varying compiler options, or even compilers.
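For what it's worth, the inspection itself is easy (assuming a GCC- or Clang-style toolchain; the file name is just an example):

g++ -O2 -S example.cpp -o example.s   # emit assembly instead of an object file

or paste the snippet into an online tool such as Compiler Explorer (godbolt.org).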

For example, thinking "I will use short for this local variable because it'll save memory" is the wrong mindset entirely. Just because you added a local variable doesn't mean the compiler is definitely going to allocate stack space for it. It could decide to keep the variable in a CPU register, or it could realize that it can simply treat it as an intermediate computation value and eliminate it completely. For example, a compiler could easily rewrite this:
int bar = foo();
bar *= 2;
bar += bar;
return bar;
into this:
return foo() << 2;
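(The two are equivalent: doubling the result and then adding it to itself multiplies foo()'s return value by four, which is exactly a left shift by two bits; and since signed overflow is undefined behavior anyway, the compiler is free to assume it doesn't happen.)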

If you want to investigate the low-level execution details of your machine, you should study its Assembly language. Trying to study C++ to do this is an exercise in futility. You'll just fool yourself into producing suboptimal, brittle code that relies on undefined behavior that you didn't know was undefined, because your compiler reliably produced the same output every time.
@kigar64551 As such, short typically takes 2 bytes in memory and int 4 bytes.
@jonnin, @helios, I mean memory only. So when an integer value (or values) can be stored in a short, it's better to use a short instead of the larger int. That's about memory; the choice may end up having no effect on registers, since they're large compared to a short or an int, or it may change based on the compiler's decisions. Yet since stack memory is very limited (1 MB by default on Windows), I suppose the choice makes sense.
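For the record, a minimal check of those sizes (the exact values are implementation-defined; this just prints what your platform does):

#include <iostream>

int main() {
    // Sizes are implementation-defined; 2 and 4 are only the
    // typical values on mainstream 32/64-bit platforms.
    std::cout << "short: " << sizeof(short) << " bytes\n";
    std::cout << "int:   " << sizeof(int) << " bytes\n";
}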
A side but related question: Can padding occur in non-class/struct objects? If so how?
Yes it can. The compiler is free to pad POD (plain old data) if it wants, but I don't know why it would want to.
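A minimal sketch of where padding does and doesn't show up in practice (the exact numbers are implementation-defined; the comments assume a typical platform with a 4-byte, 4-byte-aligned int):

#include <cstddef>
#include <iostream>

struct padded {
    char c; // 1 byte; typically followed by 3 bytes of padding
    int i;  // usually must start on a 4-byte boundary
};

int main() {
    // Arrays of scalars are contiguous: no padding between elements.
    std::cout << sizeof(int[10]) << '\n';     // exactly 10 * sizeof(int)
    // Class types are where padding normally appears.
    std::cout << sizeof(padded) << '\n';      // typically 8, not 5
    std::cout << offsetof(padded, i) << '\n'; // typically 4
}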

Read Helios's response carefully. All of this stuff is implementation dependent and unless you're writing device drivers or operating systems, it probably doesn't matter. Worrying about it is just a distraction from getting your code to work reliably.

If you're really interested in some of these details, you might consider a course in computer architecture. There you'll learn about the details of how different computers handle different things.
I reread their post above, but my takeaway is to write optimal code (in terms of memory and other things as well) on your side and let the compiler do what it wants with it. If it eliminates parts of your optimal code, no worries. If it doesn't, you'll be glad you wrote the code optimally in the first place.
Writing optimal code is not a bad goal to have, as long as you understand how to do it. Blindly compressing the size of local integer variables is not the way to do it. In the best case it doesn't do anything for performance and it makes the code more difficult to understand because the reader has to interpret why the sizes of things keep changing. In a bad case the compiler has to perform additional unnecessary computations to trim register values to the right size, and possibly waste more memory than if you hadn't done anything, to meet alignment requirements. In the worst case you can lose track in the middle and accidentally drop bits you shouldn't have, making your code incorrect. Needless to say, correct slow code is better than fast broken code.

Don't do this:
u32 value = foo();
//The divided value fits in 16 bits, so let's save two bytes.
u16 smaller_value = value / 16;
Do do this:
u32 value = foo();
//We NEED to truncate to 16 bits, because we don't own anything
//after the first 64K.
u16 address = value;
bar(address); //void bar(u32); 
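(For context on the comment: the hypothetical is presumably hardware where only the first 64K, i.e. the first 2^16 bytes of the address space, belongs to us, so a valid address always fits in 16 bits. Note also that address is implicitly widened back to u32 when passed to bar(); the truncation happens exactly once, at the documented assignment.)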


One place where the size of things does matter is when allocating arrays. For example if you're storing a bitmap in memory, it makes more sense to do this:
struct pixel{
    u8 r, g, b, a;
};
std::vector<pixel> bitmap(w * h);
than this:
struct pixel{
    u32 r, g, b, a;
};
std::vector<pixel> bitmap(w * h);
(Unless of course you need 32 bits per channel. I think scientific images do get such high dynamic ranges.)
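To put rough numbers on that point, here's a sketch assuming u8/u32 are exactly 1- and 4-byte aliases (as defined below) and a 1920×1080 image:

#include <cstddef>
#include <cstdint>
#include <iostream>

using u8 = std::uint8_t;   // assumed aliases for the thread's u8/u32
using u32 = std::uint32_t;

struct pixel8 { u8 r, g, b, a; };   // sizeof == 4, no padding needed
struct pixel32 { u32 r, g, b, a; }; // sizeof == 16

int main() {
    const std::size_t w = 1920, h = 1080;
    std::cout << sizeof(pixel8) * w * h << " bytes\n";  // 8,294,400 (~8 MB)
    std::cout << sizeof(pixel32) * w * h << " bytes\n"; // 33,177,600 (~33 MB)
}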
In a bad case the compiler has to perform additional unnecessary computations to trim register values to the right size, and possibly waste more memory than if you hadn't done anything, to meet alignment requirements. In the worst case you can lose track in the middle and accidentally drop bits you shouldn't have, making your code incorrect.
I don't understand the reason for these, but anyway.
//We NEED to truncate to 16 bits, because we don't own anything
//after the first 64K.
I didn't get this comment either! What 64K?
Donald Knuth said "premature optimization is the root of all evil." What he meant is that it's a bad idea to optimize code unless you are certain that specific code is causing a performance problem. The right way to go about optimizing is to write the code, measure the performance, and, if necessary, optimize whatever the measurements reveal as a problem.

Low level memory allocation is almost never a problem.
There are good "profiling" tools available that can easily tell you in which functions a program spends most of its time.
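(Examples include perf on Linux, gprof, Valgrind's Callgrind, Intel VTune, and the profiler built into Visual Studio.)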

It's those "hot" functions which are worth optimizing, because even small improvement there can be significant for the overall runtime of the program. All the rest probably is not worth optimizing, because even a big improvement there will be insignificant for the overall runtime.

You will often see that people re-write "hot" functions in hand-optimized Assembly code, but leave the rest in plain C/C++.

______

Also, before you spend time on "low level" optimizations, think about "high level" optimizations of your algorithm! For example, no matter how much you optimize an O(n²) algorithm, even a badly optimized O(n) algorithm will always be faster – for sufficiently large inputs.
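To make that concrete, here's a minimal sketch (function names invented for illustration). Both functions return the same count of duplicate elements, but the first compares pairs while the second does one average-O(1) hash lookup per element:

#include <cstddef>
#include <unordered_set>
#include <vector>

// O(n^2): scans the rest of the vector for every element.
std::size_t count_duplicates_quadratic(const std::vector<int>& v) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) { ++count; break; }
    return count;
}

// O(n) on average: one hash lookup per element.
// Same result: both count (number of elements) - (distinct values).
std::size_t count_duplicates_linear(const std::vector<int>& v) {
    std::unordered_set<int> seen;
    std::size_t count = 0;
    for (int x : v)
        if (!seen.insert(x).second)
            ++count;
    return count;
}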
Low level memory allocation is almost never a problem.

This^^
Where memory likes to burn you is when novice programmers create and destroy OOP objects like mad, with the work hidden/buried. As in: call, a billion times, some function that creates and destroys a dozen intermediate/temporary STL containers. Thankfully, as already said, profilers can point at sluggish areas, but when it comes to memory performance, two things bubble to the top for most problems: page faults, and spamming allocate & release. And compilers are getting super smart; some of them detect this foolishness and fix it silently as they generate the machine code.
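A sketch of that pattern (function names invented for illustration). The first version heap-allocates and frees on every call; the second reuses one caller-owned buffer, so steady-state calls allocate nothing:

#include <cstddef>
#include <vector>

// Churn: every call allocates and then frees a fresh vector.
int sum_naive(const int* data, std::size_t n) {
    std::vector<int> scratch(data, data + n); // allocates
    int total = 0;
    for (int x : scratch) total += x;
    return total;
} // deallocates

// Reuse: assign() keeps the existing capacity when it fits,
// so repeated calls stop touching the allocator entirely.
int sum_reusing(const int* data, std::size_t n, std::vector<int>& scratch) {
    scratch.assign(data, data + n); // reuses existing capacity
    int total = 0;
    for (int x : scratch) total += x;
    return total;
}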