Class Representation In Assembly

closed account (DEhqDjzh)
Hi, I am researching how C++ classes are represented in assembly, but the articles I found are for people who already know assembly well, and I don't :( Can someone explain how classes are represented in assembly?

Note: It would be awesome if you posted the assembly code with every line explained in a comment :)
Write a simple class and compile it to generate the assembly listing. Keep the program small: don't do anything but declare the class and create one instance of it. At first, give it just an int, a double, and maybe a constructor. The better compilers put the C++ code next to the group of instructions it translates to in the assembly listing, so you can see what each part means.

What you will see is that it's not really represented as an object. The C++ compiler renders the class's data members into atomic types (or pointers to arrays of atomic types) laid out at fixed offsets, and its methods into ordinary subroutines (which receive the object's address as a hidden "this" argument) or inline code. The class itself, the type, won't exist, and a variable of the class type will explode into atomics, pointers, and routines. Some assemblers support a crude C-like struct format and others don't; where it is supported, the compiler may use it.

You are going to need to understand assembly to get very far with this. The compiler will generate addressing instructions that are incomprehensible to a beginner, and the way it throws pointers and offsets around will be the most frustrating part; it takes a while to get used to. At first, avoid the STL, complex class members, inheritance, and so on. Start slow and build up, because those things will explode your listing and make it hard to unravel.

Well, honestly, you have to understand ABIs and mangling. Basically, C is the standard that linking is based on (not assembly: assemblers basically follow the C conventions, even though C gets compiled into assembly), so symbol names may only use letters, numbers, and a few symbols like underscores. C++ compilers therefore use something called "mangling": a set of rules for encoding everything C++ adds to a name (classes, namespaces, parameter types for overloading) into a plain symbol name like that. In theory, if you know how to mangle properly, you could write ugly-looking C code that works with C++ code. Fortunately for you, I already tackled this.

My C++ code:
#include <stdio.h>
#include <stdlib.h>

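extern "C" int seed;                // C linkage: the symbol is plain "seed"
extern "C" unsigned short prng();   // C linkage: the symbol is plain "prng"
extern unsigned short mprng();      // C++ linkage: the symbol is mangled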


int main(int argc, char** argv){
        int read;
        seed=0x882829;
        for(int i = 10; i>0; i--){
                read = mprng();
                printf("%i = 0x%04x\n", read, read);
        }
        return 0;
}


The extern means the symbols are defined somewhere else, not in this file: in a .o file (compiled but not yet linked; the linker combines compiled files into an executable) or in another source file that is compiled separately, such as a .S (assembly) or .cpp file. For function declarations extern is implicit anyway, since a prototype without a body already refers to a definition elsewhere; if nothing provides that definition at link time, you get an "undefined reference" error.

The "C" part tells the compiler to use C linkage: the symbol is just the plain name, as C would have it (so no overloading, classes, etc.), as opposed to C++ linkage, which uses mangling to make all of that possible.

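Now, if you run "g++ main.cpp -c" you'll get a main.o file (assuming you named it main.cpp), and if you then run "nm main.o" you'll see something like the following (U marks an undefined symbol the linker still has to resolve, T a symbol defined in the code section; an atoi line only appears if your main.cpp actually calls atoi):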
         U atoi
00000000 T main
         U printf
         U seed
         U _Z5mprngv


What happened here is that I didn't actually use prng, so it doesn't appear at all, but I did use seed (not mangled, since I used "C" after extern) and I used _Z5mprngv, which is the mangled version of mprng. The "_Z" marks a mangled C++ name, the "5" is the length of the function name ("mprng" is 5 characters), and the "v" means the parameter list is void, i.e. the function takes no arguments. The return type isn't encoded at all for an ordinary (non-template) function. But don't take my word for it; try doing this much and experiment for yourself.

#Note: the cpp file was compiled on a 32-bit system; on a 64-bit system you'll probably need to build everything as 32-bit (e.g. pass -m32 to gcc/g++).

.intel_syntax noprefix #Intel (MASM-style) syntax, instead of the GNU assembler's default AT&T syntax
.code32 #This is 32bit code, not 64bit or 16bit.

.global seed #global means that we want to export that label (things with the syntax of "abcdef:")
.global prng
.global _Z5mprngv

.section .text #executable code below

#Our random number generator, which isn't that good, but it is cheap.
prng:   mov eax, seed   #moves the value of seed into the eax register.
        shr eax, 3      #eax >>= 3;
        xor eax, seed   #eax ^= seed;
        rol eax, 5      #eax = (eax << 5) | (eax >> 27);  (rotate left by 5)
        mov seed, eax   #seed = eax;
        and eax, 0xFFFF #eax &= 0xFFFF;
        ret             #return eax;

_Z5mprngv: #Same thing, only with the mangled name.
        mov eax, seed
        shr eax, 3
        xor eax, seed
        rol eax, 5
        mov seed, eax
        and eax, 0xFFFF
        ret

.section .data #readable and writeable, but not executable stuff below

seed: .long 0xdeaddead #long (int), as opposed to short (int). 


Now, you'll notice that a good part of this assembly code is actually about bookkeeping rather than executable code. The labels are there to name addresses, and the lines beginning with periods are directives to the assembler and linker, not instructions that end up in the final binary. Also, this wasn't objects and classes, just code I had lying around for the purpose of discussing this topic. If it makes it easier to understand, you could think of overloaded functions (functions that share a name but differ in their parameters) as functions of a "global class". Experiment with the C++ code above the assembly to see how the output changes. The big lesson here is that the assembly doesn't care about objects or how they're stored; the linker only cares about names, which is why mangling exists. Now, for fun, look at this:

#include <stdio.h>
#include <stdlib.h>

extern "C" int seed;
extern "C" unsigned short prng();
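extern "C" unsigned short _Z5mprngv(); // C linkage, so this name is used verbatim and matches the mangled symbol (a reserved identifier; fine for an experiment)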


int main(int argc, char** argv){
        int read;
        seed=0x882829;
        for(int i = 10; i>0; i--){
                read = _Z5mprngv();
                printf("%i = 0x%04x\n", read, read);
        }
        return 0;
}


Rather than using assembly, you could write the function in a .c file and compile it with gcc, and do the same thing without learning assembly. Try making a class and some member functions and see how their names come out. Remember, normally you wouldn't export the objects themselves, but internally the compiler does this same mangling when it spits out the assembly.

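But that's GCC. Clang uses the same scheme (the Itanium C++ ABI) on Linux and macOS, but other compilers can differ completely: MSVC's mangled names start with "?" and follow different rules. So the details depend on the compiler, even though the basic idea is the same.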
closed account (DEhqDjzh)
@kohlrak thank you :)