creating functions in assembly?

closed account (Dy7SLyTq)
so i want to read in a file in assembly but i want to put it in a function. im having a little trouble finding out how to do it. i spent some time on google even. how do you create functions in nasm intel assembly?
closed account (o1vk4iN6)
You just need a label, and then you do "call label". The label is defined the same way as the "main" label that your program gets linked against.

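global main     ; make "main" visible to the linker
main:
  call myfunc   ; pushes the return address and jumps to myfunc
  ret

myfunc:
  ret           ; pops the return address and jumps back to the caller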


You just make a label and put some code under it.

Before calling it, put your arguments in the correct registers.

In the block of code below your label, you have to follow the calling convention. That includes preserving the values of certain registers (only if you plan to use them) by pushing them onto the stack at the beginning and popping them back off at the end, and establishing a stack frame (the "prolog" and "epilog").

I don't know intel syntax.

It looks sort of like this in AT&T syntax (I didn't preserve any register values here):


# how many bytes you need for local variables (keep it a multiple of 16)
.equ stack_size, 16

.text
.global function
function:

    pushq %rbp                 # prolog: save the caller's frame pointer
    movq %rsp, %rbp            # and set up our own
    subq $stack_size, %rsp     # reserve room for locals

    # on x86-64 Linux (System V ABI) the first six integer/pointer
    # arguments arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9

    # ... function body ...

    movl $0, %eax              # where what you're returning goes

    leave                      # epilog: restores %rsp and %rbp, so no addq is needed
    ret


But why not just write a simple function and use your compiler to generate the assembly? Then you can look at it and try to see what is going on. You might want to disable optimizations, and set a few other options I can't recall off the top of my head, if you want the output to be understandable and to reflect exactly what you wrote.

I never did actually do anything with parameters or return values which were not integral values.

I suppose if you want to read a file in a function, you need to have a parameter which is a pointer to an array to store the read bytes in. Then you just start reading bytes and putting them into the correct places in the array.
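To make the idea in the last paragraph concrete, here is a minimal sketch in C of a function that takes a pointer to an array and fills it byte by byte. The name read_file_into and its parameters are made up for illustration. Compiling something like this and reading the generated assembly is one way to see how the buffer pointer is passed in.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to `cap` bytes of the file at `path`
   into `buf`. Returns the number of bytes stored, or -1 if the file
   could not be opened. */
long read_file_into(const char *path, char *buf, size_t cap)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t n = 0;
    int c;
    /* Read one byte at a time and store it in the next free slot. */
    while (n < cap && (c = fgetc(fp)) != EOF)
        buf[n++] = (char)c;

    fclose(fp);
    return (long)n;
}
```

A caller would declare something like char buf[4096]; and pass buf and sizeof buf. The function stops at the end of the file or when the buffer is full, whichever comes first.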

EDIT: some of what I said applies only to x86-64, not 32-bit x86. The main difference is that in 32-bit x86, function arguments are passed on the stack instead of in registers.
closed account (Dy7SLyTq)
@xerxi: thanks good to know
@htirwin: how would i generate intel assembly with gcc?
how would i generate intel assembly with gcc?

http://stackoverflow.com/a/200028
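For anyone reading later: on x86 targets, the GCC option for Intel syntax is -masm=intel. Below is a small sketch of a file you could try it on. The file name add7.c and the function add_seven are made up for illustration, and the exact output depends on your GCC version and target.

```c
/* add7.c (hypothetical file name)
 *
 * Generate Intel-syntax assembly without optimisation (x86 targets):
 *     gcc -S -O0 -masm=intel add7.c        (writes add7.s)
 * Adding -fno-asynchronous-unwind-tables usually drops the .cfi_* lines,
 * which makes the listing easier to read.
 */
int add_seven(int x)
{
    x += 7;      /* shows up as an add on the stack slot or register holding x */
    return x;    /* the return value ends up in eax */
}
```

A function this small keeps the listing short enough to match each C statement to its instructions.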
closed account (Dy7SLyTq)
thank you!
closed account (Dy7SLyTq)
ok i did that and now im really confused...
#include <stdio.h>

int main(int argc, char *argv[])
{
    FILE *File = fopen("jade.asm", "r");
    int c;

    while((c = fgetc(File)) != EOF) putc(c, stdout);
}
[output]
	.file	"test.c"
	.intel_syntax noprefix
	.section	.rodata
.LC0:
	.string	"r"
.LC1:
	.string	"jade.asm"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	push	ebp
	.cfi_def_cfa_offset 8
	.cfi_offset 5, -8
	mov	ebp, esp
	.cfi_def_cfa_register 5
	and	esp, -16
	sub	esp, 32
	mov	edx, OFFSET FLAT:.LC0
	mov	eax, OFFSET FLAT:.LC1
	mov	DWORD PTR [esp+4], edx
	mov	DWORD PTR [esp], eax
	call	fopen
	mov	DWORD PTR [esp+24], eax
	jmp	.L2
.L3:
	mov	eax, DWORD PTR stdout
	mov	DWORD PTR [esp+4], eax
	mov	eax, DWORD PTR [esp+28]
	mov	DWORD PTR [esp], eax
	call	_IO_putc
.L2:
	mov	eax, DWORD PTR [esp+24]
	mov	DWORD PTR [esp], eax
	call	fgetc
	mov	DWORD PTR [esp+28], eax
	cmp	DWORD PTR [esp+28], -1
	jne	.L3
	leave
	.cfi_restore 5
	.cfi_def_cfa 4, 4
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu/Linaro 4.6.4-1ubuntu1~12.04) 4.6.4"
	.section	.note.GNU-stack,"",@progbits
[/output]
Why not start off with a simple main that just initialises and adds variables? What part of the code confuses you?
@DTSCode:
You need your assembler to output raw machine code (a flat binary, e.g. nasm -f bin, which I'll call bytecode here).
From then on, everything is OS-dependent.

For Windows, you need to VirtualAlloc enough memory with execute permission, so the code can be run.

Then you can call it, and VirtualFree it.

Example:

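#include <stdio.h>
#include <windows.h>   // VirtualAlloc, VirtualFree

int main(int argc, char *argv[])
{
    // "rb": the file is binary, so don't let Windows translate line endings
    FILE *File = fopen("jade.o", "rb");
    if(!File)
        return 1;
    fseek(File,0,SEEK_END);
    long pos = ftell(File);
    if(pos <= 0)
    {
        fclose(File);
        return 1;
    }
    fseek(File,0,SEEK_SET);
// The .o file MUST contain RAW ASM BYTECODE! NOT assembly source code!
    char* memory = (char*)VirtualAlloc(0,pos,MEM_COMMIT|MEM_RESERVE,PAGE_EXECUTE_READWRITE);
// PAGE_EXECUTE_READWRITE is Important!
// This will allow the memory chunk to be executed as binary code!
    if(!memory)
    {
        fclose(File);
        return 1;
    }
    size_t got = fread(memory,1,pos,File);
    fclose(File);
    File = 0;
    if(got == (size_t)pos)
    {
        // GCC syntax, 32-bit x86: indirect call through a register.
        // The loaded code must end with ret; eax/ecx/edx are listed as
        // clobbered because the called code is free to overwrite them.
        asm volatile("call *%0" : : "r"(memory) : "eax", "ecx", "edx", "cc", "memory");
    }
// equivalent to "__asm call memory" in MSVS
    VirtualFree(memory,0,MEM_RELEASE);
    memory = 0;
    return 0;
}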


But at that point, you're better off using shared libraries (a.k.a. DLLs on Windows).
Not to intrude or anything but why do you NEED VirtualAlloc? What will happen if you use malloc/new?
closed account (Dy7SLyTq)
@montario: the whole thing. all of the .Whatevers and the fact that there is no SECTION's. and yeah i guess i could do that and work my way up.

@ssg: im not even going to pretend like i understood that. why do i need my compiler to output byte code? why cant my assembly code be compiled on all operating systems? im using nasm. what does VirtualAlloc do? how does it make the object file contain Raw assembly byte code? what are you passing it? why the assembly call to memory->virtualalloc and not just use memory? and why would i use microsoft techniques when im on linux? and what do you mean by your last line?
what do you mean by your last line?
It's common practice to set pointers to NULL/nullptr after freeing them.
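A tiny sketch of that habit in plain C, using malloc/free rather than VirtualAlloc/VirtualFree. The helper name release_buffer is made up for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: frees the buffer *p points to and resets the
   caller's pointer to NULL, so later code can test the pointer instead
   of touching freed memory. */
void release_buffer(char **p)
{
    free(*p);    /* free(NULL) is a no-op, so a second call is harmless */
    *p = NULL;
}
```

After release_buffer(&buf), a check like if (buf) reliably tells you the buffer is gone, and freeing it again by accident does nothing.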

why the assembly call to memory->virtualalloc and not just use memory
Because there wouldn't be any allocated memory if you just used the pointer.
EssGeEich is saying that you can allocate some memory that is allowed to hold executable code, and that once you have loaded the bytecode from a file into that memory, you can use that asm call to tell your program to start executing instructions at that new memory location.
But how would windows know that that isn't just allocated with malloc? Is it required?

Could I use VirtualAlloc without making it hold bytecode? What would happen if I free()'d/delete[]'d it?
I cannot help you there, I also use linux almost exclusively. I trust that he is correct in saying that virtualalloc is special in windows, allowing arbitrary code execution and assuming it is allocated in a special way, i bet virtualfree is needed for proper behavior. Sounds like a nice google research project :P
Well, let's start with a simple function such as this:

int test() {

    int x = 2;
    x += 7;

    return 0;
}


On an x86 32-bit system, an assembly equivalent could be something along the lines of this (I am not too proficient with x86 assembly, I mostly dabble in 8080 asm, so I apologise in advance):

_test:
push ebp ; retain previous value of ebp
mov ebp, esp ; ebp is now the frame pointer
sub esp, 4 ; sizeof(x) bytes - local var x on the stack
mov DWORD PTR [ebp-4], 2 ; x = 2;
mov eax, [ebp-4] ; eax is a scratch register (ebx would have to be saved first)
add eax, 7 ; x += 7
mov [ebp-4], eax
mov eax, 0 ; ret value of test
mov esp, ebp ; restore stack pointer
pop ebp ; restore the caller's ebp
ret


Learning how to write subroutines in assembly requires a sound knowledge of the stack data structure, and how a call stack works.

You probably already know this, but the stack pointer (esp) is a special-purpose register that holds the address of the last datum that was pushed onto the stack. The address held in esp is usually copied to ebp at the start of a function; ebp then acts as a frame pointer marking the base of the function's 'stack frame'. By x86 convention, the value of ebp stays constant throughout the routine, so local data and parameters can be accessed at fixed offsets from it, such as [ebp-4], which in this case refers to the local variable x.

Usually, offsets from ebp are negative for local data (lower addresses, toward the top of the stack) and positive for parameters (higher addresses, in the caller's part of the stack). With the frame set up as above, [ebp] holds the saved ebp, [ebp+4] the return address, and [ebp+8] the first parameter.
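If it helps, the frame layout can be modelled in plain C. This is only a toy simulation, not real stack access: ToyStack, toy_push, toy_pop and toy_call_add_seven are all made-up names, each array slot stands for one 4-byte word, and the "return address" is a dummy value.

```c
#include <stddef.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

typedef struct {
    int32_t mem[STACK_WORDS]; /* one slot = one 4-byte stack word */
    size_t esp;               /* index of the last pushed word; grows downward */
    size_t ebp;               /* frame pointer, also an index into mem */
} ToyStack;

static void toy_init(ToyStack *s) { s->esp = STACK_WORDS; s->ebp = STACK_WORDS; }
static void toy_push(ToyStack *s, int32_t v) { s->mem[--s->esp] = v; }
static int32_t toy_pop(ToyStack *s) { return s->mem[s->esp++]; }

/* Simulates calling a 32-bit cdecl function that returns its argument plus 7. */
int32_t toy_call_add_seven(ToyStack *s, int32_t arg)
{
    toy_push(s, arg);                /* caller: push the argument        */
    toy_push(s, 0x1234);             /* call: push a dummy return address */

    toy_push(s, (int32_t)s->ebp);    /* push ebp                          */
    s->ebp = s->esp;                 /* mov ebp, esp                      */
    s->esp -= 1;                     /* sub esp, 4 (one local, x)         */

    s->mem[s->ebp - 1] = s->mem[s->ebp + 2]; /* x = parameter: [ebp-4] = [ebp+8] */
    s->mem[s->ebp - 1] += 7;                 /* x += 7                           */
    int32_t eax = s->mem[s->ebp - 1];        /* return value goes in eax         */

    s->esp = s->ebp;                 /* mov esp, ebp                      */
    s->ebp = (size_t)toy_pop(s);     /* pop ebp                           */

    (void)toy_pop(s);                /* ret: pop the return address       */
    (void)toy_pop(s);                /* caller: add esp, 4 drops the arg  */
    return eax;
}
```

After toy_init(&s), calling toy_call_add_seven(&s, 2) returns 9 and leaves esp and ebp exactly where they started, which is the whole point of a matching prolog and epilog.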
For Windows, it IS required. Memory from default allocations with malloc/new is not executable, for security reasons (at least when Data Execution Prevention is enabled).
To be able to execute code in that memory you need to allocate it with VirtualAlloc and PAGE_EXECUTE_READWRITE.

PAGE_EXECUTE_READ allows code execution.
Adding WRITE allows you to copy the bytecode from the file into the memory.
For these reasons you cannot just use malloc/new, which only give you PAGE_READWRITE permissions.

I mentioned Windows because it's in the lounge, not in the unix-specific section, and because I don't know how to do that in Linux.
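For the Linux side, the rough counterpart is mmap plus mprotect. The following is only a sketch under some assumptions: the function name run_machine_code is made up, the loaded bytes are assumed to be a complete routine of the form int f(void) for the CPU you are running on, and some hardened systems refuse to make pages executable at all, in which case the function reports failure.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copies `len` bytes of machine code into a fresh
   mapping, makes it executable, calls it as int f(void) and stores the
   return value in *result. Returns 0 on success, -1 on failure. */
int run_machine_code(const unsigned char *code, size_t len, int *result)
{
    if (code == NULL || len == 0 || result == NULL)
        return -1;

    /* Writable first, so the bytes can be copied in. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    /* Then switch the pages from writable to executable. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Data-to-function pointer cast: not ISO C, but supported on POSIX. */
    int (*fn)(void) = (int (*)(void))mem;
    *result = fn();

    munmap(mem, len);
    return 0;
}
```

On x86 or x86-64, the six bytes B8 2A 00 00 00 C3 (mov eax, 42 followed by ret) make a small test input. Mapping the memory writable first and executable afterwards, rather than both at once, is a deliberate choice here because some systems reject pages that are writable and executable at the same time.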

@Lumpkin I didn't understand the question, but memory from VirtualAlloc (obviously) can't be released with free/delete, only with VirtualFree.
And you can use VirtualAlloc to just allocate a memory chunk with PAGE_READWRITE; it won't be executable (you'll crash if you try to execute code at that location).

why do i need my compiler to output byte code?

Unless you want to ship an assembler with your program and deal with licensing issues, I don't suggest loading source code. Besides, assembling a file right before running it could be slow as hell. Assembling is not fast on every PC.

why cant my assembly code be compiled on all operating systems?

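One OS might allow some rare instructions to run when another doesn't, though I can't say for sure. What definitely differs between OSes is the system call interface, the calling conventions and the object file format, so code that talks to the OS has to be written and assembled per platform.
You'd better build it at least three times: Windows, Linux and Mac (if you're going to port it; otherwise just Linux could be enough).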
DTSCode wrote:
all of the .Whatevers

All the dot-whatevers are assembler directives (instructions to the assembler program, as opposed to CPU instructions). You can read about them in the assembler documentation (info gas, in this case)

and the fact that there is no SECTION's.

But there are, they are just spelled differently. You have a read-only data section (.section .rodata) and the text (code) section (.text). In NASM the same two would be written SECTION .rodata and SECTION .text.
Topic archived. No new replies allowed.