@LB
Nope, on x86 there are only four 32-bit general-purpose registers E{A,B,C,D}X and at any time any of them could have been clobbered by another instruction. I could use the stack, but that slows down my code because I'll have to use a bunch of MOVs and PUSHes for what should be a single instruction. Temporaries, i.e. local variables, go on the stack. If you look at the assembly code gcc produces you'll see something like
add esp, 4
which is reserving 4 bytes on the stack for a local variable.
CIL is a stack-based language, so where x86 does:
1 2 3
|
mov eax, 0x0f
mov ebx, 0x01
add eax, ebx
|
CIL does something like:
1 2 3
|
push 0x0f
push 0x01
add
|
which I have to somehow translate into the above. The problem is that I can only process one opcode at a time, it would be too complicated for my program to figure out
why the input program is pushing values on the stack. I could just use the stack myself, but it would be much faster to use the registers. The problem arises when I have two input operations that use the same data. Because I'm only looking at one opcode at a time, I can't handle this situation:
1 2 3 4 5
|
push 0x0f
push 0x01
add
push 0x10
add
|
In CIL, this pushes 0x0f and 0x01, adds them, taking them off the stack and putting the result (0x10) on, and then adds 0x10, taking it and the previous result off the stack and putting the result (0x20) back on. A naive dynamic recompiler would generate this:
1 2 3 4 5
|
mov eax, 0x0f
mov eax, 0x01
add eax, ebx
mov eax, 0x10
add eax, ebx
|
the result of which would be unknown and totally useless. When the code generator turns
push 0x01
into
mov eax, 0x01
, it doesn't know that eax was just loaded with the value 0x0f, or that the next instruction wants to add the previous EAX and the new one together. It's pretty easy to mark registers as "in use", which
appears to solve the problem, but there's no obvious way of marking a register as no longer in use -- the function that generated the opcode that's using it doesn't know when it'll be finished -- and since x86 only has 4 registers for general use, I'd run out in less than a millisecond.
It wouldn't be a problem if I just used the stack, but that would be a lot slower.
[edit]
@Zhuge
Thanks! I should've figured there would be existing algorithms for this. Hopefully that solves my problem.