Best way to write an assembler for a virtual machine?

Hello there!

I wrote a virtual machine, just for fun ( https://github.com/NicoSchumann/virtual_machine ).

Now I still need an assembler that translates assembly instructions to machine code.
Is there a better way than parsing line by line and running each line through a switch statement?

I have to deal with varying numbers of tokens per instruction. Also, I'm unsure how I could implement labels for jumping.

Here is a synopsis of the instructions I have implemented in the machine so far:

enum OP {    // defined operations

    NOP  // No operation
    
  , MOV  // MOV rs,  rt: Move source register to target register.
  , LD   // LD  val, rt: Load value to target register
  , LDR  // LDR mem, rt: Load value of memory address to target register.
  , LDM  // LDM rs, mem: Save source register to memory address.
  
  , ADD  // ADD rs, rt: Add source register to target register.
  , SUB  // SUB rs, rt: Subtract source register from target register.
  , MUL  // MUL rs, rt: Multiply target register by source register.
  , DIV  // DIV rs, rt: Divide target register by source register.
  , MOD  // MOD rs, rt: Target register modulo source register.
  , INC  // INC rt
  , DEC  // DEC rt
  
  , AND  // AND rs, rt:  rt &= rs
  , OR   // OR  rs, rt:  rt |= rs
  , XOR  // XOR rs, rt:  rt ^= rs
  , NOT
  
  , ROR  // ROR rs, rt: Right rotation of rt by value of rs.
  , ROL  // ROL rs, rt: Left rotation of rt by value of rs.
  , SR   // SR  rs, rt: Right shift of rt by value of rs.
  , SL   // SL  rs, rt: Left shift of rt by value of rs.
  
  , JMP  // JMP mem: Loads ip with value of memory address.
  , JZ   // JZ  mem: Loads ip with value of memory address if zero flag == 0
  , JNZ  // JNZ mem: Loads ip with value of memory address if zero flag == 1
  , CMP  // CMP rx, ry: If rx == ry, the cmp-flag == 1.
  , GT   // GT rx, ry: If rx is greater than ry, cmp-flag == 1.
  , LT   // LT rx, ry: If rx is less than ry, cmp-flag == 1.
  
  , POP   // POP  rs: Remove stack value and store into register.
  , PUSH  // PUSH rt: Add register value to stack.
  , PEEK  // PEEK rt: Load stack value into register.
};
A simple assembler should be able to do it with a per-statement parse-and-generate step. It depends on how complicated you want your language to be, though. If you get exotic, the assembler has to become more and more like a compiler (it has to look at the code holistically instead of line by line).
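
One way to avoid a large switch and still cope with different token counts is a lookup table that stores, for each mnemonic, its opcode and how many operands it expects. The following is only a rough sketch under assumptions that are not taken from the linked repository: registers are written as r0, r1, ..., immediates are plain decimal numbers, the opcode numbers come from a hypothetical subset of the enum, and every opcode and operand is emitted as one int. The names OpInfo, opTable, tokenize, parseOperand and assembleLine are made up for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical subset of the instruction set; the numeric values are assumed.
enum OP { NOP, MOV, LD, INC, JMP };

struct OpInfo {
    OP opcode;
    std::size_t operands;  // number of operands the mnemonic expects
};

// One table entry per mnemonic replaces one switch case per mnemonic.
const std::unordered_map<std::string, OpInfo>& opTable() {
    static const std::unordered_map<std::string, OpInfo> table = {
        {"NOP", {NOP, 0}}, {"MOV", {MOV, 2}}, {"LD", {LD, 2}},
        {"INC", {INC, 1}}, {"JMP", {JMP, 1}},
    };
    return table;
}

// Split "MOV r1, r2" into {"MOV", "r1", "r2"}; commas are treated as spaces.
std::vector<std::string> tokenize(std::string line) {
    for (char& c : line) {
        if (c == ',') c = ' ';
    }
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    return tokens;
}

// Accept "r3" (register 3) or "42" (decimal value); reject everything else.
bool parseOperand(std::string s, int& value) {
    if (!s.empty() && s[0] == 'r') s.erase(0, 1);
    if (s.empty() || s.size() > 9) return false;  // size limit avoids int overflow
    value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
        value = value * 10 + (c - '0');
    }
    return true;
}

// Assemble one line into `out`. Returns false on an unknown mnemonic,
// a wrong operand count, or a malformed operand. A blank line yields no code.
bool assembleLine(const std::string& line, std::vector<int>& out) {
    out.clear();
    const std::vector<std::string> tokens = tokenize(line);
    if (tokens.empty()) return true;

    const auto it = opTable().find(tokens[0]);
    if (it == opTable().end()) return false;
    if (tokens.size() - 1 != it->second.operands) return false;

    out.push_back(static_cast<int>(it->second.opcode));
    for (std::size_t i = 1; i < tokens.size(); ++i) {
        int value = 0;
        if (!parseOperand(tokens[i], value)) return false;
        out.push_back(value);
    }
    return true;
}
```

With this layout, adding an instruction means adding one table row, and the operand-count check is done once for all mnemonics. A real encoding would probably need to distinguish registers from immediates, which this sketch does not do.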

Pure jumps generally just move the instruction pointer (you need to know the offset or address to go to). If the target is a subroutine, you also store the register state, set it up for the routine, and restore it after the routine returns.

You usually only need one compare instruction, with three possible results. Extras don't hurt, of course.

You usually want a push-all and a pop-all (again, for subroutine-type work).

I have not done anything like this in a long while, but the last time I did, I just stored identifier/address pairs and tracked them carefully. If you support pointers and references, you need to be able to resolve multiple names to one address, change a token from one address to another, and so on. If you support scopes, you may have to deal with multiple symbols of the same name in different scopes. It can get nasty, or you can disallow some features to keep it simpler.
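
For plain jump labels, the identifier/address pairs can be collected in a first pass and substituted in a second pass, which also makes forward references work. This is a minimal sketch, assuming a label definition is a token ending in ':' and that every other token occupies exactly one memory word; both are assumptions for illustration, not the layout of the VM in question. The function names are hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split a line into tokens; commas are treated as spaces.
std::vector<std::string> splitTokens(std::string line) {
    for (char& c : line) {
        if (c == ',') c = ' ';
    }
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    return tokens;
}

// Pass 1: walk the source and record the address at which each label is defined.
// Returns false if a label is defined twice.
bool collectLabels(const std::vector<std::string>& source,
                   std::map<std::string, int>& labels) {
    int address = 0;
    for (const std::string& line : source) {
        for (const std::string& tok : splitTokens(line)) {
            if (tok.back() == ':') {
                const std::string name = tok.substr(0, tok.size() - 1);
                if (!labels.emplace(name, address).second) return false;
            } else {
                ++address;  // assumed: one word per opcode or operand
            }
        }
    }
    return true;
}

// Pass 2: drop the label definitions and replace every label used as an
// operand by its address. The result is a flat token stream.
std::vector<std::string> substituteLabels(const std::vector<std::string>& source,
                                          const std::map<std::string, int>& labels) {
    std::vector<std::string> out;
    for (const std::string& line : source) {
        for (const std::string& tok : splitTokens(line)) {
            if (tok.back() == ':') continue;  // definitions emit no code
            const auto it = labels.find(tok);
            out.push_back(it != labels.end() ? std::to_string(it->second) : tok);
        }
    }
    return out;
}
```

A label that has the same spelling as a mnemonic or register would be replaced too, so a real assembler would want to reject such names during the first pass.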

If you keep your assembly keywords short, you can pack them into a 64-bit integer (up to 8 letters) and use that in your switch.
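
A rough sketch of that packing trick, assuming C++14 or later (for the loop inside a constexpr function) and a hypothetical subset of the opcodes. The names pack, lookup and INVALID are made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical subset of the opcodes, plus a marker for unknown mnemonics.
enum OP { NOP, MOV, ADD, JMP, INVALID };

// Pack up to 8 characters into one 64-bit key, first character in the lowest
// byte. Characters beyond the eighth are ignored, so callers must reject
// longer strings themselves.
constexpr std::uint64_t pack(const char* s) {
    std::uint64_t key = 0;
    for (int i = 0; i < 8 && s[i] != '\0'; ++i) {
        key |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);
    }
    return key;
}

// Because pack() is constexpr, its results can be used as case labels.
OP lookup(const std::string& mnemonic) {
    if (mnemonic.size() > 8) return INVALID;
    switch (pack(mnemonic.c_str())) {
        case pack("NOP"): return NOP;
        case pack("MOV"): return MOV;
        case pack("ADD"): return ADD;
        case pack("JMP"): return JMP;
        default:          return INVALID;
    }
}
```

The compiler rejects duplicate case labels, so two mnemonics that pack to the same key would be caught at compile time.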
Thanks for your advice.
I don't think the virtual machine will ever compete with any real virtual machine in the wild. I'm writing it just to test some material from Donald Knuth's book 'The Art of Computer Programming, Vol. 1: Fundamental Algorithms'.