Writing a virtual machine

23. April 2017

In the last couple months, my main focus was on writing a virtual machine from scratch. What I wanted was a fully turing-complete machine which would support the following features right out of the box:

Besides the actual instruction set, I’ve also created my own version of an assembly language. The syntax of the language is strongly inspired by the AVR assembly language.

Overview

The general principle behind a virtual machine is pretty easy to grasp. What you have is a list of instructions, some registers of a given size and memory. Not all virtual machines have the same amount of registers and not all registers have the same meaning assigned to them.

One of the registers is the ip (Instruction Pointer) register. It points to the current instruction. At the beginning of each cycle, the machine reads an opcode from the address pointed to by the ip register. It then runs the assigned task of that instruction.

The meaning of each opcode is completly up to the semantic design of the machine. One could assign any meaning to an opcode.

Encoding

The executable format the machine expects is divided into two parts.

+--------------------------------------------------------+
| - Header section                                       |
|                                                        |
| magic            : ascii encoded NICE                  |
| entry_addr       : initial value of the ip register    |
| load_table_size  : number of entries in the load table |
| load_table       : load table                          |
+--------------------------------------------------------+
| - Segments section                                     |
+--------------------------------------------------------+

Each entry consts of three integers.

+--------------------------------------------------------+
| - Load table entry                                     |
|                                                        |
| offset  : offset into the executables segments section |
| size    : size of the segment                          |
| address : target address in the machines memory        |
+--------------------------------------------------------+

Given the following assembly file:

.org 0x500
.label entry_addr
  loadi r0, 25
  loadi r1, 25
  add r0, r0, r1

.org 0x600
.db msg_welcome 11 "hello world"

The machine would load the three instructions onto address 0x500 and the "hello world" constant at address 0x600.

Registers

Available registers:

The machine’s registers are able to hold a 64-bit value. By default however, only the lower 32-bits are targeted. The below code reads 16 bytes from a buffer at 0x500 and writes each dword into the registers r0 to r3. It also demonstrates the automatic label resolution and offset calculation features in the assembler.

.def BUFFER_SIZE 16

.label entry_addr
  readc r0, (buffer + 0)
  readc r1, (buffer + 4)
  readc r2, (buffer + 8)
  readc r3, (buffer + 12)

.org 0x500
.db buffer BUFFER_SIZE 0

Virtual Display

The last 38’800 bytes in memory are reserved for VRAM. The monitor process, which is also built into the vm, displays the contents of this area in memory in color. The monitor supports 255 different colors. A single byte encodes the color of a single pixel as rrrgggbb.

Virtual Display

Roadmap

In the months to come, I want to focus on writing some built-in methods for doing graphics stuff. I’d also like to experiment with writing a simple compiler to be able to use a subset of C on my vm.

I’m open for contributions of any kind and I’d love to get some feedback on the design of my vm.

The repository is on GitHub at github.com/KCreate/stackvm and is licensed under the MIT license.


Copyright © 2024 Leonard Schütz | Attributions