|
| 1 | +#set page( |
| 2 | + paper: "a4", |
| 3 | + header: align(right)[The Soil Specification v1.0], |
| 4 | + numbering: "1", |
| 5 | +) |
| 6 | +#set heading( |
| 7 | + numbering: "1." |
| 8 | +) |
| 9 | + |
| 10 | +#align(center, text(17pt)[*The Soil Specification*]) |
| 11 | + |
| 12 | +Version: 1.0 |
| 13 | + |
| 14 | +#outline() |
| 15 | + |
| 16 | += Introduction <intro> |
| 17 | + |
| 18 | +Soil is a general-purpose, register-based instruction set and bytecode. |
| 19 | + |
| 20 | +== Preliminaries <prelims> |
| 21 | + |
| 22 | +Soil uses two types of integer values. *Bytes* are 8 bits and *words* are 64 bits. |
| 23 | +All integers are encoded as little-endian. |
| 24 | + |
| 25 | +Floating point values in the Soil VM are 64-bit values according to IEEE-754. |
| 26 | + |
| 27 | +Strings are encoded as UTF-8 and never null-terminated. |
| 28 | + |
| 29 | +In the specification, list types are used as an abstraction. A `List<Type>` is encoded |
| 30 | +as multiple instances of *Type* directly after each other in memory. |
| 31 | + |
| 32 | +== About this document |
| 33 | + |
| 34 | +This document provides a specification of the Soil VM. @structures defines the data structures |
| 35 | +and encodings used by a Soil VM. @model describes the execution model |
| 36 | +and environment of a Soil VM. @instructions provides an overview of all instructions defined by Soil |
| 37 | +and their behavior. |
| 38 | + |
| 39 | += Data Structures <structures> |
| 40 | + |
| 41 | +== Programs <programs> |
| 42 | + |
| 43 | +A Soil VM executes _programs_. A program consists of one or more sections, as defined in @sections. |
| 44 | +A valid Soil program contains exactly one bytecode section. |
| 45 | + |
| 46 | +=== Encoding |
| 47 | + |
| 48 | +A program is encoded as follows: |
| 49 | + |
| 50 | +#table( |
| 51 | + columns: (auto, auto, auto, 1fr), |
| 52 | + table.header([*Bytes*], [*Field*], [*Type*], [*Notes*]), |
| 53 | + [00 .. 03], [`"soil"`], [`String`], [Magic Bytes to identify a Soil program.], |
| 54 | + [04 .. n], [Sections], [`List<Section>`], [] |
| 55 | +) |
| 56 | + |
| 57 | +== Sections <sections> |
| 58 | + |
| 59 | +A Soil program is organized into sections. Soil supports four types of sections: |
| 60 | + |
| 61 | +#table( |
| 62 | + columns: (auto, auto, 1fr), |
| 63 | + table.header([*Section Kind*], [*Encoding*], [*Description*]), |
| 64 | + [Bytecode], [0], [Contains a list of Soil instructions], |
| 65 | + [Initial Memory], [1], [Contains the memory the VM is initialized with], |
| 66 | + [Name], [2], [Provides a name for the Soil program], |
| 67 | + [Labels], [3], [Contains a list of labels (cf. @labels)], |
| 68 | + [Description], [4], [Provides a description for the Soil program], |
| 69 | +) |
| 70 | + |
| 71 | +A program may contain *at most* one section of each kind. |
| 72 | + |
| 73 | +=== Labels <labels> |
| 74 | + |
| 75 | +A label is a combination of a string and an offset into the bytecode. |
| 76 | + |
| 77 | +==== Encoding |
| 78 | + |
| 79 | +#table( |
| 80 | + columns: (auto, auto, 1fr), |
| 81 | + table.header([*Bytes*], [*Field*], [*Type*]), |
| 82 | + [00 .. 07], [Offset], [`Word`], |
| 83 | + [08 .. 15], [Length], [`Word`], |
| 84 | + [16 .. 16 + Length ], [Label], [`String`] |
| 85 | +) |
| 86 | + |
| 87 | +=== Encoding |
| 88 | + |
| 89 | +A section is encoded as follows: |
| 90 | + |
| 91 | +#table( |
| 92 | + columns: (auto, auto, auto, 1fr), |
| 93 | + table.header([*Bytes*], [*Field*], [*Type*], [*Notes*]), |
| 94 | + [00 .. 01], [Kind], [`Byte`], [], |
| 95 | + [02 .. 09], [Length], [`Word`], [], |
| 96 | + [10 .. 10 + Length], [Content], [`List<Byte>`], [Depending on the kind, this may be a `List<Instruction>`, `List<Label>` or `String`] |
| 97 | +) |
| 98 | + |
| 99 | += VM Model <model> |
| 100 | + |
| 101 | +A Soil VM provides a set of registers and byte-addressable memory. |
| 102 | + |
| 103 | +=== Registers <registers> |
| 104 | + |
| 105 | +A register in Soil is word-sized. A Soil VM has the following registers: |
| 106 | + |
| 107 | +#table( |
| 108 | + columns: (auto, auto, 1fr), |
| 109 | + table.header([*Name*], [*Encoding*], [*Purpose*]), |
| 110 | + [SP], [0], [Stack Pointer], |
| 111 | + [ST], [1], [Status Register], |
| 112 | + [A], [2], [General Purpose], |
| 113 | + [B], [3], [General Purpose], |
| 114 | + [C], [4], [General Purpose], |
| 115 | + [D], [5], [General Purpose], |
| 116 | + [E], [6], [General Purpose], |
| 117 | + [F], [7], [General Purpose], |
| 118 | +) |
| 119 | + |
| 120 | +=== Memory <memory> |
| 121 | + |
| 122 | +Memory is a zero-initialized `List<Byte>`. The size of the memory is implementation-defined. |
| 123 | +Accessing an out-of-bounds memory address is illegal and causes a VM panic. |
| 124 | + |
| 125 | +=== Execution Model |
| 126 | + |
| 127 | +==== Initialization |
| 128 | + |
| 129 | +1. The VM initializes the memory (cf. @memory) and registers to zero. |
| 130 | +2. If the program defines initial memory, the VM copies the initial memory definition into its own memory. If the VM's memory is smaller than the initial memory, the VM must panic. |
| 131 | +3. The VM initialized the `SP` register to the size of its memory. |
| 132 | + |
| 133 | +==== Execution |
| 134 | + |
| 135 | +Conceptually, the Soil VM has an instruction pointer that is initially at the start of a program's |
| 136 | +bytecode section. The instruction pointer advances until the VM has read a full instruction and executes |
| 137 | +it. Generally, instructions are executed linearly one after another. However, control flow instructions |
| 138 | +(cf. @instructions) may change the instruction pointer's location in the bytecode. |
| 139 | + |
| 140 | +=== System Calls |
| 141 | + |
| 142 | += Instructions <instructions> |
| 143 | + |
| 144 | +Soil supports the following instructions: |
| 145 | + |
| 146 | +#table( |
| 147 | + columns: (auto, auto, auto, auto, 1fr), |
| 148 | + table.header([*Instruction*], [*Opcode*], [*Operand 1*], [*Operand 2*], [*Description*]), |
| 149 | +[`nop`], [0x00], [-], [-], [Does nothing.], |
| 150 | +[`panic`], [0xe0], [-], [-], [End program execution with an error.], |
| 151 | +[`trystart`], [0xe1], [`catch:word`], [-], [If a panic occurs, catches it, resets `sp`, and jumps to the `catch` address.], |
| 152 | +[`tryend`], [0xe2], [-], [-], [Ends a scope started by `trystart`.], |
| 153 | +[`move`], [0xd0], [`to:reg`], [`from:reg`], [Sets `to` to `from`.], |
| 154 | +[`movei`], [0xd1], [`to:reg`], [`value:word`], [Sets `to` to `from`.], |
| 155 | +[`moveib`], [0xd2], [`to:reg`], [`value:word`], [Sets `to` to `from`.], |
| 156 | +[`load`], [0xd3], [`to:reg`], [`from:reg`], [Interprets `from` as an address and sets `to` to the word at that address in memory.], |
| 157 | +[`loadb`], [0xd4], [`to:reg`], [`from:reg`], [Interprets `from` as an address and sets `to` to the byte at that address in memory.], |
| 158 | +[`store`], [0xd5], [`to:reg`], [`from:reg`], [Interprets `to` as an address and sets the 64 bits at that address in memory to `from`.], |
| 159 | +[`storeb`], [0xd6], [`to:reg`], [`from:reg`], [Interprets `to` as an address and sets the 8 bits at that address in memory to `from`.], |
| 160 | +[`push`], [0xd7], [`reg:reg`], [-], [Decreases `sp` by 8, then runs `store sp reg`.], |
| 161 | +[`pop`], [0xd8], [`reg:reg`], [-], [Runs `load reg sp`, then increases `sp` by 8.], |
| 162 | +[`jump`], [0xf0], [`to:word`], [-], [Continues executing at the `to`th byte.], |
| 163 | +[`cjump`], [0xf1], [`to:word`], [-], [Runs `jump to` if `st` is not 0.], |
| 164 | +[`call`], [0xf2], [`target:word`], [-], [Runs `jump target`. Saves the formerly next instruction on an internal stack so that `ret` returns.], |
| 165 | +[`ret`], [0xf3], [-], [-], [Returns to the instruction after the matching `call`.], |
| 166 | +[`syscall`], [0xf4], [`number:byte`], [-], [Performs a syscall. Behavior depends on the syscall. The syscall can access all registers and memory.], |
| 167 | +[`cmp`], [0xc0], [`left:reg`], [`right:reg`], [Saves `left` - `right` in `st`.], |
| 168 | +[`isequal`], [0xc1], [-], [-], [If `st` is 0, sets `st` to 1, otherwise to 0.], |
| 169 | +[`isless`], [0xc2], [-], [-], [If `st` is less than 0, sets `st` to 1; otherwise, sets it to 0.], |
| 170 | +[`isgreater`], [0xc3], [-], [-], [If `st` is greater than 0, sets `st` to 1; otherwise, sets it to 0.], |
| 171 | +[`islessequal`], [0xc4], [-], [-], [If `st` is 0 or less, sets `st` to 1; otherwise, sets it to 0.], |
| 172 | +[`isgreaterequal`], [0xc5], [-], [-], [If `st` is 0 or greater, sets `st` to 1; otherwise, sets it to 0.], |
| 173 | +[`isnotequal`], [0xc6], [-], [-], [If `st` is 0, sets `st` to 0; otherwise, sets it to 1.], |
| 174 | +[`fcmp`], [0xc7], [`left:regt`], [`right:reg`], [Compares `left` and `right` by subtracting `right` from `left` and saving the result in `st`.], |
| 175 | +[`fisequal`], [0xc8], [-], [-], [If `st` is 0, sets `st` to 1; otherwise, sets it to 0.], |
| 176 | +[`fisless`], [0xc9], [-], [-], [If `st` is less than 0, sets `st` to 1; otherwise, sets it to 0.], |
| 177 | +[`fisgreater`], [0xca], [-], [-], [If `st` is greater than 0, sets `st` to 1; otherwise, sets it to 0.], |
| 178 | +[`fislessequal`], [0xcb], [-], [-], [If `st` is 0 or less, sets `st` to 1; otherwise, sets it to 0.], |
| 179 | +[`fisgreaterequal`], [0xcc], [-], [-], [If `st` is 0 or greater, sets `st` to 1; otherwise, sets it to 0.], |
| 180 | +[`fisnotequal`], [0xcd], [-], [-], [If `st` is 0, sets `st` to 0; otherwise, sets it to 1.], |
| 181 | +[`inttofloat`], [0xce], [`reg:reg`], [-], [Interprets `reg` as an integer and sets it to a float of about the same value. TODO: specify edge cases.], |
| 182 | +[`floattoint`], [0xcf], [`reg:reg`], [-], [Interprets `reg` as a float and sets it to its integer representation, rounded down. TODO: specify edge cases.], |
| 183 | +[`add`], [0xa0], [`to:reg`], [`from:reg`], [Adds `from` to `to`.], |
| 184 | +[`sub`], [0xa1], [`to:reg`], [`from:reg`], [Subtracts `from` from `to`.], |
| 185 | +[`mul`], [0xa2], [`to:reg`], [`from:reg`], [Multiplies `from` and `to`, saving the result in `to`.], |
| 186 | +[`div`], [0xa3], [`dividend:reg`], [`divisor:reg`], [Divides `dividend` by `divisor`, saving the quotient in `dividend`.], |
| 187 | +[`rem`], [0xa4], [`dividend:reg`], [`divisor:reg`], [Divides `dividend` by `divisor`, saving the remainder in `dividend`.], |
| 188 | +[`fadd`], [0xa5], [`to:reg`], [`from:reg`], [Adds `from` to `to`, interpreting both as floats.], |
| 189 | +[`fsub`], [0xa6], [`to:reg`], [`from:reg`], [Subtracts `from` from `to`, interpreting both as floats.], |
| 190 | +[`fmul`], [0xa7], [`to:reg`], [`from:reg`], [Multiplies `from` and `to`, interpreting both as floats, and saves the result in `to`.], |
| 191 | +[`fdiv`], [0xa8], [`dividend:reg`], [`divisor:reg`], [Divides `dividend` by `divisor`, interpreting both as floats, and saves the quotient in `dividend`.], |
| 192 | +[`and`], [0xb0], [`to:reg`], [`from:reg`], [Performs a binary AND on `to` and `from`, saving the result in `to`.], |
| 193 | +[`or`], [0xb1], [`to:reg`], [`from:reg`], [Performs a binary OR on `to` and `from`, saving the result in `to`.], |
| 194 | +[`xor`], [0xb2], [`to:reg`], [`from:reg`], [Performs a binary XOR on `to` and `from`, saving the result in `to`.], |
| 195 | +[`not`], [0xb3], [`to:reg`], [-], [Inverts the bits of `to`.] |
| 196 | +) |
| 197 | + |
| 198 | +== Encoding |
| 199 | + |
| 200 | +Instructions are encoded as follows: |
| 201 | + |
| 202 | +#table( |
| 203 | + columns: (auto, auto, auto, 1fr), |
| 204 | + table.header([*Bytes*], [*Field*], [*Type*], [*Notes*]), |
| 205 | + [00 .. 01], [Opcode], [`Byte`], [], |
| 206 | + [02 .. n], [Operands], [Operands], [], |
| 207 | +) |
| 208 | + |
| 209 | +A Soil instruction may have between zero and two operands. For one instruction, the number and types |
| 210 | +of operands are always the same. If an instruction operates on two registers, they are encoded in a |
| 211 | +single byte with the first register encoded in the lower four bits and the second register encoded in |
| 212 | +the upper four bits. |
0 commit comments