Skip to content

Commit 9811e6b

Browse files
authored
Rewrite the Soil specification (#2)
* add clarification on data types * initial version of soil spec
1 parent 808e77e commit 9811e6b

File tree

3 files changed

+231
-7
lines changed

3 files changed

+231
-7
lines changed

.gitignore

+2
Original file line numberDiff line numberDiff line change
@@ -12,3 +12,5 @@ zig/zig-cache
1212
zig/zig-out
1313

1414
rust/target
15+
16+
*.pdf

spec.md

+17-7
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,11 @@ This gives Soil implementations the freedom to compile the byte code to machine
99

1010
Soil binaries are files that contain byte code and initial memory.
1111

12+
## Data Types
13+
14+
Soil uses two types of integers. A _byte_ is 8 bits and a _word_ is 8 bytes. Data types in Soil
15+
are little endian. Registers and pointers are word-sized.
16+
1217
## Registers
1318

1419
Soil has 8 registers, all of which hold 64 bits.
@@ -24,19 +29,24 @@ Soil has 8 registers, all of which hold 64 bits.
2429
| `e` | general-purpose register |
2530
| `f` | general-purpose register |
2631

27-
Initially, `sp` is the memory size.
28-
All other registers are zero.
32+
Initially, `sp` holds the memory size.
33+
All other registers are initialized to zero.
2934

3035
## Memory
3136

32-
It also has byte-addressed memory.
33-
For now, the size of the memory is hardcoded to something big.
37+
Soil uses byte-addressed memory.
38+
The memory size is defined by the VM implementation.
39+
40+
## System Calls
41+
42+
The Soil VM is sandboxed and only allows interactions with the outside world via system calls ("syscalls").
43+
Syscalls are identified by a byte-sized syscall number. The canonical set of syscalls is defined in [another part of the specification](syscalls.md).
3444

35-
## Byte Code
45+
## Instruction Set
3646

3747
Byte code consists of a sequence of instructions.
3848

39-
Soil runs the instructions in sequence, starting from the first.
49+
Soil executes the instructions in sequence, starting from the first.
4050
Some instructions alter control flow by jumping to other instructions.
4151

4252
These are all instructions:
@@ -47,7 +57,7 @@ Does nothing.
4757

4858
### `panic`
4959

50-
Panics.
60+
End program execution with an error.
5161

5262
### `trystart catch:word`
5363

spec.typ

+212
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,212 @@
1+
#set page(
2+
paper: "a4",
3+
header: align(right)[The Soil Specification v1.0],
4+
numbering: "1",
5+
)
6+
#set heading(
7+
numbering: "1."
8+
)
9+
10+
#align(center, text(17pt)[*The Soil Specification*])
11+
12+
Version: 1.0
13+
14+
#outline()
15+
16+
= Introduction <intro>
17+
18+
Soil is a general-purpose, register-based instruction set and bytecode.
19+
20+
== Preliminaries <prelims>
21+
22+
Soil uses two types of integer values. *Bytes* are 8 bits and *words* are 64 bits.
23+
All integers are encoded as little-endian.
24+
25+
Floating point values in the Soil VM are 64-bit values according to IEEE-754.
26+
27+
Strings are encoded as UTF-8 and never null-terminated.
28+
29+
In the specification, list types are used as an abstraction. A `List<Type>` is encoded
30+
as multiple instances of *Type* directly after each other in memory.
31+
32+
== About this document
33+
34+
This document provides a specification of the Soil VM. @structures defines the data structures
35+
and encodings used by a Soil VM. @model describes the execution model
36+
and environment of a Soil VM. @instructions provides an overview of all instructions defined by Soil
37+
and their behavior.
38+
39+
= Data Structures <structures>
40+
41+
== Programs <programs>
42+
43+
A Soil VM executes _programs_. A program consists of one or more sections, as defined in @sections.
44+
A valid Soil program contains exactly one bytecode section.
45+
46+
=== Encoding
47+
48+
A program is encoded as follows:
49+
50+
#table(
51+
columns: (auto, auto, auto, 1fr),
52+
table.header([*Bytes*], [*Field*], [*Type*], [*Notes*]),
53+
[00 .. 03], [`"soil"`], [`String`], [Magic Bytes to identify a Soil program.],
54+
[04 .. n], [Sections], [`List<Section>`], []
55+
)
56+
57+
== Sections <sections>
58+
59+
A Soil program is organized into sections. Soil supports four types of sections:
60+
61+
#table(
62+
columns: (auto, auto, 1fr),
63+
table.header([*Section Kind*], [*Encoding*], [*Description*]),
64+
[Bytecode], [0], [Contains a list of Soil instructions],
65+
[Initial Memory], [1], [Contains the memory the VM is initialized with],
66+
[Name], [2], [Provides a name for the Soil program],
67+
[Labels], [3], [Contains a list of labels (cf. @labels)],
68+
[Description], [4], [Provides a description for the Soil program],
69+
)
70+
71+
A program may contain *at most* one section of each kind.
72+
73+
=== Labels <labels>
74+
75+
A label is a combination of a string and an offset into the bytecode.
76+
77+
==== Encoding
78+
79+
#table(
80+
columns: (auto, auto, 1fr),
81+
table.header([*Bytes*], [*Field*], [*Type*]),
82+
[00 .. 07], [Offset], [`Word`],
83+
[08 .. 15], [Length], [`Word`],
84+
[16 .. 16 + Length ], [Label], [`String`]
85+
)
86+
87+
=== Encoding
88+
89+
A section is encoded as follows:
90+
91+
#table(
92+
columns: (auto, auto, auto, 1fr),
93+
table.header([*Bytes*], [*Field*], [*Type*], [*Notes*]),
94+
[00 .. 01], [Kind], [`Byte`], [],
95+
[02 .. 09], [Length], [`Word`], [],
96+
[10 .. 10 + Length], [Content], [`List<Byte>`], [Depending on the kind, this may be a `List<Instruction>`, `List<Label>` or `String`]
97+
)
98+
99+
= VM Model <model>
100+
101+
A Soil VM provides a set of registers and byte-addressable memory.
102+
103+
=== Registers <registers>
104+
105+
A register in Soil is word-sized. A Soil VM has the following registers:
106+
107+
#table(
108+
columns: (auto, auto, 1fr),
109+
table.header([*Name*], [*Encoding*], [*Purpose*]),
110+
[SP], [0], [Stack Pointer],
111+
[ST], [1], [Status Register],
112+
[A], [2], [General Purpose],
113+
[B], [3], [General Purpose],
114+
[C], [4], [General Purpose],
115+
[D], [5], [General Purpose],
116+
[E], [6], [General Purpose],
117+
[F], [7], [General Purpose],
118+
)
119+
120+
=== Memory <memory>
121+
122+
Memory is a zero-initialized `List<Byte>`. The size of the memory is implementation-defined.
123+
Accessing an out-of-bounds memory address is illegal and causes a VM panic.
124+
125+
=== Execution Model
126+
127+
==== Initialization
128+
129+
1. The VM initializes the memory (cf. @memory) and registers to zero.
130+
2. If the program defines initial memory, the VM copies the initial memory definition into its own memory. If the VM's memory is smaller than the initial memory, the VM must panic.
131+
3. The VM initialized the `SP` register to the size of its memory.
132+
133+
==== Execution
134+
135+
Conceptually, the Soil VM has an instruction pointer that is initially at the start of a program's
136+
bytecode section. The instruction pointer advances until the VM has read a full instruction and executes
137+
it. Generally, instructions are executed linearly one after another. However, control flow instructions
138+
(cf. @instructions) may change the instruction pointer's location in the bytecode.
139+
140+
=== System Calls
141+
142+
= Instructions <instructions>
143+
144+
Soil supports the following instructions:
145+
146+
#table(
147+
columns: (auto, auto, auto, auto, 1fr),
148+
table.header([*Instruction*], [*Opcode*], [*Operand 1*], [*Operand 2*], [*Description*]),
149+
[`nop`], [0x00], [-], [-], [Does nothing.],
150+
[`panic`], [0xe0], [-], [-], [End program execution with an error.],
151+
[`trystart`], [0xe1], [`catch:word`], [-], [If a panic occurs, catches it, resets `sp`, and jumps to the `catch` address.],
152+
[`tryend`], [0xe2], [-], [-], [Ends a scope started by `trystart`.],
153+
[`move`], [0xd0], [`to:reg`], [`from:reg`], [Sets `to` to `from`.],
154+
[`movei`], [0xd1], [`to:reg`], [`value:word`], [Sets `to` to `from`.],
155+
[`moveib`], [0xd2], [`to:reg`], [`value:word`], [Sets `to` to `from`.],
156+
[`load`], [0xd3], [`to:reg`], [`from:reg`], [Interprets `from` as an address and sets `to` to the word at that address in memory.],
157+
[`loadb`], [0xd4], [`to:reg`], [`from:reg`], [Interprets `from` as an address and sets `to` to the byte at that address in memory.],
158+
[`store`], [0xd5], [`to:reg`], [`from:reg`], [Interprets `to` as an address and sets the 64 bits at that address in memory to `from`.],
159+
[`storeb`], [0xd6], [`to:reg`], [`from:reg`], [Interprets `to` as an address and sets the 8 bits at that address in memory to `from`.],
160+
[`push`], [0xd7], [`reg:reg`], [-], [Decreases `sp` by 8, then runs `store sp reg`.],
161+
[`pop`], [0xd8], [`reg:reg`], [-], [Runs `load reg sp`, then increases `sp` by 8.],
162+
[`jump`], [0xf0], [`to:word`], [-], [Continues executing at the `to`th byte.],
163+
[`cjump`], [0xf1], [`to:word`], [-], [Runs `jump to` if `st` is not 0.],
164+
[`call`], [0xf2], [`target:word`], [-], [Runs `jump target`. Saves the formerly next instruction on an internal stack so that `ret` returns.],
165+
[`ret`], [0xf3], [-], [-], [Returns to the instruction after the matching `call`.],
166+
[`syscall`], [0xf4], [`number:byte`], [-], [Performs a syscall. Behavior depends on the syscall. The syscall can access all registers and memory.],
167+
[`cmp`], [0xc0], [`left:reg`], [`right:reg`], [Saves `left` - `right` in `st`.],
168+
[`isequal`], [0xc1], [-], [-], [If `st` is 0, sets `st` to 1, otherwise to 0.],
169+
[`isless`], [0xc2], [-], [-], [If `st` is less than 0, sets `st` to 1; otherwise, sets it to 0.],
170+
[`isgreater`], [0xc3], [-], [-], [If `st` is greater than 0, sets `st` to 1; otherwise, sets it to 0.],
171+
[`islessequal`], [0xc4], [-], [-], [If `st` is 0 or less, sets `st` to 1; otherwise, sets it to 0.],
172+
[`isgreaterequal`], [0xc5], [-], [-], [If `st` is 0 or greater, sets `st` to 1; otherwise, sets it to 0.],
173+
[`isnotequal`], [0xc6], [-], [-], [If `st` is 0, sets `st` to 0; otherwise, sets it to 1.],
174+
[`fcmp`], [0xc7], [`left:regt`], [`right:reg`], [Compares `left` and `right` by subtracting `right` from `left` and saving the result in `st`.],
175+
[`fisequal`], [0xc8], [-], [-], [If `st` is 0, sets `st` to 1; otherwise, sets it to 0.],
176+
[`fisless`], [0xc9], [-], [-], [If `st` is less than 0, sets `st` to 1; otherwise, sets it to 0.],
177+
[`fisgreater`], [0xca], [-], [-], [If `st` is greater than 0, sets `st` to 1; otherwise, sets it to 0.],
178+
[`fislessequal`], [0xcb], [-], [-], [If `st` is 0 or less, sets `st` to 1; otherwise, sets it to 0.],
179+
[`fisgreaterequal`], [0xcc], [-], [-], [If `st` is 0 or greater, sets `st` to 1; otherwise, sets it to 0.],
180+
[`fisnotequal`], [0xcd], [-], [-], [If `st` is 0, sets `st` to 0; otherwise, sets it to 1.],
181+
[`inttofloat`], [0xce], [`reg:reg`], [-], [Interprets `reg` as an integer and sets it to a float of about the same value. TODO: specify edge cases.],
182+
[`floattoint`], [0xcf], [`reg:reg`], [-], [Interprets `reg` as a float and sets it to its integer representation, rounded down. TODO: specify edge cases.],
183+
[`add`], [0xa0], [`to:reg`], [`from:reg`], [Adds `from` to `to`.],
184+
[`sub`], [0xa1], [`to:reg`], [`from:reg`], [Subtracts `from` from `to`.],
185+
[`mul`], [0xa2], [`to:reg`], [`from:reg`], [Multiplies `from` and `to`, saving the result in `to`.],
186+
[`div`], [0xa3], [`dividend:reg`], [`divisor:reg`], [Divides `dividend` by `divisor`, saving the quotient in `dividend`.],
187+
[`rem`], [0xa4], [`dividend:reg`], [`divisor:reg`], [Divides `dividend` by `divisor`, saving the remainder in `dividend`.],
188+
[`fadd`], [0xa5], [`to:reg`], [`from:reg`], [Adds `from` to `to`, interpreting both as floats.],
189+
[`fsub`], [0xa6], [`to:reg`], [`from:reg`], [Subtracts `from` from `to`, interpreting both as floats.],
190+
[`fmul`], [0xa7], [`to:reg`], [`from:reg`], [Multiplies `from` and `to`, interpreting both as floats, and saves the result in `to`.],
191+
[`fdiv`], [0xa8], [`dividend:reg`], [`divisor:reg`], [Divides `dividend` by `divisor`, interpreting both as floats, and saves the quotient in `dividend`.],
192+
[`and`], [0xb0], [`to:reg`], [`from:reg`], [Performs a binary AND on `to` and `from`, saving the result in `to`.],
193+
[`or`], [0xb1], [`to:reg`], [`from:reg`], [Performs a binary OR on `to` and `from`, saving the result in `to`.],
194+
[`xor`], [0xb2], [`to:reg`], [`from:reg`], [Performs a binary XOR on `to` and `from`, saving the result in `to`.],
195+
[`not`], [0xb3], [`to:reg`], [-], [Inverts the bits of `to`.]
196+
)
197+
198+
== Encoding
199+
200+
Instructions are encoded as follows:
201+
202+
#table(
203+
columns: (auto, auto, auto, 1fr),
204+
table.header([*Bytes*], [*Field*], [*Type*], [*Notes*]),
205+
[00 .. 01], [Opcode], [`Byte`], [],
206+
[02 .. n], [Operands], [Operands], [],
207+
)
208+
209+
A Soil instruction may have between zero and two operands. For one instruction, the number and types
210+
of operands are always the same. If an instruction operates on two registers, they are encoded in a
211+
single byte with the first register encoded in the lower four bits and the second register encoded in
212+
the upper four bits.

0 commit comments

Comments
 (0)