-
Notifications
You must be signed in to change notification settings - Fork 104
SVML Specification
The Source Virtual Machine Language defines a virtual machine with a set of instructions, and an associated JSON assembly and binary format.
It is intended as a compilation target for the Source language that is suitable not only for efficient execution on normal PCs, but also low-power machines like microcontrollers (e.g. Arduino).
All code in an SVM program exists within functions. A function (distinct from a function value) consists of
- the stack size for the function (number of entries)
- the environment size for the function (number of entries)
- the number of arguments expected by the function, which must be less than or equal to the environment size for the function
- the function's code
An SVM program consists of functions. One function is designated as the entry point of the program.
Execution of a program begins by calling the entry point with zero arguments. The result of the program is then the value returned from the entry point function. (This avoids having special instructions to start and end a program.)
Instructions are executed sequentially, unless an instruction that transfers execution to another location in the program is executed.
Functions are called using the call
or call.t
instructions. When a function
is called,
- the arguments are popped off the caller's stack, as detailed in the entries for the
call
andcall.t
instructions - a new frame is created, based on the information in the function's header
- the arguments passed by the caller are inserted into the new environment, with the first argument in slot 0 of the environment, the second in slot 1, and so on
- execution is transferred to the first instruction of the function
When a function returns,
- the function's return value is pushed onto the stack of the caller, as detailed in the entries for the
call
andcall.t
instructions - execution returns to the instruction in the caller immediately after the
call
orcall.t
instruction
Some instructions may produce faults. When a fault occurs, execution halts.
Implementations are recommended to track the reason for a fault, and provide some mechanism for debugging a fault (such as providing a stack trace, etc).
Faults are documented within each instruction's entry.
A frame consists of an operand stack (hereafter referred to as just a stack) and an environment.
A frame is created each time a function is called. When the function returns, its stack is destroyed; its environment may persist if there are new function values created that refer to its environment as their parent environment.
All SVM instructions do one of the following:
- load values from an environment onto the executing function's environment
- store values into an environment from the stack
- operate on values on the stack only
All values on stacks and in environments are either booleans, numbers, or boxed
values (which may be any of the 7 SVM types, including booleans or numbers).
Implementations are not required to be able to differentiate booleans, numbers
or boxed values on stacks and in environments, but are required to track the
types of values stored within a boxed value; this is required to implement
instructions such as add.g
that depend on the actual types of their operands.
A stack is used to store intermediate results and pass operands to instructions.
When a function is called, an empty stack is created with size as specified in the function's header.
Stacks are accessed in a strictly LIFO manner; all instructions only push or pop from the top of the executing function's stack.
An environment is used to store local variables and function arguments.
When a function is called, a environment is created with size as specified in the function's header. The arguments passed to the function are placed in the first slots in the environment, with the first argument in slot 0, the second in slot 1, and so on.
Environments are accessed only by the ldl
, ldp
, stl
and stp
instruction
families, which load and store values in environments.
The SVM recognises these distinct types:
- undefined, a singleton
- null, also a singleton
- boolean, either
true
orfalse
- number
- string
- array
- function
Implementations are recommended to implement number semantics following the IEEE 754 double-precision floating point specification where possible. As an allowance for platforms for which this would be too expensive, implementations may implement number semantics following the single-precision floating point specification instead.
Strings are arbitrary sequences of bytes (including the zero byte). SVM defines no operations on strings other than concatenation, so character encoding does not affect SVM string semantics.
Arrays are maps from non-negative integer numbers (indexes) to any value.
Loading an unassigned index results in undefined
.
Arrays have a length
property, accessed by the primitive function
array_length
, that returns one plus the highest index that has been assigned
to, or 0 if no index has been assigned to.
Note: assigning undefined
to an array index is indistinguishable except for
the effect of the assignment on the array's length
.
A function value is a tuple consisting of a pointer to the function's location in the program, and the environment in which the function value was created.
(click on the link)
There are two standard representations of a SVML program. VM implementations are free to accept the representation that works best for them.
A program is an array consisting of two values:
- the index of the entry point function in the following array
- an array of functions
A function is an array consisting of four values:
- the stack size for the function, as a number
- the environment size for the function, as a number
- the number of arguments expected by the function, as a number
- the function's code, an array of instructions
An instruction is an array consisting of:
- the opcode in position 0
- any arguments which may include numbers, boolean values, or strings
For instructions new.c
and jmp
which take <address>
es, an array consisting of the following is specified instead:
- the index of the function
- the index of the instruction in the function (for
jmp
; ignored and optional fornew.c
)
For instructions br.t
and br
which take <offset>
s, a number specifying the number of instructions to skip is specified instead.
As a TypeScript type definition:
type Offset = number; // instructions to skip
type Address = [
number, // function index
Offset // instruction index within function; optional
];
type Instruction = [
number, // opcode
number | boolean | string | // arguments
Offset | Address
];
type SVMFunction = [
number, // stack size
number, // environment size
number, // number of arguments
Instruction[] // code
];
type Program = [
number, // index of entry point function
SVMFunction[]
];
- There is no padding between any values unless explicitly specified.
- All instruction opcodes are one byte long.
- All values are in little-endian.
- We use the integer and float type names from Rust to denote operand types in instruction entries.
- E.g.
u8
refers to an 8-bit unsigned integer;i32
refers to a 32-bit signed integer;f32
refers to a 32-bit (single-precision) floating point.
- E.g.
- An
address
is a 32-bit unsigned integeru32
that refers to an offset from the start of the program. - An
offset
is a 32-bit signed integeru8
that refers to an offset from the start of the next instruction. - A structure aligned to N bytes means:
- if
address % N == 0
, then the structure may begin there - otherwise, skip
N - address % N
bytes, and then begin the structure
- if
A program is a Program
.
Field | Type |
---|---|
Header | Header |
Constant | Constant[] |
Alignment | to 4-bytes |
Functions | Function[] |
Each Constant
and Function
is aligned to 4 bytes.
Field | Type |
---|---|
Magic | u32 |
Major version | u16 |
Minor version | u16 |
Entry point | address |
Constant pool count | u32 |
- Magic is the value
0x5005ACAD
- The entry point must point to a
Function
Header
is 16 bytes.
Each Constant
should be aligned to 4 bytes.
Field | Type |
---|---|
Type | u16 |
Length | u32 |
Data | Depends on type |
Constant
is 6 + Length
bytes.
Field | Type |
---|---|
Data | u8[] |
- The length of Data is equal to Length in the constant header.
- Data must be null-terminated; the null terminator should be included in its length.
Each Function
should be aligned to 4 bytes.
Field | Type |
---|---|
Stack size | u8 |
Environment size | u8 |
Number of arguments | u8 |
Padding (alignment) | u8 |
Code | Instruction[] |
An instruction consists of the u8
opcode plus any arguments. There is no
padding or alignment between arguments. Instructions are concatenated with no
padding or alignment between instructions.
For example, the following instructions
ldc.i 123
pop.f
should result in the following (hex) bytes:
01 7B 00 00 00 10