This RFC proposes implementing an enum VM. VRL will be compiled to a list of instructions and executed by this VM, with the aim of significantly improving the performance of executing VRL programs.
This RFC is a follow-on from rfcs/2021-10-14-9811-vrl-performance.md.
This RFC is purely about developing an enum VM for VRL. It will discuss the risks involved in running VRL with a VM and the measures we will take to mitigate those risks.
Any other performance issues relating to VRL will be discussed in rfcs/2021-10-14-9811-vrl-performance.md.
VRL compiles to an AST that is then walked during resolution. Each node in that tree is boxed and stored in disparate regions of memory. As a result, walking the tree means that the CPU caches must be constantly swapped.

Instead, we can create an enum VM to store the execution of the VRL program.

The instruction set is essentially one big enum of opcodes:
```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum OpCode {
    Return,
    Constant,
    Negate,
    Add,
    Subtract,
    Multiply,
    Divide,
    Print,
    Not,
    Greater,
    GreaterEqual,
    Less,
    LessEqual,
    NotEqual,
    Equal,
    Pop,
    JumpIfFalse,
    Jump,
    SetPath,
    GetPath,
    Call,
    ...
}

enum Instruction {
    OpCode(OpCode),
    Literal(LiteralIndex),
}

pub struct LiteralIndex(usize);
```
The `Vm` is a struct comprising the following fields:
```rust
#[derive(Clone, Debug, Default)]
pub struct Vm {
    instructions: Vec<Instruction>,
    constants: Vec<Literal>,
    targets: Vec<Variable>,
    stack: Vec<Value>,
    parameter_stack: Vec<Option<Value>>,
    error: Option<Error>,
    instruction_pointer: usize,
}
```
- instructions
  The `instructions` field is a `Vec` of `Instruction`s. An instruction can be either an OpCode or some data for that OpCode. For example, the instructions `[.., OpCode(Constant), Literal(12), ..]`, when evaluated, will load the constant stored at position 12 of the `constants` `Vec` onto the stack.
- constants
  A list of constant values found in the program. Since the bytecode itself only contains integers, any actual values must be stored here. This also allows literals to be deduped.
- targets
  A list of paths used in the program, similar to `constants`.
- stack
  The VM is a stack-based VM. Every expression that is evaluated pushes its result onto the stack. Every operation pulls the values it uses from the stack.
- parameter_stack
  For function calls, the parameters need to be evaluated and passed to the function. The parameter stack is a `Vec` of optional values. We need an OpCode that will move a value from the main stack onto the parameter stack. The `FunctionArgs` compiler emits this OpCode after the code that evaluates the given argument.
- error
  If an operation errors, it needs to populate this field. Operations such as error assignment (`result, error = thing()`) will need to check whether this field has been populated and act accordingly.
- instruction_pointer
The instruction pointer points to the next instruction to evaluate.
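To make the evaluation model concrete, here is a minimal sketch of the dispatch loop. All type names and the instruction layout are simplified assumptions for illustration, not the proposed API:

```rust
// A minimal, illustrative sketch of the VM's dispatch loop. The types and
// two-slot instruction layout here are simplified assumptions.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Value {
    Integer(i64),
}

#[derive(Copy, Clone, Debug)]
pub enum OpCode {
    Constant,
    Add,
    Return,
}

#[derive(Copy, Clone, Debug)]
pub enum Instruction {
    OpCode(OpCode),
    Literal(usize), // index into `constants`
}

pub struct Vm {
    pub instructions: Vec<Instruction>,
    pub constants: Vec<Value>,
    pub stack: Vec<Value>,
    pub instruction_pointer: usize,
}

impl Vm {
    fn next(&mut self) -> Instruction {
        let instruction = self.instructions[self.instruction_pointer];
        self.instruction_pointer += 1;
        instruction
    }

    pub fn run(&mut self) -> Value {
        loop {
            match self.next() {
                Instruction::OpCode(OpCode::Constant) => {
                    // The following instruction slot holds the constant's index.
                    if let Instruction::Literal(index) = self.next() {
                        self.stack.push(self.constants[index]);
                    }
                }
                Instruction::OpCode(OpCode::Add) => {
                    // Operands are pulled from the stack, result pushed back.
                    let Value::Integer(rhs) = self.stack.pop().unwrap();
                    let Value::Integer(lhs) = self.stack.pop().unwrap();
                    self.stack.push(Value::Integer(lhs + rhs));
                }
                Instruction::OpCode(OpCode::Return) => return self.stack.pop().unwrap(),
                Instruction::Literal(_) => unreachable!("literal outside an opcode operand"),
            }
        }
    }
}
```

Running the bytecode for `1 + 2` (`Constant 0`, `Constant 1`, `Add`, `Return`) leaves `Integer(3)` as the returned result.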
Calling functions in the stdlib will be a case of evaluating each parameter and pushing the results onto the parameter stack.
Since VRL allows for named parameters, parameters need to be pushed onto the parameter stack in the order that they are specified in the `Function` implementation.

Parameters are optional and may not be specified. However, to avoid passing parameters to the wrong function, an OpCode still needs to be emitted to move a placeholder value (`None`) onto the parameter stack.
Take the following hypothetical situation:
```
thing(surname: "nong", name: thang(name: "nork"))
```
Suppose the function `thang` also took an optional parameter `surname`. When `thang` is called, we will have a `surname` parameter on the parameter stack. Without the placeholder, `thang` would think that `surname` was being passed to it. With the placeholder pushed, `thang` consumes the placeholder instead.
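The placeholder logic can be modelled in a few lines. The `take_parameters` helper and the exact parameter ordering are illustrative assumptions:

```rust
// Illustrative model of the parameter stack, showing why the `None`
// placeholder is needed for unspecified optional parameters.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Bytes(String),
}

/// Pops one entry per declared parameter, in declaration order.
pub fn take_parameters(
    parameter_stack: &mut Vec<Option<Value>>,
    parameter_count: usize,
) -> Vec<Option<Value>> {
    parameter_stack.split_off(parameter_stack.len() - parameter_count)
}

/// Simulates `thing(surname: "nong", name: thang(name: "nork"))`, where both
/// functions are assumed to declare the parameters (name, surname).
pub fn thang_sees_placeholder() -> bool {
    let mut parameter_stack: Vec<Option<Value>> = Vec::new();
    parameter_stack.push(Some(Value::Bytes("nong".into()))); // `thing`'s surname
    parameter_stack.push(Some(Value::Bytes("nork".into()))); // `thang`'s name
    parameter_stack.push(None); // placeholder: `thang`'s surname was not given

    let thang_parameters = take_parameters(&mut parameter_stack, 2);
    // Without the `None` placeholder, `thang` would have consumed "nong" --
    // `thing`'s surname -- as its own.
    thang_parameters == vec![Some(Value::Bytes("nork".into())), None]
}
```

After `thang` takes its two entries, `thing`'s surname (`"nong"`) is still on the parameter stack, untouched.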
We do not want `stdlib` functions to have access to the VM, since that risks a rogue function destroying the VM's state.

Current `stdlib` functions are composed of a combination of `compile` and `resolve` functions. These functions will need to be combined into a single `call` function.
The `ArgumentList` parameter that is passed into `call` will have access to the parameter stack and the parameter list exposed by the `Function`. It can use these to return the appropriate values for the `required`, `optional`, etc. functions.
Since the `compile` function was also used by some `stdlib` functions to validate their parameters, we may also need a `validate_parameters` function that these functions can use to perform extra validation on the parameters.
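A hypothetical shape for this API follows. All names here (`ArgumentList`, `required`, `optional`, `validate_parameters`, the example `Upcase` function) are illustrative assumptions, not the final signatures:

```rust
// Sketch of the combined `call` interface. Functions never see the VM
// itself; they only receive an `ArgumentList` built from the parameter
// stack. All names are illustrative.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Bytes(String),
}

pub struct ArgumentList {
    // Parameters popped from the VM's parameter stack, in declaration
    // order, paired with the names from the `Function` definition.
    pub parameters: Vec<(&'static str, Option<Value>)>,
}

impl ArgumentList {
    pub fn required(&self, name: &str) -> Result<Value, String> {
        self.optional(name)
            .ok_or_else(|| format!("required parameter missing: {}", name))
    }

    pub fn optional(&self, name: &str) -> Option<Value> {
        self.parameters
            .iter()
            .find(|(n, _)| *n == name)
            .and_then(|(_, value)| value.clone())
    }
}

pub trait Function {
    /// Extra compile-time validation, replacing what `compile` used to do.
    fn validate_parameters(&self, _arguments: &ArgumentList) -> Result<(), String> {
        Ok(())
    }

    /// The single runtime entry point.
    fn call(&self, arguments: ArgumentList) -> Result<Value, String>;
}

pub struct Upcase;

impl Function for Upcase {
    fn call(&self, arguments: ArgumentList) -> Result<Value, String> {
        let Value::Bytes(value) = arguments.required("value")?;
        Ok(Value::Bytes(value.to_uppercase()))
    }
}
```

The key design point is that a function can only read its own slice of parameters; it has no way to corrupt the VM's instruction pointer or main stack.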
We need debugging tools to help us introspect and debug the compiled bytecode.
Output the compiled code in a readable format to the console.
Output each step of the VM to the console, printing the stack and the other state variables in the VM. Allow the user to press a key to move to the next step.
Check each instruction to ensure:
- each `Jump` OpCode jumps to a valid location in the bytecode,
- each constant references a valid index in the constants list,
- each path index references a valid path.
It should be possible to run the linter after every compile (at Vector boot time), so the linter should run relatively fast.
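These checks can be sketched as a single pass over the bytecode. The two-slot instruction layout and all names are illustrative:

```rust
// Sketch of the bytecode linter: one linear pass validating jump targets,
// constant indices, and operand placement. Names are illustrative.
#[derive(Copy, Clone, Debug)]
pub enum OpCode {
    Constant, // operand: index into the constants list
    Jump,     // operand: target position in the bytecode
    Add,      // no operand
}

#[derive(Copy, Clone, Debug)]
pub enum Instruction {
    OpCode(OpCode),
    Literal(usize),
}

pub fn lint(instructions: &[Instruction], constant_count: usize) -> Result<(), String> {
    let mut ip = 0;
    while ip < instructions.len() {
        match instructions[ip] {
            Instruction::OpCode(opcode @ (OpCode::Constant | OpCode::Jump)) => {
                // These opcodes carry their operand in the next slot.
                let operand = match instructions.get(ip + 1) {
                    Some(Instruction::Literal(operand)) => *operand,
                    _ => return Err(format!("missing operand at {}", ip)),
                };
                let valid = match opcode {
                    OpCode::Constant => operand < constant_count,
                    OpCode::Jump => operand < instructions.len(),
                    _ => true,
                };
                if !valid {
                    return Err(format!("invalid operand {} at {}", operand, ip));
                }
                ip += 2;
            }
            Instruction::OpCode(_) => ip += 1,
            Instruction::Literal(_) => return Err(format!("stray literal at {}", ip)),
        }
    }
    Ok(())
}
```

Because this is a single linear pass with no allocation beyond the error path, running it at Vector boot time after every compile should be cheap.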
The VM will need thorough testing, since it is possible to generate invalid bytecode. For example, the following bytecode is invalid:

```
OpCode(Constant)
Literal(1)
OpCode(Add)
```
The `Add` OpCode expects there to be two values on the stack to add together. We should never get into a situation where the VRL compiler can generate such bytecode.
With property testing we can generate a vast number of valid VRL ASTs, compile these to bytecode, and then run the VM to ensure we get no buffer underflows, invalid constant locations, or other invalid bytecode.
Given that implementing the VM involves adding methods to the Expression trait, it should be possible to run the existing AST alongside the VM. We can then ensure that we get exactly the same results from each approach.
With each node of the AST compiled down to just a few bytes, and all instructions held in contiguous memory, evaluation of the program should be able to take full advantage of the CPU cache, which should result in much faster execution.
Downsides to using a VM:

- The code is a bit more complex. With an AST that is walked, it is fairly apparent what the code will be doing at any point. With a VM this is not the case: it is harder to look at the instructions in the VM and trace them back to the part of the VRL code being evaluated. We will need to write some extensive debugging tools to allow for decent introspection into the VM.
  However, VRL is a very simple language, which also means the VM will be simple.
- We lose some of the safety that we get from the Rust compiler. There will need to be significant fuzz testing to ensure that the code runs correctly under all circumstances.
  It should be noted, however, that the VM will not require any unsafe Rust code.
- Currently, each stdlib function is responsible for evaluating its own parameters. This allows parameters to be lazily evaluated. With the VM, the parameters will need to be evaluated up front and the stack passed into the function. This could impact performance.
- GoScript, an implementation of Go using a bytecode VM: https://github.com/oxfeeefeee/goscript
- CPython: https://github.com/python/cpython
Instead of writing our own VM, we could transpile VRL code to WASM. WASM is a mature and performant VM that could give us near-native speeds.

The issue with using WASM is that it would require data to be serialized in order to move between Vector and WASM, which would incur a significant performance penalty.

WASM also provides far more functionality and complexity than VRL requires. By developing our own VM, we can customize it purely for VRL's use case. This allows us to implement very VRL-specific OpCodes, which should allow the bytecode to stay simple.
The current suggestion represents each OpCode as a single instruction in the instruction list. An OpCode with its data occupies two instructions.
For example, with the current suggestion to load a constant three things have to happen:
- Add the `Constant` OpCode to the instructions.
- Add the actual constant value to the list of constants.
- Add the index of the constant in the constants list to the list of instructions.
These three things could be represented as a single instruction if we included the constant value in the OpCode:

```rust
enum OpCode {
    ...
    Constant(Box<Value>),
    ...
}
```
This will increase `size_of::<OpCode>()` from 8 to 16 bytes, potentially doubling the memory size of the instruction list.

It also means we do not need to store a separate list of constants. On the plus side, this is one less lookup at runtime to load the constant. On the downside, we wouldn't be able to dedupe the constants: if the code uses the same string twice, for example, it would be represented twice in the bytecode.
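For contrast, deduplication with a separate constants list is straightforward. A sketch, with illustrative names:

```rust
// Sketch of constant deduplication at compile time: identical literals
// share a single slot in the constants list. Names are illustrative.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Bytes(String),
}

#[derive(Default)]
pub struct Compiler {
    pub constants: Vec<Value>,
}

impl Compiler {
    /// Returns the index of `value` in the constants list, reusing an
    /// existing slot if the same literal has been seen before.
    pub fn add_constant(&mut self, value: Value) -> usize {
        if let Some(index) = self.constants.iter().position(|existing| *existing == value) {
            index
        } else {
            self.constants.push(value);
            self.constants.len() - 1
        }
    }
}
```

A program that uses the same string literal twice then stores it once and emits the same index at both use sites.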
Since we will never have anywhere near `usize::MAX` OpCodes, nor that many constants in our constants list, we could potentially combine these two values into a single `usize` and bitmask each part:

```rust
opcode = FromPrimitive::from_usize(instruction >> (usize::BITS / 2))
data = instruction & ((1_usize << (usize::BITS / 2)) - 1)
```

This would allow us to keep the size of each instruction to an absolute minimum.
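A runnable sketch of this packing scheme follows; the choice of high half for the opcode and low half for the operand is an illustrative assumption:

```rust
// The opcode lives in the high half of the usize, the operand in the low
// half. The exact layout here is an illustrative assumption.
pub const HALF: u32 = usize::BITS / 2;
pub const DATA_MASK: usize = (1 << HALF) - 1;

/// Packs an opcode discriminant and its operand into one usize.
pub fn pack(opcode: usize, data: usize) -> usize {
    debug_assert!(opcode <= DATA_MASK && data <= DATA_MASK);
    (opcode << HALF) | data
}

/// Recovers the (opcode, operand) pair from a packed instruction.
pub fn unpack(instruction: usize) -> (usize, usize) {
    (instruction >> HALF, instruction & DATA_MASK)
}
```

Round-tripping `pack(3, 12)` through `unpack` yields `(3, 12)` again, so no information is lost as long as both halves fit in their masks.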
It should be noted that the representation of the OpCodes is an optimisation that can occur later on without a significant impact on the core functionality.
The implementation needs to occur in well-defined stages to prevent dumping a massive PR that never gets over the line.
- Submit a PR with spike-level code roughly demonstrating the change for the VM.
- Incorporate (unit and property) tests.
- Document and comment the VM and the functions used to emit OpCodes.
- Develop a safe API surrounding the VM (in particular function calls).
- Implement the VM and the VRL-to-VM compiler for expressions (the functions in the stdlib will be moved over in a later stage; the default `call` for these functions will just be a noop).
- Test. We will be able to run both the VM and the expression walker simultaneously, which will allow us to ensure we still get the same results.
- Implement the `call` function on the stdlib functions.
With the code as a single-dimension array of bytecode, it could be possible to scan the code for patterns and reorganise the bytecode so it can run in a more optimal way.
A lot more thought and research needs to go into this before we can consider implementing these changes.
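As a sketch of what such a pass could look like, here is a hypothetical peephole optimisation: two consecutive `Not` opcodes cancel out, so both can be dropped. The pattern and names are illustrative:

```rust
// Hypothetical peephole pass over a flat opcode array: `Not Not` is a
// no-op, so both instructions are removed.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum OpCode {
    Not,
    Return,
}

pub fn peephole(instructions: &[OpCode]) -> Vec<OpCode> {
    let mut optimised: Vec<OpCode> = Vec::new();
    for &instruction in instructions {
        if instruction == OpCode::Not && optimised.last() == Some(&OpCode::Not) {
            // The previous `Not` and this one cancel: drop both.
            optimised.pop();
        } else {
            optimised.push(instruction);
        }
    }
    optimised
}
```

This kind of pass is only practical because the bytecode is a flat, contiguous array; the equivalent rewrite on a boxed AST would require walking and rebuilding the tree.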