RFC 9811 - 2021-11-12 - VRL enum VM (VenuM)

This RFC proposes implementing a enum VM. VRL will be compiled to a list of instructions and executed by this VM, with the aim to significantly improve the performance of executing VRL programs.

Context

This RFC is a follow on from rfcs/2021-10-14-9811-vrl-performance.md

Scope

In scope

This RFC is purely about developing a enum VM for VRL. It will discuss the risks involved in running VRL with a VM and the measures we will take to mitigate those risks.

Out of scope

Any other performance issues relating to VRL will be discussed in rfcs/2021-10-14-9811-vrl-performance.md

Pain

Vrl compiles to an AST that is then walked during resolution. Each node in that tree is boxed and stored in disparate regions of memory. As a result walking the tree means that the CPU caches must be constantly swapped.

Proposal

Implementation

Instead we can create an enum VM to store the execution of the Vrl program.

The enum is essentially a big enum of instructions:

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum OpCode {
    Return,
    Constant,
    Negate,
    Add,
    Subtract,
    Multiply,
    Divide,
    Print,
    Not,
    Greater,
    GreaterEqual,
    Less,
    LessEqual,
    NotEqual,
    Equal,
    Pop,
    JumpIfFalse,
    Jump,
    SetPath,
    GetPath,
    Call,
    ...
}

enum Instruction {
    Opcode(Opcode),
    Literal(LiteralIndex),
}

pub struct LiteralIndex(usize);

The Vm is a struct comprising of the following fields:

#[derive(Clone, Debug, Default)]
pub struct Vm {
    instructions: Vec<Instruction>,
    constants: Vec<Literal>,
    targets: Vec<Variable>,
    stack: Vec<Value>,
    parameter_stack: Vec<Option<Value>>,
    error: Option<Error>,
    instruction_pointer: usize,
}

instructions

The instructions field is a Vec of Instruction. An instruction can be either an OpCode or some data for that OpCode. For example, the instructions [.., OpCode(Constant), Literal(12), ..] when evaluated will load the constant stored in the values Vec that is found in position 12 onto the stack.

constants

A list of constant values found in the program. Since the bytecode only contains integers any actual values must be stored here. This also allows literals to be deduped.

targets

A list of paths used in the program, similar to values.

stack

The Vm is a stack based Vm. Every expression that is evaluated pushes the result on the stack. Every operation pulls the values it uses from the stack.

parameter_stack

For function calls, the parameters need to be evaluated and passed to the function. The stack is a Vec of parameter name and value. We need an OpCode that will copy a value from the main stack to the parameter stack tagged with the parameter name. The FunctionArgs compiler will dump this OpCode after the code to evaluate the given arg is dumped.

error

If an operation errors, it needs to populate this field. Operations such as the error assignment (result, error = thing()) will need to check if this field has been populated and action accordingly.

instruction_pointer

The instruction pointer points to the next instruction to evaluate.

Calling functions

Calling functions in the stdlib will be a case of evaluating each parameter with the results pushed onto the parameter stack.

Since VRL allows for named parameters, parameters need to be pushed on to the parameter stack in the order that they are specified in the Function implementation.

Parameters are optional and may not be specified. However, to avoid passing parameters to the wrong function, an OpCode still needs to be emitted to move a placeholder value - None - to the parameter stack.

Take the following hypothetical situation:

thing(surname: "nong", name: thang(name: "nork"))

Supposing function thang also took an optional parameter surname, when thang is called, we will have a surname parameter on the parameter stack. Without the placeholder, thang would thing that surname was being passed to it. If the placeholder is pushed, it would consume this instead.

We do not want stdlib functions to have access to the VM since that risks a rogue function destroying the VM's state.

Current stdlib functions are composed of a combination of compile and resolve functions. These function will need to be combined into a single function call.

The ArgumentList parameter that is passed into call will have access to the parameter stack and the parameter list exposed by the Function. This can use these to return the appropriate values for the required, optional etc.. functions.

Since the compile function was also used by some stdlib functions to validate the parameters we may also need a function validate_parameters that these functions can use to perform extra validation on the parameters.

Debugging

We need debugging tools to help us introspect and debug the compiled bytecode.

Disassembler

Output the compiled code in readable format to the console.

Step debugger

Output each step of the VM to the console, print the Stack and other state variables in the VM. Allow the user to press a key to move to the next step.

Linter

Check each instruction to ensure:

each Jump Opcode jumps to a valid location in the bytecode.
each constant references a valid index in the constants list
each path index references a valid path.

It should be possible to run the linter after every compile (at Vector boot time), so the linter should run relatively fast.

Testing

The VM will need thorough testing since it is possible to generate invalid bytecode. For example, the following bytecode is invalid:

OpCode(CONSTANT)
Primitive(1)
OpCode(ADD)

The ADD OpCode expects there to be two values on the stack to add together. We should never get into the situation where the VRL compiler can generate this.

With property testing we can generate a vast combination of valid VRL ASTs, compile this to bytecode and then run the VM to ensure we get no buffer underflows, invalid constant locations or other invalid bytecode.

Given that implementing the VM involves adding methods to the Expression trait, it should be possible to run the existing AST alongside the VM. We can then ensure that we get exactly the same results from each approach.

Rationale

With each node of the AST compiled down to just a few bytes and all instructions held in contiguous memory evaluation of the program should be able to take full advantage of the CPU cache which should result in much faster execution.

Drawbacks

Downsides to using a Vm:

The code is a bit more complex. With an AST that is walked it is fairly apparent what the code will be doing at any point. With a Vm, this is not the case, it is harder to look at the instructions in the Vm and follow back to what part of the VRL code is being evaluated. We will need to write some extensive debugging tools to allow for decent introspection into the Vm.

However, VRL is a very simple language, which does also mean the VM will be simple.
We lose some safety that we get from the Rust compiler. There will need to be significant fuzz testing to ensure that the code runs correctly under all circumstances.

It should be noted however that the entire VM will not require any unsafe Rust code.
Currently each stdlib function is responsible for evaluating their own parameters. This allows parameters to be lazily evaluated. With the Vm, the parameters will need to be evaluated up front and the stack passed into the function. This could impact performance.

Prior Art

GoScript - An implementation of Go using a bytecode Vm, https://github.com/oxfeeefeee/goscript
CPython - https://github.com/python/cpython

Alternatives

WASM

Instead of writing our own VM we could transpile VRL code to WASM. WASM is a mature and performant VM that could give us near native speeds.

The issues with using WASM are that it would require data to be serialized in order to move between Vector and WASM. This would incur a significant performance penalty.

WASM provides a lot more functionality and complexity than is required for VRL. By developing our own VM we will be able to customize it purely for the usecase required by VRL. This allows us to implement very VRL specific OpCodes which should allow the bytecode stay simple.

Represent additional data in the OpCode

The current suggestion represents each OpCode as a single instruction in the instruction list. An OpCode with it's data occupies two instructions.

For example, with the current suggestion to load a constant three things have to happen:

Add the Constant OpCode to the instructions.
Add the actual constant value to the list of constants.
Add the index of the constant in the constant list to the list of constants.

Include value in the OpCode

These three things could be represented as a single instruction if we included the constant value in the OpCode:

enum OpCode {
    ...
    Constant(Box<Value>)
    ...
}

This will increase size_of::<OpCode> from 8 to 16, potentially doubling the memory size of the instruction list.

It also means we do not need to store a separate list of constants. On the plus side, this is one less lookup at runtime to load the constant. On the down side, we wouldn't be able to dedupe the constant, so for example, if the code uses the same string twice, it would be represented twice in the bytecode.

Bitmask the OpCode data

Since we will never have usize OpCodes, or usize constants in our constant list, we could potentially combine these two values into a single usize and bitmask each part.

OpCode = FromPrimitive::from_usize(instruction && (1_usize << (usize::BITS / 2)) - 1 << (usize::BITS / 2)))
data = instruction && (1_usize << (usize::BITS / 2)) - 1

This would allow us to keep size_of::<OpCode>() to an absolute minimum.

It should be noted that the representation of the OpCodes is an optimisation that can occur later on without having a significant impact on the core functionality.

Plan Of Attack

The implementation needs to occur in well defined stages to prevent dumping a massive PR that never gets over the line.

Submit a PR with spike-level code roughly demonstrating the change for the VM. here
Incorporate (unit and property) tests.
Document and comment the VM and the functions used to emit OpCodes.
Develop a safe API surrounding the VM (in particular function calls).
Implement the VM and the VRL to VM compiler for expressions (the functions in the stdlib will be moved over in a later stage, the default call for these functions will just be a noop).
Test. We will be able to run both the VM and the Expression walker simultaneously which will allow us to ensure we still get the same results.
Implement the call function on the stdlib functions.

Future Improvements

Optimization

With the code as a single dimension array of Bytecode, it could be possible to scan the code for patterns and reorganise the Bytecode so it can run in a more optimal way.

A lot more thought and research needs to go into this before we can consider implementing these changes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2021-11-12-9811-vrl-vm.md

2021-11-12-9811-vrl-vm.md

RFC 9811 - 2021-11-12 - VRL enum VM (VenuM)

Context

Scope

In scope

Out of scope

Pain

Proposal

Implementation

Calling functions

Debugging

Disassembler

Step debugger

Linter

Testing

Rationale

Drawbacks

Prior Art

Alternatives

WASM

Represent additional data in the OpCode

Include value in the OpCode

Bitmask the OpCode data

Plan Of Attack

Future Improvements

Optimization

Files

2021-11-12-9811-vrl-vm.md

Latest commit

History

2021-11-12-9811-vrl-vm.md

File metadata and controls

RFC 9811 - 2021-11-12 - VRL enum VM (VenuM)

Context

Scope

In scope

Out of scope

Pain

Proposal

Implementation

Calling functions

Debugging

Disassembler

Step debugger

Linter

Testing

Rationale

Drawbacks

Prior Art

Alternatives

WASM

Represent additional data in the OpCode

Include value in the OpCode

Bitmask the OpCode data

Plan Of Attack

Future Improvements

Optimization