docs(yellowpaper): avm call pointers, bytecode lookups, circuit io (A…
dbanks12 authored Jan 10, 2024
1 parent f127e5a commit 45e1ed2
Showing 1 changed file with 75 additions and 0 deletions.
75 changes: 75 additions & 0 deletions yellow-paper/docs/public-vm/avm-circuit.md
@@ -4,6 +4,27 @@ sidebar_position: 1

# AVM Circuit

## Call pointer
Each message call processed within a single VM circuit execution is assigned a unique **call pointer**. The VM circuit must track certain information on a per-call basis. For example, each call corresponds to the execution of a different contract's bytecode, and each call accesses call-specific memory. As a per-call unique identifier, the call pointer enables bytecode and memory lookups, among other things, on a per-call basis.

Call pointers are assigned based on execution order. A request's initial message call is assigned a call pointer of `1`. The first nested message call encountered during execution is assigned a call pointer of `2`. The VM circuit tracks the highest call pointer assigned thus far, and whenever a nested call instruction is encountered, it increments that value and assigns the result to that call.
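The assignment rule can be modeled as a running counter. The following is a minimal Python sketch, not the circuit's actual implementation; the class name `CallPointerAllocator` and its members are hypothetical names introduced only for illustration.

```python
class CallPointerAllocator:
    """Hypothetical helper that hands out call pointers in execution order."""

    def __init__(self) -> None:
        # Highest call pointer assigned so far. Starting at 0 means the first
        # assigned pointer is 1, so 0 is never given to a message call.
        self.highest_assigned = 0

    def assign(self) -> int:
        """Increment the running maximum and return it as the new call's pointer."""
        self.highest_assigned += 1
        return self.highest_assigned


allocator = CallPointerAllocator()
initial_call = allocator.assign()   # the request's initial message call
first_nested = allocator.assign()   # first nested call encountered
second_nested = allocator.assign()  # next nested call encountered, at any depth
```

Because the counter only ever increases, a call pointer is never reused within one circuit execution, even after a nested call returns.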

### "Input" and "output" call pointers
It is important to note that the initial call's pointer is `1`, not `0`. The zero call pointer is a special case known as the "input" call pointer.

The VM circuit memory table has a separate section for each call pointer. The memory table section for the **input call pointer** is reserved for the initial call's `ExecutionEnvironment` and initial `MachineState` as they appear in the circuit's inputs. This is expanded on later.

## Bytecode
The VM circuit's primary purpose is to prove execution of the correct sequence of instructions given a message call's bytecode and inputs. The circuit will prove correct execution of any nested message calls as well. Each nested call will have its own bytecode and inputs, but will be processed within the same circuit.

Thus, a circuit column is assembled to contain the bytecode for all of a request's message calls (initial and nested). If a request's execution contains message calls to contracts A, B, C, and D (in that order), the VM circuit's bytecode column will contain A's bytecode, followed by B's, C's, and finally D's. Each one will be zero-padded to some constant length `CONTRACT_BYTECODE_MAX_LENGTH`.

The bytecode column will be paired with a call pointer column and program counter column. These three columns make up the **bytecode table**, where an instruction is paired with the call pointer and program counter it corresponds to.

Each row in the execution trace will also contain a call pointer and program counter, enabling a lookup into the bytecode table to retrieve the proper instruction (opcode and arguments). Through this mechanism, the VM circuit enforces that every executed instruction corresponds to the correct entry in the bytecode column.
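As a rough illustration of the bytecode table and the lookup into it, the Python sketch below builds the three columns and retrieves an instruction by `(call pointer, program counter)`. It rests on several assumptions that are not stated in this document: one instruction per program counter value, instructions modeled as plain integers, and a tiny placeholder value for `CONTRACT_BYTECODE_MAX_LENGTH`. The names `BytecodeRow`, `build_bytecode_table`, and `lookup_instruction` are hypothetical.

```python
from typing import NamedTuple

# Placeholder value for illustration only; the real constant is not given here.
CONTRACT_BYTECODE_MAX_LENGTH = 4


class BytecodeRow(NamedTuple):
    call_pointer: int
    pc: int
    instruction: int  # stand-in for an encoded opcode plus arguments


def build_bytecode_table(bytecodes_in_call_order):
    """Concatenate each call's zero-padded bytecode, tagging rows with (call_pointer, pc)."""
    table = []
    # Call pointers start at 1 and follow execution order.
    for call_pointer, bytecode in enumerate(bytecodes_in_call_order, start=1):
        if len(bytecode) > CONTRACT_BYTECODE_MAX_LENGTH:
            raise ValueError("bytecode exceeds CONTRACT_BYTECODE_MAX_LENGTH")
        padded = list(bytecode) + [0] * (CONTRACT_BYTECODE_MAX_LENGTH - len(bytecode))
        for pc, instruction in enumerate(padded):
            table.append(BytecodeRow(call_pointer, pc, instruction))
    return table


def lookup_instruction(table, call_pointer, pc):
    """Model of the execution trace's lookup into the bytecode table."""
    index = {(row.call_pointer, row.pc): row.instruction for row in table}
    return index[(call_pointer, pc)]


# Two calls: contract A (3 instructions) followed by contract B (2 instructions).
table = build_bytecode_table([[10, 11, 12], [20, 21]])
fetched = lookup_instruction(table, call_pointer=2, pc=1)
```

In the circuit this relation would be enforced by a lookup argument rather than a dictionary; the dictionary only models which instruction each `(call pointer, program counter)` pair must resolve to.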

Each contract's public bytecode is committed to during contract deployment. As part of the AVM circuit verification algorithm, the bytecode column (as a concatenation of all relevant contract bytecodes) is verified against the corresponding bytecode commitments. This is expanded on in ["Bytecode Validation Circuit"](./bytecode-validation-circuit.md).

## Memory
To process a public execution request, the AVM executes the request's initial message call along with any nested calls it encounters. Execution of a message call requires some context including an `ExecutionEnvironment` and `MachineState`. Separate instances of these constructs must exist for each message call.

@@ -51,3 +72,57 @@ An instruction like `ADDRESS` serves as a great example because it performs a read
- memory write
    - flags: `callPointer`, `userMemory = 1` (user memory access)
    - offset: `dstOffset`

## Circuit I/O

### How do "Public Inputs" work in the AVM circuit?
ZK circuit proof systems generally define some mechanism for "public inputs" for which witness values must be communicated in full to a verifier. The AVM proof system defines its own mechanism for public inputs in which it flags certain trace columns as "public input columns". Any public input columns must be communicated in full to a verifier.

### AVM public inputs structure
The VM circuit's I/O is defined as the `AvmPublicInputs` structure detailed below:
```
AvmSideEffects {
newNoteHashes,
newNullifiers,
newL2ToL1Messages,
unencryptedLogs,
}
AvmPublicInputs {
initialEnvironment: ExecutionEnvironment & {l1GasLeft, l2GasLeft, daGasLeft},
calldata: [],
sideEffects: AvmSideEffects,
storageAccesses,
gasResults: {l1GasLeft, l2GasLeft, daGasLeft},
}
```

### AVM public input columns
The `AvmPublicInputs` structure is represented in the VM trace via the following public input columns:
1. `initialEnvironment` has a dedicated column and is used to initialize the initial call's `ExecutionEnvironment` and `MachineState`
1. `calldata` has its own dedicated public input column
1. `sideEffects: AvmSideEffects`
    - This represents the final `AccruedSubstate` of the initial message call
    - There is a separate sub-table (columns) for each side-effect vector
        - Each row in the `newNoteHashes` sub-table contains `{contractAddress, noteHash}`
        - Each row in the `newNullifiers` sub-table contains `{contractAddress, nullifier}`
        - Each row in the `newL2ToL1Messages` sub-table contains `{contractAddress, wordIndex, messageWord}`
            - where a message containing N words takes up N entries with increasing `wordIndex`
        - Each row in the `unencryptedLogs` sub-table contains `{contractAddress, wordIndex, logWord}`
            - where a log containing N words takes up N entries with increasing `wordIndex`
    - Side effects are present in the trace in execution-order
1. `storageAccesses`
    - This contains the first and last public storage access for each slot that is accessed during execution
    - Each row in the `storageAccesses` sub-table contains `{contractAddress, slot, value}`
    - Storage accesses are present in the trace in execution-order
1. `gasResults: AvmGasResults`
    - This is derived from the _final_ `MachineState` of the initial message call
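
The word-indexed layout of the `newL2ToL1Messages` and `unencryptedLogs` sub-tables can be sketched as follows. This is only an illustrative Python model: the function name `flatten_words` is hypothetical, rows are modeled as tuples, and `wordIndex` is assumed to start at `0` (the text above only says it increases).

```python
def flatten_words(entries):
    """Expand (contract_address, words) pairs into one row per word.

    Each output row is (contract_address, word_index, word), in execution order.
    Assumption: word_index starts at 0 for every message or log.
    """
    rows = []
    for contract_address, words in entries:
        for word_index, word in enumerate(words):
            rows.append((contract_address, word_index, word))
    return rows


# Two messages in execution order: a 3-word message, then a 1-word message.
messages = [(0xA, [7, 8, 9]), (0xB, [5])]
rows = flatten_words(messages)
```

A message with N words therefore occupies N consecutive rows that share a `contractAddress`.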

### Initial call's protected memory
Any lookup into protected memory from a request's initial message call must retrieve a value matching the `initialEnvironment` public inputs column\*. To enforce this, an equivalence check is applied between the `initialEnvironment` column and the memory trace for protected memory accesses that use call pointer `1`.

> \* `MachineState` has entries (`pc`, `internalCallStack`) that are not initialized from inputs. Accesses to these entries from the initial message call do _not_ trigger lookups into a public inputs column.
>
> Note: protected memory is irrelevant for the "input call pointer" itself (`0`). The initial call's protected memory (call pointer `1`) is constructed to match the public inputs column. The "input call pointer" is only relevant for `calldata` as explained next.

### Initial call's calldata
Similarly, any lookup into calldata from a request's initial message call must retrieve a value matching the `calldata` public inputs column. To enforce this, an equivalence check is applied between the `calldata` column and the memory trace for user memory accesses that use the "input call pointer" (`0`).
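
The calldata equivalence check can be pictured with the plain-Python model below. It is a sketch under stated assumptions, not the circuit's constraint system: the `MemoryAccess` record and the function `calldata_accesses_match` are hypothetical, and a memory offset is assumed to index directly into the `calldata` column.

```python
from typing import NamedTuple

INPUT_CALL_POINTER = 0  # the special "input" call pointer


class MemoryAccess(NamedTuple):
    call_pointer: int
    user_memory: bool  # True for user memory, False for protected memory
    offset: int
    value: int


def calldata_accesses_match(memory_trace, calldata_column):
    """Return True if every user memory access under the input call pointer
    agrees with the calldata column at the same offset (assumed indexing)."""
    for access in memory_trace:
        if access.call_pointer != INPUT_CALL_POINTER or not access.user_memory:
            continue  # not a calldata access; outside the scope of this check
        if not 0 <= access.offset < len(calldata_column):
            return False
        if access.value != calldata_column[access.offset]:
            return False
    return True


calldata = [11, 22, 33]
good_trace = [
    MemoryAccess(0, True, 1, 22),   # calldata read, matches the column
    MemoryAccess(1, True, 1, 99),   # initial call's own user memory, not checked here
]
bad_trace = [MemoryAccess(0, True, 2, 34)]  # disagrees with calldata[2]
```

Accesses under any other call pointer are ignored by this model, since only the input call pointer's user memory is tied to the `calldata` column.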
