docs(yellow-paper): Add pseudocode for verifying broadcasted functions in contract deployment #4431

Merged
merged 8 commits into from
Mar 14, 2024
86 changes: 63 additions & 23 deletions yellow-paper/docs/contract-deployment/classes.md
@@ -75,12 +75,14 @@ unconstrained_functions_artifact_tree_root = merkleize(unconstrained_functions_a
artifact_hash = sha256(
private_functions_artifact_tree_root,
unconstrained_functions_artifact_tree_root,
artifact_metadata,
artifact_metadata_hash,
)
```

For the artifact hash, merkleization and hashing are done using sha256, since it is computed and verified outside of circuits and does not need to be SNARK-friendly. Fields are left-padded with zeros to 256 bits before being hashed. Function leaves are sorted in ascending order by function selector before being merkleized. Note that the tree is built with a dynamic height rather than a fixed one, since the merkleization is done outside of a circuit.
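The construction described above can be sketched in Python as follows. This is a minimal illustration and not a reference implementation: the helper names are hypothetical, and hashing by plain byte concatenation, padding odd levels with a zero node, and the root of an empty tree are all assumptions made for the example.

```python
import hashlib

ZERO_NODE = bytes(32)  # assumption: filler for levels with an odd number of nodes


def sha256(*parts: bytes) -> bytes:
    """Hash the concatenation of all parts (concatenation is an assumption)."""
    return hashlib.sha256(b"".join(parts)).digest()


def field_to_bytes(value: int) -> bytes:
    """Left-pad a field element with zeros to 256 bits (big-endian)."""
    return value.to_bytes(32, "big")


def merkleize(leaves: list[bytes]) -> bytes:
    """Build a tree that is only as tall as the number of leaves requires."""
    if not leaves:
        return ZERO_NODE  # assumption: root of an empty tree
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(ZERO_NODE)
        level = [sha256(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def functions_tree_root(functions: list[tuple[int, bytes, bytes]]) -> bytes:
    """`functions` holds (selector, metadata_hash, bytecode) tuples."""
    ordered = sorted(functions, key=lambda fn: fn[0])  # ascending by selector
    leaves = [
        sha256(field_to_bytes(selector), metadata_hash, sha256(bytecode))
        for selector, metadata_hash, bytecode in ordered
    ]
    return merkleize(leaves)


def compute_artifact_hash(private_root: bytes, unconstrained_root: bytes,
                          artifact_metadata_hash: bytes) -> bytes:
    return sha256(private_root, unconstrained_root, artifact_metadata_hash)
```

Because the leaves are sorted by selector before merkleization, the resulting root does not depend on the order in which the functions appear in the artifact.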

<!-- TODO: Sure, sha256 is nice, but its output does not fit in a single field. Is it ok to wrap around the field modulus? Should we use sha224 instead? Should we use pedersen (or poseidon) everywhere instead? -->
Contributor

Good point. I think wrapping around the modulus should be ok in this case, because that should still give us collision resistance. But perhaps "Poseidon everywhere" is an easier approach.

> Fields are left-padded with zeros to 256 bits before being hashed.

Aside (and irrelevant if we move to Poseidon): this wouldn't be necessary with sha256. sha256 deals with inputs specified in bits, so concatenating 254-bit inputs would work.

Collaborator Author

Sounds good. I'll still leave a TODO for verifying it with the crypto team, just in case.

As for not padding to 256 bits, that's good to know! Still, I think it's best to pad anyway for consistency: pretty much everywhere we represent fields using 32 bytes.


Bytecode for private functions is a mix of ACIR and Brillig, whereas unconstrained function bytecode is Brillig exclusively, as described in the [bytecode section](../bytecode/index.md).

The suggested metadata hash for each function is the sha256 of all JSON-serialized fields in the function struct of the compilation artifact, except for bytecode and debug symbols. The metadata is JSON-serialized with no spaces, and with all object keys sorted in ascending order before serialization.
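One way to read the serialization rule above is sketched below in Python. The function name and the excluded key names (`bytecode`, `debug_symbols`) are hypothetical, and hashing the whole remaining object as a single JSON document is an assumption about what "all JSON-serialized fields" means.

```python
import hashlib
import json

# Assumption: hypothetical key names for the fields excluded from the hash.
EXCLUDED_KEYS = {"bytecode", "debug_symbols"}


def function_metadata_hash(function_artifact: dict) -> bytes:
    """Hash a function's metadata using a canonical JSON encoding."""
    metadata = {k: v for k, v in function_artifact.items() if k not in EXCLUDED_KEYS}
    # sort_keys orders the keys of every nested object; separators removes all spaces.
    serialized = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).digest()
```

Sorting keys and removing whitespace makes the encoding canonical, so two tools that emit the same fields in a different order still arrive at the same hash.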
@@ -124,13 +126,16 @@ In pseudocode:
function register(
artifact_hash: Field,
private_functions_root: Field,
public_bytecode_commitment: Field,
Contributor

This will need to be an encoding of a point, so at least 1 Field + 1 bit.

Collaborator Author

Can't we use the "all points are represented on one side of the curve" to avoid that extra bit?

Contributor

@iAmMichaelConnor, Feb 14, 2024

I don't know 😅 Potentially. We'd need to check with the crypto team, I think.
A follow-up thought that just popped into my mind. The bytecode commitment might be an altBN254 point, rather than a Grumpkin point, so the altBN254 coordinates would be (F_q, F_q) instead of Grumpkin coordinates (F_r, F_r). I believe q > r, so even a single x-coordinate wouldn't fit into an F_r field. (The Field type in Noir is F_r).

Collaborator Author

Redefining it to be a Point for now, until we know for sure what shape it'll have

packed_public_bytecode: Field[],
)
assert is_valid_packed_public_bytecode(packed_public_bytecode)

version = 1
bytecode_commitment = calculate_commitment(packed_public_bytecode)
contract_class_id = pedersen([version, artifact_hash, private_functions_root, bytecode_commitment], GENERATOR__CLASS_IDENTIFIER)

assert is_valid_packed_public_bytecode(packed_public_bytecode)
computed_bytecode_commitment = calculate_commitment(packed_public_bytecode)
assert public_bytecode_commitment == computed_bytecode_commitment

contract_class_id = pedersen([version, artifact_hash, private_functions_root, computed_bytecode_commitment], GENERATOR__CLASS_IDENTIFIER)

emit_nullifier contract_class_id
emit_unencrypted_event ContractClassRegistered(contract_class_id, version, artifact_hash, private_functions_root, packed_public_bytecode)
@@ -157,13 +162,13 @@ Broadcasted contract artifacts that do not match with their corresponding `artif
```
function broadcast_all_private_functions(
contract_class_id: Field,
artifact_metadata: Field,
artifact_metadata_hash: Field,
unconstrained_functions_artifact_tree_root: Field,
functions: { selector: Field, metadata: Field, vk_hash: Field, bytecode: Field[] }[],
functions: { selector: Field, metadata_hash: Field, vk_hash: Field, bytecode: Field[] }[],
)
emit_unencrypted_event ClassPrivateFunctionsBroadcasted(
contract_class_id,
artifact_metadata,
artifact_metadata_hash,
unconstrained_functions_artifact_tree_root,
functions,
)
@@ -172,19 +177,19 @@ function broadcast_all_private_functions(
```
function broadcast_all_unconstrained_functions(
contract_class_id: Field,
artifact_metadata: Field,
artifact_metadata_hash: Field,
private_functions_artifact_tree_root: Field,
functions:{ selector: Field, metadata: Field, bytecode: Field[] }[],
functions:{ selector: Field, metadata_hash: Field, bytecode: Field[] }[],
)
emit_unencrypted_event ClassUnconstrainedFunctionsBroadcasted(
contract_class_id,
artifact_metadata,
artifact_metadata_hash,
private_functions_artifact_tree_root,
functions,
)
```

<!-- TODO: What representation of bytecode can we use here? -->
<!-- TODO: What representation of bytecode can we use here? -->

The broadcast functions are split between private and unconstrained so that private bytecode, which is valuable for composability, can be broadcast without also including unconstrained functions, which could be expensive given data broadcasting costs. Additionally, note that each broadcast function must include enough information to reconstruct the `artifact_hash` from the Contract Class, so nodes can verify it against the one previously registered.

@@ -193,35 +198,70 @@ The `ContractClassRegisterer` contract also allows broadcasting individual funct
```
function broadcast_private_function(
contract_class_id: Field,
artifact_metadata: Field,
artifact_metadata_hash: Field,
unconstrained_functions_artifact_tree_root: Field,
function_leaf_sibling_path: Field,
function: { selector: Field, metadata: Field, vk_hash: Field, bytecode: Field[] },
private_function_tree_sibling_path: Field[],
artifact_function_tree_sibling_path: Field[],
function: { selector: Field, metadata_hash: Field, vk_hash: Field, bytecode: Field[] },
)
emit_unencrypted_event ClassPrivateFunctionBroadcasted(
contract_class_id,
artifact_metadata,
artifact_metadata_hash,
unconstrained_functions_artifact_tree_root,
function_leaf_sibling_path,
private_function_tree_sibling_path,
artifact_function_tree_sibling_path,
function,
)
```

```
function broadcast_unconstrained_function(
contract_class_id: Field,
artifact_metadata: Field,
artifact_metadata_hash: Field,
private_functions_artifact_tree_root: Field,
function_leaf_sibling_path: Field,
function: { selector: Field, metadata: Field, bytecode: Field[] }[],
artifact_function_tree_sibling_path: Field[],
function: { selector: Field, metadata_hash: Field, bytecode: Field[] },
)
emit_unencrypted_event ClassUnconstrainedFunctionBroadcasted(
contract_class_id,
artifact_metadata,
unconstrained_functions_artifact_tree_root,
function_leaf_sibling_path: Field,
artifact_metadata_hash,
private_functions_artifact_tree_root,
artifact_function_tree_sibling_path,
function,
)
```

A node that captures a `ClassPrivateFunctionBroadcasted` should perform the following validation steps before storing the private function information in its database:

```
// Load contract class from local db
contract_class = db.get_contract_class(contract_class_id)

// Compute function leaf and assert it belongs to the private functions tree
function_leaf = pedersen([selector as Field, vk_hash], GENERATOR__FUNCTION_LEAF)
computed_private_function_tree_root = compute_root(function_leaf, private_function_tree_sibling_path)
assert computed_private_function_tree_root == contract_class.private_functions_root

// Compute artifact leaf and assert it belongs to the artifact
artifact_function_leaf = sha256(selector, metadata_hash, sha256(bytecode))
computed_artifact_private_function_tree_root = compute_root(artifact_function_leaf, artifact_function_tree_sibling_path)
computed_artifact_hash = sha256(computed_artifact_private_function_tree_root, unconstrained_functions_artifact_tree_root, artifact_metadata_hash)
assert computed_artifact_hash == contract_class.artifact_hash
```

<!-- TODO: Requiring two sibling paths isn't nice. This is because we are splitting private function information across two trees: one for the protocol, that deals only with selectors and vk hashes, and one for the artifact, which deals with bytecode and metadata. If we are fine adding a `function_stuff_hash` to the function leaf that goes into the protocol tree, we could get rid of the second sibling path, but that introduces stuff into the private function tree that is not strictly needed and requires unnecessary hashing in the kernel. -->
Contributor

I reckon aiming to reduce any in-circuit hashing is the best approach, here

Collaborator Author

Sounds good. Moved this to a "discarded approaches" section.


The check for an unconstrained function is similar:

```
// Load contract class from local db
contract_class = db.get_contract_class(contract_class_id)

// Compute artifact leaf and assert it belongs to the artifact
artifact_function_leaf = sha256(selector, metadata_hash, sha256(bytecode))
computed_artifact_unconstrained_function_tree_root = compute_root(artifact_function_leaf, artifact_function_tree_sibling_path)
computed_artifact_hash = sha256(private_functions_artifact_tree_root, computed_artifact_unconstrained_function_tree_root, artifact_metadata_hash)
assert computed_artifact_hash == contract_class.artifact_hash
```
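The unconstrained-function check can be sketched in Python, since it only relies on sha256. This is an illustration under stated assumptions: the pseudocode does not say how `compute_root` learns whether a node is a left or right child, so the sketch passes a hypothetical `leaf_index`; hashing by byte concatenation and the 32-byte big-endian selector encoding are also assumptions.

```python
import hashlib


def sha256(*parts: bytes) -> bytes:
    """Hash the concatenation of all parts (concatenation is an assumption)."""
    return hashlib.sha256(b"".join(parts)).digest()


def compute_root(leaf: bytes, leaf_index: int, sibling_path: list[bytes]) -> bytes:
    """Walk from a leaf up to the root, hashing with one sibling per level."""
    node = leaf
    for sibling in sibling_path:
        # An odd index means the current node is a right child.
        node = sha256(sibling, node) if leaf_index & 1 else sha256(node, sibling)
        leaf_index >>= 1
    return node


def is_valid_unconstrained_function(
    registered_artifact_hash: bytes,
    private_functions_artifact_tree_root: bytes,
    artifact_metadata_hash: bytes,
    selector: int,
    metadata_hash: bytes,
    bytecode: bytes,
    leaf_index: int,  # hypothetical parameter, not part of the broadcast event
    sibling_path: list[bytes],
) -> bool:
    """Recompute the artifact hash from a broadcast and compare it to the registered one."""
    leaf = sha256(selector.to_bytes(32, "big"), metadata_hash, sha256(bytecode))
    unconstrained_root = compute_root(leaf, leaf_index, sibling_path)
    computed = sha256(
        private_functions_artifact_tree_root, unconstrained_root, artifact_metadata_hash
    )
    return computed == registered_artifact_hash
```

A node would call this with the `artifact_hash` it stored when the class was registered, and discard the broadcast if the function returns `False`.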

It is strongly recommended for developers registering new classes to broadcast the code for `compute_hash_and_nullifier`, so any private message recipients have the code available to process their incoming notes. However, the `ContractClassRegisterer` contract does not enforce this during registration, since it is difficult to check the multiple signatures for `compute_hash_and_nullifier` as they may evolve over time to account for new note sizes.
16 changes: 8 additions & 8 deletions yellow-paper/docs/contract-deployment/instances.md
@@ -46,9 +46,9 @@ A contract instance at a given address can be either Initialized or not. An addr

### Uninitialized

The instance has not yet been initialized, meaning its constructor has not been called. This is the default state for any given address. A user who knows the preimage of the address can still issue a private call into a function in the contract, as long as that function does not assert that the contract has been initialized by checking the Initialization Nullifier.
The default state for any given address is to be uninitialized, meaning its constructor has not been called. A user who knows the preimage of the address can still issue a private call into a function in the contract, as long as that function does not assert that the contract has been initialized by checking the Initialization Nullifier.

All public function calls to an Uninitialized address _must_ fail, since the Contract Class for it is not known to the network. If the Class is not known to the network, then an Aztec Node, whether it is the elected sequencer or a full node following the chain, may not be able to execute the bytecode for a public function call, which is undesirable. The failing of public function calls to Uninitialized addresses is enforced by having the Public Kernel Circuit check that the Deployment Nullifier for the instance has been emitted.
All function calls to an Uninitialized contract that depend on the contract being initialized should fail, to prevent the contract from being used in an invalid state.

This state allows using a contract privately before it has been initialized or deployed, which is used in [diversified and stealth accounts](../addresses-and-keys/diversified-and-stealth.md).

@@ -58,8 +58,6 @@ An instance is Initialized when a constructor for the instance has been invoked,

The Initialization Nullifier is defined as the contract address itself. Note that the nullifier later gets [siloed by the Private Kernel Circuit](../circuits/private-kernel-tail.md#siloing-values) before it gets broadcasted in a transaction.

In this state, public functions must still fail, for the same reason as for Uninitialized instances. This state then allows using a contract privately before it has been publicly deployed, which is useful for working on private contracts between a small set of parties.

:::warning
It may be the case that it is not possible to read a nullifier in the same transaction that it was emitted due to protocol limitations. That would lead to a contract not being callable in the same transaction as it is initialized. To work around this, we can emit an Initialization Commitment along with the Initialization Nullifier, which _can_ be read in the same transaction as it is emitted. If needed, the Initialization Commitment is defined exactly as the Initialization Nullifier.
:::
@@ -81,11 +79,13 @@ Removing constructors from the protocol itself simplifies the kernel circuit, an

## Public Deployment

A Contract Instance is considered to be Publicly Deployed when it has been broadcasted to the network via a canonical `ContractInstanceDeployer` contract, which also emits a Deployment Nullifier associated to the deployed instance. A contract needs to be Publicly Deployed for any of its public functions to be called. Note that this last restriction makes Public Deployment a protocol-level concern, whereas Initialization is an application-level concern.
A Contract Instance is considered to be Publicly Deployed when it has been broadcasted to the network via a canonical `ContractInstanceDeployer` contract, which also emits a Deployment Nullifier associated to the deployed instance.

The Deployment Nullifier is defined as the address of the contract being deployed. Note that it later gets [siloed](../circuits/private-kernel-tail.md#siloing-values) using the `ContractInstanceDeployer` address by the Kernel Circuit, so this nullifier is effectively the hash of the deployed contract address and the `ContractInstanceDeployer` address.
All public function calls to an Undeployed address _must_ fail, since the Contract Class for it is not known to the network. If the Class is not known to the network, then an Aztec Node, whether it is the elected sequencer or a full node following the chain, may not be able to execute the bytecode for a public function call, which is undesirable.

Only in this state public function calls are valid. The Public Kernel Circuit validates that the Deployment Nullifier has been emitted by the `ContractInstanceDeployer` as part of its checks. Note that this requires hardcoding the address of an application-level contract in a protocol circuit.
The failing of public function calls to Undeployed addresses is enforced by having the Public Kernel Circuit check that the Deployment Nullifier for the instance has been emitted. Note that this makes Public Deployment a protocol-level concern, whereas Initialization is purely an application-level concern. Also, note that this requires hardcoding the address of the `ContractInstanceDeployer` contract in a protocol circuit.

The Deployment Nullifier is defined as the address of the contract being deployed. Note that it later gets [siloed](../circuits/private-kernel-tail.md#siloing-values) using the `ContractInstanceDeployer` address by the Kernel Circuit, so this nullifier is effectively the hash of the deployed contract address and the `ContractInstanceDeployer` address.

### Canonical Contract Instance Deployer

@@ -122,7 +122,7 @@ function deploy (

Upon seeing a `ContractInstanceDeployed` event from the canonical `ContractInstanceDeployer` contract, nodes are expected to store the address and preimage, so they can verify executed code during public code execution as described in the next section.

The `ContractInstanceDeployer` contract provides two implementations of the `deploy` function: a private and a public one. Contracts with a private constructor are expected to use the former, and contracts with public constructors expected to use the latter. Contracts that have already been privately Initialized can use either.
The `ContractInstanceDeployer` contract provides two implementations of the `deploy` function: a private and a public one.

### Genesis
