Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(avm executor): kernel outputs & execution hints in executor #6769

Merged
merged 19 commits into from
Jun 3, 2024

Conversation

Maddiaa0
Copy link
Member

Please read contributing guidelines and remove this line.

@Maddiaa0 Maddiaa0 force-pushed the md/kernel-inputs-executor branch 2 times, most recently from 42c3d82 to d2355ee Compare May 30, 2024 15:28
@Maddiaa0 Maddiaa0 force-pushed the md/kernel-outputs-executor branch from e28c940 to 5941499 Compare May 30, 2024 15:28
Base automatically changed from md/kernel-inputs-executor to md/05-30-chore_add_l1_to_l2_msg_read_requests_to_public_circuit_public_inputs May 30, 2024 15:31
Base automatically changed from md/05-30-chore_add_l1_to_l2_msg_read_requests_to_public_circuit_public_inputs to master May 30, 2024 16:25
@AztecBot
Copy link
Collaborator

AztecBot commented May 30, 2024

Benchmark results

Metrics with a significant change:

  • proof_construction_time_poseidon_hash_ms (4): 47.0 (+34%)
  • protocol_circuit_witness_generation_time_in_ms (base-parity): 1,776 (+97%)
Detailed results

All benchmarks are run on txs on the Benchmarking contract on the repository. Each tx consists of a batch call to create_note and increment_balance, which guarantees that each tx has a private call, a nested private call, a public call, and a nested public call, as well as an emitted private note, an unencrypted log, and public storage read and write.

This benchmark source data is available in JSON format on S3 here.

Proof generation

Each column represents the number of threads used in proof generation.

Metric 1 threads 4 threads 16 threads 32 threads 64 threads
proof_construction_time_sha256_ms 5,705 (-2%) 1,546 (-1%) 695 (-2%) 756 776 (-1%)
proof_construction_time_sha256_30_ms 11,702 (-4%) 3,163 (-1%) 1,400 (-1%) 1,422 (-2%) 1,461 (-2%)
proof_construction_time_sha256_100_ms 43,636 (-5%) 11,734 (-5%) 5,411 (-6%) 5,389 (-4%) 5,343 (-4%)
proof_construction_time_poseidon_hash_ms 78.0 (-1%) ⚠️ 47.0 (+34%) 34.0 57.0 (-2%) 87.0
proof_construction_time_poseidon_hash_30_ms 1,516 (-1%) 417 (-2%) 201 (-1%) 231 (+2%) 267 (-1%)
proof_construction_time_poseidon_hash_100_ms 5,712 (-2%) 1,560 (-2%) 715 (-2%) 760 (-7%) 795 (-1%)

L2 block published to L1

Each column represents the number of txs on an L2 block published to L1.

Metric 8 txs 32 txs 64 txs
l1_rollup_calldata_size_in_bytes 1,412 1,412 1,412
l1_rollup_calldata_gas 9,452 9,464 9,476
l1_rollup_execution_gas 607,988 608,000 608,012
l2_block_processing_time_in_ms 1,296 (-1%) 4,861 9,657
l2_block_building_time_in_ms 44,330 175,342 350,313
l2_block_rollup_simulation_time_in_ms 44,135 174,614 348,874
l2_block_public_tx_process_time_in_ms 40,887 171,326 345,521

L2 chain processing

Each column represents the number of blocks on the L2 chain where each block has 16 txs.

Metric 3 blocks 5 blocks
node_history_sync_time_in_ms 9,433 (-3%) 14,391 (-2%)
node_database_size_in_bytes 14,495,824 21,389,392
pxe_database_size_in_bytes 18,071 29,868

Circuits stats

Stats on running time and I/O sizes collected for every kernel circuit run across all benchmarks.

Circuit simulation_time_in_ms witness_generation_time_in_ms proving_time_in_ms input_size_in_bytes output_size_in_bytes proof_size_in_bytes num_public_inputs size_in_gates
private-kernel-init 137 (+2%) 500 (+1%) 12,194 (-1%) 20,634 64,614 89,536 2,731 524,288
private-kernel-inner 409 1,087 (+8%) 45,014 (+1%) 92,326 64,614 89,536 2,731 2,097,152
private-kernel-tail 588 2,959 (+5%) 46,943 (-1%) 96,545 77,732 11,648 297 2,097,152
base-parity 6.50 (-1%) ⚠️ 1,776 (+97%) 2,575 128 64.0 2,208 2.00 131,072
root-parity 49.7 (-1%) 49.6 (-1%) 34,225 (-1%) 27,100 64.0 2,720 18.0 2,097,152
base-rollup 12,065 (+1%) 2,336 (-1%) 65,331 (+1%) 119,738 756 3,648 47.0 4,194,304
root-rollup 112 62.0 (-3%) 19,705 (+1%) 25,309 620 3,456 41.0 1,048,576
public-kernel-app-logic 575 3,507 (-1%) 38,745 (-2%) 108,073 86,550 116,768 3,582 2,097,152
public-kernel-tail 1,134 22,336 (-4%) 153,928 (-1%) 403,238 7,646 11,648 297 8,388,608
private-kernel-reset-small 592 2,167 (+3%) 40,789 (+1%) 120,737 64,614 89,536 2,731 2,097,152
public-kernel-setup 672 (-1%) 2,655 (-1%) 37,429 108,073 86,550 116,768 3,582 2,097,152
public-kernel-teardown 575 (-1%) 3,532 (-1%) 39,413 108,073 86,550 116,768 3,582 2,097,152
merge-rollup 29.8 (+3%) N/A N/A 16,542 756 N/A N/A N/A
private-kernel-tail-to-public N/A 8,736 86,953 (+1%) N/A N/A 116,768 3,582 4,194,304

Stats on running time collected for app circuits

Function input_size_in_bytes output_size_in_bytes witness_generation_time_in_ms proof_size_in_bytes proving_time_in_ms size_in_gates num_public_inputs
ContractClassRegisterer:register 1,344 9,944 464 (-1%) N/A N/A N/A N/A
ContractInstanceDeployer:deploy 1,408 9,944 42.2 N/A N/A N/A N/A
MultiCallEntrypoint:entrypoint 1,920 9,944 1,800 N/A N/A N/A N/A
SchnorrAccount:constructor 1,312 9,944 1,467 (+1%) N/A N/A N/A N/A
SchnorrAccount:entrypoint 2,304 9,944 2,751 16,768 50,077 (-1%) 2,097,152 457
Token:privately_mint_private_note 1,280 9,944 1,657 (+1%) N/A N/A N/A N/A
FPC:fee_entrypoint_public 1,344 9,944 1,152 (+1%) 16,768 10,127 (+1%) 524,288 457
Token:transfer 1,376 9,944 5,460 16,768 48,604 (+2%) 2,097,152 457
Benchmarking:create_note 1,344 9,944 1,389 (-1%) N/A N/A N/A N/A
SchnorrAccount:spend_private_authwit 1,280 9,944 76.3 (-7%) N/A N/A N/A N/A
Token:unshield 1,376 9,944 3,874 (-4%) N/A N/A N/A N/A
FPC:fee_entrypoint_private 1,376 9,944 4,771 (-5%) N/A N/A N/A N/A

Tree insertion stats

The duration to insert a fixed batch of leaves into each tree type.

Metric 1 leaves 16 leaves 64 leaves 128 leaves 512 leaves 1024 leaves 2048 leaves 4096 leaves 32 leaves
batch_insert_into_append_only_tree_16_depth_ms 10.5 17.4 N/A N/A N/A N/A N/A N/A N/A
batch_insert_into_append_only_tree_16_depth_hash_count 16.7 31.8 N/A N/A N/A N/A N/A N/A N/A
batch_insert_into_append_only_tree_16_depth_hash_ms 0.609 0.522 N/A N/A N/A N/A N/A N/A N/A
batch_insert_into_append_only_tree_32_depth_ms N/A N/A 48.8 (-1%) 76.8 (+1%) 248 475 (-1%) 934 1,861 N/A
batch_insert_into_append_only_tree_32_depth_hash_count N/A N/A 95.9 159 543 1,055 2,079 4,127 N/A
batch_insert_into_append_only_tree_32_depth_hash_ms N/A N/A 0.497 0.473 (+1%) 0.449 0.443 (-1%) 0.442 0.443 N/A
batch_insert_into_indexed_tree_20_depth_ms N/A N/A 58.6 113 358 698 (-1%) 1,411 (+1%) 2,787 N/A
batch_insert_into_indexed_tree_20_depth_hash_count N/A N/A 106 208 692 1,363 2,707 5,395 N/A
batch_insert_into_indexed_tree_20_depth_hash_ms N/A N/A 0.506 0.505 0.485 0.479 (-1%) 0.480 0.483 N/A
batch_insert_into_indexed_tree_40_depth_ms N/A N/A N/A N/A N/A N/A N/A N/A 63.0
batch_insert_into_indexed_tree_40_depth_hash_count N/A N/A N/A N/A N/A N/A N/A N/A 107
batch_insert_into_indexed_tree_40_depth_hash_ms N/A N/A N/A N/A N/A N/A N/A N/A 0.556

Miscellaneous

Transaction sizes based on how many contract classes are registered in the tx.

Metric 0 registered classes 1 registered classes
tx_size_in_bytes 84,053 665,267

Transaction size based on fee payment method

| Metric | |
| - | |

@Maddiaa0 Maddiaa0 force-pushed the md/kernel-outputs-executor branch from 5941499 to 27392a6 Compare May 31, 2024 13:03
@@ -35,4 +36,31 @@ static const size_t NUM_MEM_SPACES = 256;
static const uint8_t INTERNAL_CALL_SPACE_ID = 255;
static const uint32_t MAX_SIZE_INTERNAL_STACK = 1 << 16;

// TODO: find a more suitable place for these
/**
* Auxiliary hints are required when the executor is unable to work out a witness value
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag

@Maddiaa0 Maddiaa0 changed the title feat(avm executor): kernel outputs in executor feat(avm executor): kernel outputs & execution hints in executor May 31, 2024
pol START_L1_TO_L2_MSG_EXISTS_WRITE_OFFSET = 16;
pol START_EMIT_UNENCRYPTED_LOG_WRITE_OFFSET = 20;
pol START_EMIT_L2_TO_l1_MSG = 24;
pol START_NULLIFIER_EXISTS_OFFSET = 32; // START_NOTE_HASH_EXISTS_WRITE_OFFSET + MAX_NOTE_HASH_READ_REQUESTS_PER_CALL TODO: exists and non exists
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted these values to line up with the true offsets when using the real PER_CALL limits, previously they were just small ranges as an MVP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you now autogen these from TS?

Comment on lines 57 to 63
ExecutionHints()
{
storage_values = std::unordered_map<uint32_t, FF>();
note_hash_exists = std::unordered_map<uint32_t, bool>();
nullifier_exists = std::unordered_map<uint32_t, bool>();
l1_to_l2_msg_exists = std::unordered_map<uint32_t, bool>();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. They are objects, not pointers, so they get default-constructed. This is extra work.

Copy link
Contributor

@dbanks12 dbanks12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not in a position to "approve", but this mostly makes sense to me. Nice work! Left some comments to primarily confirm my understanding

Comment on lines 41 to 42
info("output writing side effect counter: ", side_effect_counter);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to your changes, but can we handle side-effect-counters across nested calls? So a nested call starts with parent's counter+1, and then after it completes, parent absorbs child's counter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, it just starts from 0, but there is an issue to constrain it to be the same as the value provided within the public inputs

Comment on lines +28 to +41
// Exists checks
pol START_NOTE_HASH_EXISTS_WRITE_OFFSET = 0;
pol START_EMIT_NOTE_HASH_WRITE_OFFSET = 4;
pol START_NULLIFIER_EXISTS_OFFSET = 8;
pol START_EMIT_NULLIFIER_WRITE_OFFSET = 12;
pol START_L1_TO_L2_MSG_EXISTS_WRITE_OFFSET = 16;
pol START_EMIT_UNENCRYPTED_LOG_WRITE_OFFSET = 20;
pol START_EMIT_L2_TO_l1_MSG = 24;
pol START_NULLIFIER_EXISTS_OFFSET = 32; // START_NOTE_HASH_EXISTS_WRITE_OFFSET + MAX_NOTE_HASH_READ_REQUESTS_PER_CALL TODO: exists and non exists
pol START_L1_TO_L2_MSG_EXISTS_WRITE_OFFSET = 96; // START_NULLIFIER_EXISTS_OFFET + (MAX_NULLIFIER_READ_REQUESTS_PER_CALL + MAX_NULLIFIER_NON_EXISTENT_READ_REQUESTS_PER_CALL)

// Public storage requests
pol START_SLOAD_WRITE_OFFSET = 112; // START_L1_TO_L2_MSG_EXISTS_WRITE_OFFSET + MAX_L1_TO_L2_MSG_READ_REQUESTS_PER_CALL
pol START_SSTORE_WRITE_OFFSET = 144; // START_SSTORE_WRITE_OFFSET + MAX_PUBLIC_DATA_UPDATE_REQUESTS_PER_CALL

// Emit data
pol START_EMIT_NOTE_HASH_WRITE_OFFSET = 160; // START_SLOAD_WRITE_OFFSET + MAX_PUBLIC_DATA_READS_PER_CALL
pol START_EMIT_NULLIFIER_WRITE_OFFSET = 176; // START_EMIT_NOTE_HASH_WRITE_OFFSET + MAX_NEW_NOTE_HASHES_PER_CALL
pol START_EMIT_L2_TO_l1_MSG = 192; // START_EMIT_NULLIFIER_WRITE_OFFSET + MAX_NEW_NULLIFIERS_PER_CALL
pol START_EMIT_UNENCRYPTED_LOG_WRITE_OFFSET = 194; // START_EMIT_L2_TO_L1_MSG + MAX_NEW_L2_TO_L1_MSGS_PER_CALL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these hand-set or code-generated?

Copy link
Member Author

@Maddiaa0 Maddiaa0 May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now hand set, but they mirror values in cpp, so you just hover over the value in the LSP and copy it over, our pil does not compile time evaluate constants yet

Comment on lines 1252 to 1339
// TODO: fix the naming here - we need it to be different as we are writing a hint
Row AvmTraceBuilder::create_sload(
uint32_t clk, uint32_t data_offset, FF const& data_value, FF const& slot_value, uint32_t slot_offset)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function just populating memory rows for slot and value? Is the row that is returned here, the sload's row in the main trace or is it a row dedicated just to loading the hint into memory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary as sload had slightly differing behavior to the existing ones, it is called within op_sload which orchestrates it all

Comment on lines 1368 to 1528
void AvmTraceBuilder::op_sload(uint32_t slot_offset, uint32_t value_offset)
void AvmTraceBuilder::op_sload(uint32_t slot_offset, uint32_t write_offset)
{
auto const clk = static_cast<uint32_t>(main_trace.size());

Row row =
create_kernel_output_opcode_with_metadata(clk, value_offset, AvmMemoryTag::FF, slot_offset, AvmMemoryTag::FF);
kernel_trace_builder.op_sload(clk, row.avm_main_ib, row.avm_main_ia);
// Read the slot
AvmMemTraceBuilder::MemRead slot_read = mem_trace_builder.read_and_load_from_memory(
call_ptr, clk, IntermRegister::IB, slot_offset, AvmMemoryTag::FF, AvmMemoryTag::FF);

// Get the data value from the execution_hints
// TODO: for now the hints are being offset by the offset - this will NOT fly, but i struggled to get the hash
// working for FF
FF value = execution_hints.storage_values.at(write_offset);
// TODO: throw error if the hint does not exist

Row row = create_sload(clk, write_offset, value, slot_read.val, slot_offset);
kernel_trace_builder.op_sload(clk, row.avm_main_ib, value);

row.avm_main_sel_op_sload = FF(1);

main_trace.push_back(row);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My attempt at understanding this:

  1. Use the instruction's slot_offset operand and read slot from memory (adds row to memory trace)
  2. Use that slot to retrieve a storage value from execution_hints.storage_values
  3. Write the hinted value into memory (adds row to memory trace)
  4. Add row to kernel trace for this storage slot->value "kernel output lookup"
  5. Create a new row in the main trace for the sload, including mem read for slot, mem write for value, and flag to indicate presence of a "kernel output lookup"
  6. Activate the selector for sload in this row

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this is pretty spot on

Comment on lines 3414 to 3688
// Increment the write offset counter for the following row
next.avm_kernel_note_hash_exist_write_offset =
curr.avm_kernel_note_hash_exist_write_offset + static_cast<const int>(src.op_note_hash_exists);
next.avm_kernel_emit_note_hash_write_offset =
curr.avm_kernel_emit_note_hash_write_offset + static_cast<const int>(src.op_emit_note_hash);
next.avm_kernel_emit_nullifier_write_offset =
curr.avm_kernel_emit_nullifier_write_offset + static_cast<const int>(src.op_emit_nullifier);
next.avm_kernel_nullifier_exists_write_offset =
curr.avm_kernel_nullifier_exists_write_offset + static_cast<const int>(src.op_nullifier_exists);
next.avm_kernel_l1_to_l2_msg_exists_write_offset =
curr.avm_kernel_l1_to_l2_msg_exists_write_offset + static_cast<const int>(src.op_l1_to_l2_msg_exists);
next.avm_kernel_emit_l2_to_l1_msg_write_offset =
curr.avm_kernel_emit_l2_to_l1_msg_write_offset + static_cast<const int>(src.op_emit_l2_to_l1_msg);
next.avm_kernel_emit_unencrypted_log_write_offset =
curr.avm_kernel_emit_unencrypted_log_write_offset + static_cast<const int>(src.op_emit_unencrypted_log);
next.avm_kernel_sload_write_offset =
curr.avm_kernel_sload_write_offset + static_cast<const int>(src.op_sload);
next.avm_kernel_sstore_write_offset =
curr.avm_kernel_sstore_write_offset + static_cast<const int>(src.op_sstore);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We increment the write offset counter so that the next kernel lookup in the main trace is constrained to the proper entry in the kernel trace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@Maddiaa0 Maddiaa0 mentioned this pull request Jun 2, 2024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean these

@Maddiaa0 Maddiaa0 force-pushed the md/kernel-outputs-executor branch from a89221e to aca948c Compare June 3, 2024 15:15

pol START_SLOAD_WRITE_OFFSET = 28;
pol START_SSTORE_WRITE_OFFSET = 32;

// TODO(https://github.com/AztecProtocol/aztec-packages/issues/6465): Must constrain write_offset counters to be less than side effect MAX
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just reordered

@@ -33,16 +33,17 @@ FF AvmKernelTraceBuilder::perform_kernel_input_lookup(uint32_t selector)
return result;
}

void AvmKernelTraceBuilder::perform_kernel_output_lookup(uint32_t write_offset, const FF& value, const FF& metadata)
void AvmKernelTraceBuilder::perform_kernel_output_lookup(uint32_t write_offset,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side effect counter has been moved into the main trace

@Maddiaa0 Maddiaa0 marked this pull request as ready for review June 3, 2024 15:59
@Maddiaa0 Maddiaa0 enabled auto-merge (squash) June 3, 2024 17:09
@Maddiaa0 Maddiaa0 merged commit 6ab7360 into master Jun 3, 2024
111 checks passed
@Maddiaa0 Maddiaa0 deleted the md/kernel-outputs-executor branch June 3, 2024 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants