fix: remove finalize from acir create circuit #6585

Merged
ledwards2225 merged 9 commits into master from lde/finalize_fix
May 22, 2024

Conversation

ledwards2225
Contributor

@ledwards2225 ledwards2225 commented May 21, 2024

This largely undoes the changes made in an earlier PR that added a call to finalize_circuit in the create_circuit methods in acir_format. That change was originally made to facilitate accurate final gate counts (previously handled via a hack). The finalize model has been reverted, and the hack has been replaced with a method that computes the number of gates added by the add_gates_to_ensure.. methods in the builders. The revert was necessary to allow gates to be added to a circuit generated from acir, which is needed in ClientIvc accumulation.

Contributor

github-actions bot commented May 21, 2024

Changes to circuit sizes

Generated at commit: dc319260d391bc77361166cf866ab891849ee5c6, compared to commit: 172e4150d0750825063ad76be0c73e27fe48f3a9

🧾 Summary (100% most significant diffs)

| Program | ACIR opcodes (+/-) | % | Circuit size (+/-) | % |
| - | - | - | - | - |
| parity_root | 0 ➖ | 0.00% | +13,425 ❌ | +1.27% |
| rollup_root | 0 ➖ | 0.00% | +3,332 ❌ | +1.13% |
| private_kernel_init | 0 ➖ | 0.00% | +3,117 ❌ | +0.37% |
| private_kernel_inner | 0 ➖ | 0.00% | +5,787 ❌ | +0.34% |
| private_kernel_reset_small | 0 ➖ | 0.00% | +3,195 ❌ | +0.31% |
| private_kernel_reset_medium | 0 ➖ | 0.00% | +3,139 ❌ | +0.25% |
| private_kernel_tail | 0 ➖ | 0.00% | +2,742 ❌ | +0.21% |
| private_kernel_reset_big | 0 ➖ | 0.00% | +3,027 ❌ | +0.18% |
| private_kernel_tail_to_public | 0 ➖ | 0.00% | +3,147 ❌ | +0.15% |
| private_kernel_reset | 0 ➖ | 0.00% | +2,803 ❌ | +0.11% |
| parity_base | 0 ➖ | 0.00% | -10 ✅ | -0.01% |
| public_kernel_tail | 0 ➖ | 0.00% | -765 ✅ | -0.02% |
| rollup_merge | 0 ➖ | 0.00% | -10 ✅ | -0.02% |
| rollup_base | 0 ➖ | 0.00% | -520 ✅ | -0.03% |
| public_kernel_app_logic | 0 ➖ | 0.00% | -1,364 ✅ | -0.17% |
| public_kernel_teardown | 0 ➖ | 0.00% | -1,166 ✅ | -0.17% |
| public_kernel_setup | 0 ➖ | 0.00% | -1,166 ✅ | -0.17% |
| private_kernel_tail_to_public_simulated | 0 ➖ | 0.00% | -11 ✅ | -0.31% |
| public_kernel_app_logic_simulated | 0 ➖ | 0.00% | -11 ✅ | -0.31% |
| public_kernel_setup_simulated | 0 ➖ | 0.00% | -11 ✅ | -0.31% |
| public_kernel_teardown_simulated | 0 ➖ | 0.00% | -11 ✅ | -0.31% |
| private_kernel_init_simulated | 0 ➖ | 0.00% | -11 ✅ | -0.42% |
| private_kernel_inner_simulated | 0 ➖ | 0.00% | -11 ✅ | -0.42% |
| private_kernel_reset_simulated | 0 ➖ | 0.00% | -11 ✅ | -0.42% |
| private_kernel_reset_simulated_big | 0 ➖ | 0.00% | -11 ✅ | -0.42% |
| private_kernel_reset_simulated_medium | 0 ➖ | 0.00% | -11 ✅ | -0.42% |
| private_kernel_reset_simulated_small | 0 ➖ | 0.00% | -11 ✅ | -0.42% |
| private_kernel_tail_simulated | 0 ➖ | 0.00% | -11 ✅ | -4.15% |
| public_kernel_tail_simulated | 0 ➖ | 0.00% | -11 ✅ | -4.15% |
| rollup_base_simulated | 0 ➖ | 0.00% | -11 ✅ | -23.40% |

Full diff report 👇
| Program | ACIR opcodes (+/-) | % | Circuit size (+/-) | % |
| - | - | - | - | - |
| parity_root | 2,140 (0) | 0.00% | 1,067,587 (+13,425) | +1.27% |
| rollup_root | 1,407 (0) | 0.00% | 299,402 (+3,332) | +1.13% |
| private_kernel_init | 228,119 (0) | 0.00% | 838,870 (+3,117) | +0.37% |
| private_kernel_inner | 262,668 (0) | 0.00% | 1,718,162 (+5,787) | +0.34% |
| private_kernel_reset_small | 137,627 (0) | 0.00% | 1,044,498 (+3,195) | +0.31% |
| private_kernel_reset_medium | 154,744 (0) | 0.00% | 1,256,847 (+3,139) | +0.25% |
| private_kernel_tail | 195,031 (0) | 0.00% | 1,319,627 (+2,742) | +0.21% |
| private_kernel_reset_big | 188,976 (0) | 0.00% | 1,681,543 (+3,027) | +0.18% |
| private_kernel_tail_to_public | 621,552 (0) | 0.00% | 2,150,433 (+3,147) | +0.15% |
| private_kernel_reset | 257,443 (0) | 0.00% | 2,530,938 (+2,803) | +0.11% |
| parity_base | 347 (0) | 0.00% | 79,811 (-10) | -0.01% |
| public_kernel_tail | 1,027,305 (0) | 0.00% | 3,704,148 (-765) | -0.02% |
| rollup_merge | 304 (0) | 0.00% | 45,970 (-10) | -0.02% |
| rollup_base | 191,041 (0) | 0.00% | 1,805,991 (-520) | -0.03% |
| public_kernel_app_logic | 251,697 (0) | 0.00% | 795,232 (-1,364) | -0.17% |
| public_kernel_teardown | 223,363 (0) | 0.00% | 666,415 (-1,166) | -0.17% |
| public_kernel_setup | 223,157 (0) | 0.00% | 666,122 (-1,166) | -0.17% |
| private_kernel_tail_to_public_simulated | 1 (0) | 0.00% | 3,573 (-11) | -0.31% |
| public_kernel_app_logic_simulated | 1 (0) | 0.00% | 3,573 (-11) | -0.31% |
| public_kernel_setup_simulated | 1 (0) | 0.00% | 3,573 (-11) | -0.31% |
| public_kernel_teardown_simulated | 1 (0) | 0.00% | 3,573 (-11) | -0.31% |
| private_kernel_init_simulated | 1 (0) | 0.00% | 2,632 (-11) | -0.42% |
| private_kernel_inner_simulated | 1 (0) | 0.00% | 2,632 (-11) | -0.42% |
| private_kernel_reset_simulated | 1 (0) | 0.00% | 2,632 (-11) | -0.42% |
| private_kernel_reset_simulated_big | 1 (0) | 0.00% | 2,632 (-11) | -0.42% |
| private_kernel_reset_simulated_medium | 1 (0) | 0.00% | 2,632 (-11) | -0.42% |
| private_kernel_reset_simulated_small | 1 (0) | 0.00% | 2,632 (-11) | -0.42% |
| private_kernel_tail_simulated | 1 (0) | 0.00% | 254 (-11) | -4.15% |
| public_kernel_tail_simulated | 1 (0) | 0.00% | 254 (-11) | -4.15% |
| rollup_base_simulated | 1 (0) | 0.00% | 36 (-11) | -23.40% |
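
The percentage column in the report above appears to be the signed delta divided by the previous circuit size. A minimal sketch of that arithmetic, assuming this is how the bot computes it (the function name `percent_change` is hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Relative change in percent, given the new size and the signed delta.
// Assumption: the report divides the delta by the old size (new_size - delta).
double percent_change(std::int64_t new_size, std::int64_t delta)
{
    const std::int64_t old_size = new_size - delta;
    return 100.0 * static_cast<double>(delta) / static_cast<double>(old_size);
}
```

For rollup_base_simulated, `percent_change(36, -11)` divides -11 by the previous size of 47 and gives roughly -23.40, which agrees with the table. The large percentage comes from the tiny base, not from a large absolute change.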

@AztecBot
Collaborator

AztecBot commented May 22, 2024

Benchmark results

Metrics with a significant change:

  • app_circuit_proving_time_in_ms (Test:emit_nullifier): 2,553 (+17%)
  • protocol_circuit_witness_generation_time_in_ms (private-kernel-reset-small): 2,693 (+32%)
  • protocol_circuit_witness_generation_time_in_ms (private-kernel-tail): 3,973 (+47%)
  • protocol_circuit_witness_generation_time_in_ms (base-parity): 1,743 (+89%)
  • protocol_circuit_witness_generation_time_in_ms (merge-rollup): 82.4 (+146%)
  • protocol_circuit_proving_time_in_ms (public-kernel-app-logic): 799 (+27%)
  • protocol_circuit_proving_time_in_ms (public-kernel-setup): 867 (+28%)
  • protocol_circuit_proving_time_in_ms (merge-rollup): 2,250 (+60%)
  • protocol_circuit_size_in_gates (public-kernel-tail): 256 (-50%)
Detailed results

All benchmarks are run on txs on the Benchmarking contract on the repository. Each tx consists of a batch call to create_note and increment_balance, which guarantees that each tx has a private call, a nested private call, a public call, and a nested public call, as well as an emitted private note, an unencrypted log, and public storage read and write.

This benchmark source data is available in JSON format on S3 here.

Proof generation

Each column represents the number of threads used in proof generation.

| Metric | 1 threads | 4 threads | 16 threads | 32 threads | 64 threads |
| - | - | - | - | - | - |
| proof_construction_time_sha256 | 5,729 (+1%) | 1,567 (+1%) | 718 (+1%) | 753 (-5%) | 783 (+2%) |

L2 block published to L1

Each column represents the number of txs on an L2 block published to L1.

| Metric | 8 txs | 32 txs | 64 txs |
| - | - | - | - |
| l1_rollup_calldata_size_in_bytes | 772 | 772 | 772 |
| l1_rollup_calldata_gas | 6,856 | 6,868 | 6,868 |
| l1_rollup_execution_gas | 587,396 | 587,408 | 587,408 |
| l2_block_processing_time_in_ms | 1,374 (+1%) | 5,118 (+1%) | 10,089 (-1%) |
| l2_block_building_time_in_ms | 32,617 (+1%) | 127,576 | 255,060 |
| l2_block_rollup_simulation_time_in_ms | 32,443 (+1%) | 126,939 | 253,794 |
| l2_block_public_tx_process_time_in_ms | 16,970 (+2%) | 70,415 | 142,513 (+1%) |

L2 chain processing

Each column represents the number of blocks on the L2 chain where each block has 16 txs.

| Metric | 5 blocks | 10 blocks |
| - | - | - |
| node_history_sync_time_in_ms | 15,340 (+1%) | 29,362 (+4%) |
| node_database_size_in_bytes | 21,241,936 | 37,888,080 |
| pxe_database_size_in_bytes | 29,868 | 12,535 |

Circuits stats

Stats on running time and I/O sizes collected for every kernel circuit run across all benchmarks.

| Circuit | protocol_circuit_simulation_time_in_ms | protocol_circuit_witness_generation_time_in_ms | protocol_circuit_proving_time_in_ms | protocol_circuit_input_size_in_bytes | protocol_circuit_output_size_in_bytes | protocol_circuit_proof_size_in_bytes | protocol_circuit_num_public_inputs | protocol_circuit_size_in_gates |
| - | - | - | - | - | - | - | - | - |
| private-kernel-init | 160 | 3,638 (+5%) | 20,977 (-13%) | 19,985 | 61,999 | 86,720 | 2,643 | 1,048,576 |
| private-kernel-inner | 606 | 4,826 (+9%) | 40,255 (-13%) | 89,053 | 61,999 | 86,720 | 2,643 | 2,097,152 |
| private-kernel-tail | 531 | ⚠️ 3,973 (+47%) | 37,306 (-9%) | 86,849 | 78,838 | 10,624 | 265 | 2,097,152 |
| base-parity | 6.61 (+2%) | ⚠️ 1,743 (+89%) | 3,083 (+2%) | 128 | 64.0 | 2,208 | 2.00 | 131,072 |
| root-parity | 49.5 (+1%) | 62.3 (-12%) | 41,233 (-8%) | 27,064 | 64.0 | 2,720 | 18.0 | 2,097,152 |
| base-rollup | 748 (-1%) | 2,447 (-3%) | 43,253 (-11%) | 112,602 | 957 | 3,136 | 31.0 | 2,097,152 |
| root-rollup | 94.3 (+3%) | 49.6 (-19%) | 8,266 (-6%) | 11,518 | 821 | 3,456 | 41.0 | 524,288 |
| public-kernel-app-logic | 243 | 131 (-9%) | ⚠️ 799 (+27%) | 96,850 | 84,967 | 116,320 | 3,568 | 4,096 |
| public-kernel-tail | 872 | 662 (-6%) | 1,065 (-3%) | 388,084 | 7,691 | 10,112 | 249 | ⚠️ 256 (-50%) |
| private-kernel-reset-small | 580 | ⚠️ 2,693 (+32%) | 23,205 (-14%) | 118,369 | 61,999 | 86,720 | 2,643 | 1,048,576 |
| public-kernel-setup | 218 (+2%) | 194 (+1%) | ⚠️ 867 (+28%) | 143,141 | 84,967 | 116,320 | 3,568 | 4,096 |
| public-kernel-teardown | 225 (+3%) | 167 (-26%) | 903 (+6%) | 143,141 | 84,967 | 116,320 | 3,568 | 4,096 |
| merge-rollup | 6.54 | ⚠️ 82.4 (+146%) | ⚠️ 2,250 (+60%) | 2,760 | 957 | 3,136 | 31.0 | 65,536 |
| private-kernel-tail-to-public | N/A | 9,266 (-4%) | 73,461 (-11%) | N/A | N/A | 116,832 | 3,584 | 4,194,304 |
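
Every value in the protocol_circuit_size_in_gates column is a power of two, and several are consistent with rounding the raw counts from the circuit size report up to the next power of two (for example private_kernel_init, 838,870 to 1,048,576). The sketch below illustrates that rounding. It rests on the assumption that the reported size is a dyadic (subgroup) size; `next_power_of_two` is a hypothetical helper, not the barretenberg implementation, which may reserve additional rows. The public kernel rows do not follow this pattern, so the two reports may not measure the same thing for every circuit.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power of two that is >= n (returns 1 for n == 0).
// Hypothetical helper for illustration only.
std::size_t next_power_of_two(std::size_t n)
{
    std::size_t size = 1;
    while (size < n) {
        size <<= 1;
    }
    return size;
}
```

Because of this rounding, a change of a few thousand gates leaves the dyadic size untouched unless the raw count crosses a power-of-two boundary.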

Stats on running time collected for app circuits

| Function | app_circuit_proof_size_in_bytes | app_circuit_proving_time_in_ms | app_circuit_size_in_gates | app_circuit_num_public_inputs |
| - | - | - | - | - |
| SchnorrAccount:entrypoint | 16,128 | 48,547 (-6%) | 2,097,152 | 437 |
| Test:emit_nullifier | 16,128 | ⚠️ 2,553 (+17%) | 65,536 | 437 |
| FPC:fee_entrypoint_public | 16,128 | 16,282 (+10%) | 524,288 | 437 |
| FPC:fee_entrypoint_private | 16,128 | 9,048 (-5%) | 524,288 | 437 |
| Token:unshield | 16,128 | 50,764 (-6%) | 2,097,152 | 437 |
| SchnorrAccount:spend_private_authwit | 16,128 | 2,649 (+2%) | 131,072 | 437 |
| Token:transfer | 16,128 | 34,544 (-13%) | 2,097,152 | 437 |

Tree insertion stats

The duration to insert a fixed batch of leaves into each tree type.

| Metric | 1 leaves | 16 leaves | 64 leaves | 128 leaves | 512 leaves | 1024 leaves | 2048 leaves | 4096 leaves | 32 leaves |
| - | - | - | - | - | - | - | - | - | - |
| batch_insert_into_append_only_tree_16_depth_ms | 11.3 (+3%) | 18.5 (+3%) | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_count | 16.7 | 31.8 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_ms | 0.660 (+3%) | 0.568 (+3%) | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_32_depth_ms | N/A | N/A | 52.4 (+2%) | 89.3 (+11%) | 267 (+2%) | 507 | 994 (+1%) | 1,962 | N/A |
| batch_insert_into_append_only_tree_32_depth_hash_count | N/A | N/A | 95.9 | 159 | 543 | 1,055 | 2,079 | 4,127 | N/A |
| batch_insert_into_append_only_tree_32_depth_hash_ms | N/A | N/A | 0.536 (+2%) | 0.551 (+11%) | 0.484 (+2%) | 0.474 | 0.470 | 0.468 | N/A |
| batch_insert_into_indexed_tree_20_depth_ms | N/A | N/A | 62.8 (+2%) | 123 (+3%) | 379 (+1%) | 744 (+1%) | 1,481 (+1%) | 2,941 | N/A |
| batch_insert_into_indexed_tree_20_depth_hash_count | N/A | N/A | 106 | 208 | 692 | 1,363 | 2,707 | 5,395 | N/A |
| batch_insert_into_indexed_tree_20_depth_hash_ms | N/A | N/A | 0.547 (+2%) | 0.550 (+3%) | 0.515 (+1%) | 0.512 (+1%) | 0.512 (+1%) | 0.511 | N/A |
| batch_insert_into_indexed_tree_40_depth_ms | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 68.9 (+3%) |
| batch_insert_into_indexed_tree_40_depth_hash_count | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 108 |
| batch_insert_into_indexed_tree_40_depth_hash_ms | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 0.604 (+3%) |

Miscellaneous

Transaction sizes based on how many contract classes are registered in the tx.

| Metric | 0 registered classes | 1 registered classes |
| - | - | - |
| tx_size_in_bytes | 84,613 | 664,966 |

Transaction size based on fee payment method

| Metric | |
| - | |

Transaction processing duration by data writes.

| Metric | 0 new note hashes | 1 new note hashes | 2 new note hashes |
| - | - | - | - |
| tx_pxe_processing_time_ms | 27,138 (-9%) | 4,235 | 122,340 (-8%) |

| Metric | 0 public data writes | 1 public data writes | 2 public data writes | 3 public data writes | 4 public data writes | 8 public data writes |
| - | - | - | - | - | - | - |
| tx_sequencer_processing_time_ms | 1,347 (+1%) | 2,221 | 1,523 (+3%) | 4,111 (+5%) | 1,693 (+2%) | 1,979 (+2%) |

@ledwards2225 ledwards2225 marked this pull request as ready for review May 22, 2024 15:41
@ledwards2225 ledwards2225 self-assigned this May 22, 2024
Contributor

@lucasxia01 lucasxia01 left a comment

Looks good in general, but seems to be double counting these extra gates?

@@ -164,7 +164,8 @@ bool proveAndVerifyHonkAcirFormat(acir_format::AcirFormat constraint_system, aci
     // Construct a bberg circuit from the acir representation
     auto builder = acir_format::create_circuit<Builder>(constraint_system, 0, witness);

-    size_t srs_size = builder.get_circuit_subgroup_size(builder.get_total_circuit_size());
+    auto num_extra_gates = builder.get_num_gates_added_to_ensure_nonzero_polynomials();
+    size_t srs_size = builder.get_circuit_subgroup_size(builder.get_total_circuit_size() + num_extra_gates);
Contributor

does this not double count the ensure_nonzero gates? Since get_num_gates_added_to_ensure_nonzero_polynomials will add these gates in, and then get_total_circuit_size would already account for these extra gates.

Contributor Author

@ledwards2225 ledwards2225 May 22, 2024

get_total_circuit_size doesn't account for the added gates for non-zero polys, only the gates that have been added explicitly plus those that will be added via the finalize method.

Contributor

What I'm saying is that calling get_num_gates_added_to_ensure_nonzero_polynomials will add the non-zero gates, and then get_total_circuit_size calls get_num_gates, which calls get_num_gates_split_into_components, which sets the count to this->num_gates, which should include the added non-zero gates.

Contributor Author

but the added non-zero gates aren't added until we construct an instance

Contributor Author

Oh, maybe the confusion is this: get_num_gates_added_to_ensure_nonzero_polynomials doesn't actually add the gates. It creates a new builder, adds them to that, then counts them. The original builder is unaffected
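
The point above (count the extra gates on a separate builder so the original is untouched) can be sketched as follows. This is a toy model under stated assumptions: `ToyBuilder`, `kExtraGates`, and `size_with_extra_gates` are hypothetical, and only the method names mirror the discussion; it is not the barretenberg implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy builder; only the method names mirror the discussion.
class ToyBuilder {
  public:
    void add_gate() { ++num_gates_; }

    // Stand-in for the add_gates_to_ensure.. methods: appends a fixed number of gates.
    void add_gates_to_ensure_nonzero_polynomials()
    {
        for (std::size_t i = 0; i < kExtraGates; ++i) {
            add_gate();
        }
    }

    // Counts the extra gates on a fresh scratch builder, leaving *this untouched.
    std::size_t get_num_gates_added_to_ensure_nonzero_polynomials() const
    {
        ToyBuilder scratch;
        const std::size_t before = scratch.get_total_circuit_size();
        scratch.add_gates_to_ensure_nonzero_polynomials();
        return scratch.get_total_circuit_size() - before;
    }

    std::size_t get_total_circuit_size() const { return num_gates_; }

  private:
    static constexpr std::size_t kExtraGates = 4; // arbitrary value for the sketch
    std::size_t num_gates_ = 0;
};

// Size to provision for: gates already present plus the extra gates added later.
std::size_t size_with_extra_gates(const ToyBuilder& builder)
{
    return builder.get_total_circuit_size() + builder.get_num_gates_added_to_ensure_nonzero_polynomials();
}
```

With three gates in the toy builder, the count query returns 4 while the builder still reports 3, so summing the two gives 7 with no double counting. Making the query `const` is one way to have the compiler enforce that the original builder is not modified.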

Contributor

ah got it. I didn't look hard enough at that function oops

Contributor Author

I added some comments to make that a bit more explicit

        process_non_native_field_multiplications();
        process_ROM_arrays();
        process_RAM_arrays();
        process_range_lists();
        circuit_finalized = true;
    } else {
        // Gates added after first call to finalize will not be processed since finalization is only performed once
        info("WARNING: Redudant call to finalize_circuit(). Is this intentional?");
Contributor

nice
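
The finalize-once guard in the hunk above also shows why finalizing inside create_circuit would block later additions: deferred work queued after the first call is never processed. A minimal sketch, assuming a toy builder where each deferred range constraint becomes one gate (`OneShotBuilder` and its members are hypothetical, not the barretenberg API):

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Hypothetical toy builder illustrating a finalize-once guard.
struct OneShotBuilder {
    std::size_t num_gates = 0;      // gates materialised in the circuit
    std::size_t pending_ranges = 0; // deferred range constraints awaiting finalization
    bool circuit_finalized = false;

    void defer_range_constraint() { ++pending_ranges; }

    // Stand-in for process_range_lists(): each deferred constraint becomes one gate here.
    void process_range_lists()
    {
        num_gates += pending_ranges;
        pending_ranges = 0;
    }

    void finalize_circuit()
    {
        if (!circuit_finalized) {
            process_range_lists();
            circuit_finalized = true;
        } else {
            // Deferred work queued after the first call is left unprocessed.
            std::cerr << "WARNING: redundant call to finalize_circuit()\n";
        }
    }
};
```

If a builder is finalized early and a constraint is deferred afterwards, the second `finalize_circuit()` call only warns and the constraint stays pending. Deferring finalization until all gates have been added avoids that.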

@ledwards2225 ledwards2225 enabled auto-merge (squash) May 22, 2024 17:08
@ledwards2225 ledwards2225 merged commit f45d20d into master May 22, 2024
73 checks passed
@ledwards2225 ledwards2225 deleted the lde/finalize_fix branch May 22, 2024 17:15
3 participants