Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: deterministically deduplicate cached_partial_non_native_field_multiplication across wasm32 and native compilations #3425

Merged
merged 6 commits into from
Nov 27, 2023

Conversation

kevaundray
Copy link
Contributor

@kevaundray kevaundray commented Nov 25, 2023

This essentially fixes the implementation of < on the struct in question

Read comment below for more context.

Checklist:

Remove the checklist to signal you've completed it. Enable auto-merge if the PR is ready to merge.

  • If the pull request requires a cryptography review (e.g. cryptographic algorithm implementations) I have added the 'crypto' tag.
  • I have reviewed my diff in github, line by line and removed unexpected formatting changes, testing logs, or commented-out code.
  • Every change is related to the PR description.
  • I have linked this pull request to relevant issues (if any exist).

@kevaundray
Copy link
Contributor Author

kevaundray commented Nov 25, 2023

Copying from slack:

Context

We have two ways to create proofs, either natively or via javascript
We have three ways to verify proofs, either natively, via javascript or via a solidity verifier

Symptoms

We had a bug only with ecdsa signature verification

  • Creating an ecdsa verification proof natively and verifying it natively, would work
  • Creating an ecdsa verification proof in javascript and verifying in javascript would work
  • Creating an ecdsa verification proof natively and verifying in solidity would work
  • Creating an ecdsa verification proof natively and verifying in javascript would not work
  • Creating an ecdsa verification proof in javascript and verifying natively or via solidity would not work

Preliminary conclusions

  • There must be special about the codepath that ecdsa uses, since everything works when we don't use ecdsa verification.
  • The javascript implementation is consistent with itself, since it can verify proofs created with it.
  • The native implementation is consistent with itself
  • The solidity implementation is consistent with the native implementation which makes sense, since the verification key that gets embedded into the solidity verifier is created from the native implementation

Methodology

  • We can recreate the bug in the barretenberg repository by creating the verification key natively for a circuit which uses ecdsa verification, then creating the verification key using the bb.js node implementation.
  • Using bash to hash the verification keys shows different verification key hashes.
  • Then its mostly a game of:
  • Bisect the code
    • Simplify loops and logic to reduce the noise
    • Recompile js and native
    • Check if verification key hashes are the same
    • If hashes are the same, then inspect the gates being added in both js and native just before the code bisection
      Bug

High Level: The < operator on cached_partial_non_native_field_multiplication was not implemented properly.

This struct looks like this:

   struct cached_partial_non_native_field_multiplication {
        std::array<uint32_t, 5> a;
        std::array<uint32_t, 5> b;
        FF lo_0;
        FF hi_0;
        FF hi_1;
}

There were two reasons it was not implemented properly:

Using == in less than operator

(benign) The first is the fact it was doing an equality check, which technically it should not be doing since we are doing implementing < and not <=.

The fix is to implement < as:

            if (a < other.a) {
                return true;
            }
            if (other.a < a) {
                return false;
            }
            if (b < other.b) {
                return true;
            }
            if (other.b < b) {
                return false;
            }

Notice that this does not check lo_0, hi_0 and hi_1. This is the second bug.

Comparing Fields

Because we were not explicitly comparing fields, we were essentially leaving this to compiler to do. I checked the cpp docs and it will actually just ignore the field types, but because the sort is not stable, it will not preserve the order of items where the a and b vectors are the same.

This is a problem because we sort a vector of these structs and then iterate over the vector and for each element, we create a gate. So if the order is not deterministic, we create a different circuit.

Note: I'm assuming that it was done this way because the implementor assumed that just be comparing the non-field elements would be enough to create deterministic ordering. I manually printed out every element and double checked that this is not correct.

The fix:

            if (uint256_t(lo_0) < uint256_t(other.lo_0)) {
                return true;
            }
            if (uint256_t(other.lo_0) < uint256_t(lo_0)) {
                return false;
            }

            if (uint256_t(hi_0) < uint256_t(other.hi_0)) {
                return true;
            }
            if (uint256_t(other.hi_0) < uint256_t(hi_0)) {
                return false;
            }

            return uint256_t(hi_1) < uint256_t(other.hi_1);

Retro conclusion

The above struct is used for non native arithmetic, so it makes sense that for algorithms that do not require it, the bug did not show up. I imagine that this would also pop up for recursion. However, I am not 100% certain because while bisecting and simplifying, there were red-herrings where the ordering coincidentally lined up resulting in the same circuit

@AztecBot
Copy link
Collaborator

AztecBot commented Nov 25, 2023

Benchmark results

Metrics with a significant change:

  • circuit_simulation_time_in_ms (private-kernel-init): 399 (-48%)
  • circuit_simulation_time_in_ms (base-rollup): 3,179 (+81%)
  • circuit_simulation_time_in_ms (root-rollup): 89.6 (-47%)
  • circuit_simulation_time_in_ms (public-kernel-private-input): 417 (-27%)
  • circuit_simulation_time_in_ms (merge-rollup): 10.8 (-30%)
  • circuit_input_size_in_bytes (private-kernel-init): 43,077 (-30%)
  • circuit_input_size_in_bytes (private-kernel-inner): 64,484 (-21%)
  • circuit_input_size_in_bytes (public-kernel-private-input): 25,203 (-39%)
  • circuit_input_size_in_bytes (public-kernel-non-first-iteration): 25,245 (-39%)
  • circuit_output_size_in_bytes (private-kernel-ordering): 9,689 (+19%)
  • tx_size_in_bytes (0): 10,323 (+17%)
  • l2_block_building_time_in_ms (8): 25,300 (+23%)
  • l2_block_building_time_in_ms (32): 101,256 (+24%)
  • l2_block_building_time_in_ms (128): 401,752 (+23%)
  • l2_block_rollup_simulation_time_in_ms (8): 17,718 (+48%)
  • l2_block_rollup_simulation_time_in_ms (32): 71,115 (+50%)
  • l2_block_rollup_simulation_time_in_ms (128): 280,646 (+49%)
Detailed results

All benchmarks are run on txs on the Benchmarking contract on the repository. Each tx consists of a batch call to create_note and increment_balance, which guarantees that each tx has a private call, a nested private call, a public call, and a nested public call, as well as an emitted private note, an unencrypted log, and public storage read and write.

This benchmark source data is available in JSON format on S3 here.

Values are compared against data from master at commit 2001d474 and shown if the difference exceeds 1%.

L2 block published to L1

Each column represents the number of txs on an L2 block published to L1.

Metric 8 txs 32 txs 128 txs
l1_rollup_calldata_size_in_bytes 45,444 179,588 716,132
l1_rollup_calldata_gas 222,900 868,328 3,449,312
l1_rollup_execution_gas 841,987 3,595,436 22,204,681
l2_block_processing_time_in_ms 2,030 7,667 29,832 (-1%)
note_successful_decrypting_time_in_ms 303 (+1%) 900 (+2%) 3,277 (+2%)
note_trial_decrypting_time_in_ms 7.41 (+29%) 109 (+19%) 142 (+1%)
l2_block_building_time_in_ms ⚠️ 25,300 (+23%) ⚠️ 101,256 (+24%) ⚠️ 401,752 (+23%)
l2_block_rollup_simulation_time_in_ms ⚠️ 17,718 (+48%) ⚠️ 71,115 (+50%) ⚠️ 280,646 (+49%)
l2_block_public_tx_process_time_in_ms 7,537 (-13%) 29,997 (-13%) 120,550 (-13%)

L2 chain processing

Each column represents the number of blocks on the L2 chain where each block has 16 txs.

Metric 5 blocks 10 blocks
node_history_sync_time_in_ms 21,889 42,760 (+1%)
note_history_successful_decrypting_time_in_ms 2,082 (+2%) 4,097 (+3%)
note_history_trial_decrypting_time_in_ms 126 158 (+2%)
node_database_size_in_bytes 1,627,939 1,097,726
pxe_database_size_in_bytes 29,748 59,307

Circuits stats

Stats on running time and I/O sizes collected for every circuit run across all benchmarks.

Circuit circuit_simulation_time_in_ms circuit_input_size_in_bytes circuit_output_size_in_bytes
private-kernel-init ⚠️ 399 (-48%) ⚠️ 43,077 (-30%) 20,441 (+8%)
private-kernel-ordering 126 (-1%) 25,833 (+6%) ⚠️ 9,689 (+19%)
base-rollup ⚠️ 3,179 (+81%) 659,500 873
root-rollup ⚠️ 89.6 (-47%) 4,072 1,097
private-kernel-inner 826 (+4%) ⚠️ 64,484 (-21%) 20,441 (+8%)
public-kernel-private-input ⚠️ 417 (-27%) ⚠️ 25,203 (-39%) 20,441 (+8%)
public-kernel-non-first-iteration 411 ⚠️ 25,245 (-39%) 20,441 (+8%)
merge-rollup ⚠️ 10.8 (-30%) 2,592 873

Miscellaneous

Transaction sizes based on how many contracts are deployed in the tx.

Metric 0 deployed contracts 1 deployed contracts
tx_size_in_bytes ⚠️ 10,323 (+17%) 29,083 (+6%)

@kevaundray kevaundray changed the title chore: deterministically sort cached_partial_non_native_field_multiplication chore: deterministically sort cached_partial_non_native_field_multiplication across wasm32 and native compilations Nov 25, 2023
@kevaundray
Copy link
Contributor Author

Longer term, we should remove the operator< and instead have something like:

   template <typename T> std::vector<T> make_unique_and_preserve_order(const std::vector<T>& vec)
    {
        std::unordered_set<T> seen;
        std::vector<T> unique_vec;

        for (const auto& item : vec) {
            if (seen.insert(item).second) {
                unique_vec.push_back(item);
            }
        }

        return unique_vec;
    }
    ```
    This would require Hash and operator==.
    
    instead of std::sort, std::unique, because this is he only reason for the operator< implementation (we never compare this struct in any other scenario)

@kevaundray
Copy link
Contributor Author

Seems changes to this branch are not showing up

Screenshot 2023-11-27 at 19 41 48

@kevaundray kevaundray changed the title chore: deterministically sort cached_partial_non_native_field_multiplication across wasm32 and native compilations chore: deterministically deduplicate cached_partial_non_native_field_multiplication across wasm32 and native compilations Nov 27, 2023
@kevaundray kevaundray enabled auto-merge (squash) November 27, 2023 22:03
@kevaundray kevaundray merged commit 5524933 into master Nov 27, 2023
80 checks passed
@kevaundray kevaundray deleted the kw/fix-non-determinism-between-wasm-native branch November 27, 2023 22:36
kevaundray pushed a commit that referenced this pull request Nov 27, 2023
🤖 I have created a release *beep* *boop*
---


<details><summary>aztec-packages: 0.16.0</summary>

##
[0.16.0](aztec-packages-v0.15.1...aztec-packages-v0.16.0)
(2023-11-27)


### ⚠ BREAKING CHANGES

* deprecate circuits/cpp
([#3421](#3421))
* call stack validation optimisation.
([#3387](#3387))

### Features

* Base rollup in noir
([#3257](#3257))
([4a1e9c3](4a1e9c3))
* Call stack validation optimisation.
([#3387](#3387))
([d06d5db](d06d5db))
* Goblin proof construction
([#3332](#3332))
([6a7ebb6](6a7ebb6))
* More logs relevant for debugging failures of 2 pixies test
([#3370](#3370))
([683a0f3](683a0f3))
* Noir subrepo.
([#3369](#3369))
([d94d88b](d94d88b))
* Noir_wasm compilation of noir programs
([#3272](#3272))
([f9981d5](f9981d5))
* Rollback public state changes on failure
([#3393](#3393))
([0e276fb](0e276fb))


### Bug Fixes

* **docs:** Doc explaining noir debug_log
([#3322](#3322))
([eed023d](eed023d))
* Naming inconsistency in private kernel
([#3384](#3384))
([4743486](4743486))
* Race condition in `PXE.getTxReceipt(...)`
([#3411](#3411))
([9557a66](9557a66))


### Miscellaneous

* Deprecate circuits/cpp
([#3421](#3421))
([4973cfb](4973cfb))
* Deterministically deduplicate
`cached_partial_non_native_field_multiplication` across wasm32 and
native compilations
([#3425](#3425))
([5524933](5524933))
* **docs:** Common patterns and anti patterns in aztec.nr
([#3413](#3413))
([65bd855](65bd855))
* Fix and reenable e2e quick start
([#3403](#3403))
([112740e](112740e)),
closes
[#3356](#3356)
* Fix intermittent failures for block-building e2e test
([#3404](#3404))
([e76e2d4](e76e2d4)),
closes
[#3358](#3358)
* Formatted `noir-contracts` and `aztec-nr`
([a73c4aa](a73c4aa))
* Initial clone of noir to subrepo
([#3409](#3409))
([8f1cb83](8f1cb83))
* **noir-contracts:** Remove redundant return value of 1
([#3415](#3415))
([2001d47](2001d47)),
closes
[#2615](#2615)
* Plumbs noir subrepo into yarn-project.
([#3420](#3420))
([63173c4](63173c4))
* Remove pxe / node /p2p-bootstrap docker images
([#3396](#3396))
([c236143](c236143))
* Skip artifacts for prettier
([#3399](#3399))
([98d9e04](98d9e04))
* Update path to acir artifacts
([#3426](#3426))
([f56f88d](f56f88d))
</details>

<details><summary>barretenberg.js: 0.16.0</summary>

##
[0.16.0](barretenberg.js-v0.15.1...barretenberg.js-v0.16.0)
(2023-11-27)


### Miscellaneous

* Plumbs noir subrepo into yarn-project.
([#3420](#3420))
([63173c4](63173c4))
</details>

<details><summary>barretenberg: 0.16.0</summary>

##
[0.16.0](barretenberg-v0.15.1...barretenberg-v0.16.0)
(2023-11-27)


### Features

* Goblin proof construction
([#3332](#3332))
([6a7ebb6](6a7ebb6))
* Noir subrepo.
([#3369](#3369))
([d94d88b](d94d88b))


### Miscellaneous

* Deterministically deduplicate
`cached_partial_non_native_field_multiplication` across wasm32 and
native compilations
([#3425](#3425))
([5524933](5524933))
* Plumbs noir subrepo into yarn-project.
([#3420](#3420))
([63173c4](63173c4))
* Update path to acir artifacts
([#3426](#3426))
([f56f88d](f56f88d))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
AztecBot added a commit to AztecProtocol/barretenberg that referenced this pull request Nov 28, 2023
🤖 I have created a release *beep* *boop*
---


<details><summary>aztec-packages: 0.16.0</summary>

##
[0.16.0](AztecProtocol/aztec-packages@aztec-packages-v0.15.1...aztec-packages-v0.16.0)
(2023-11-27)


### ⚠ BREAKING CHANGES

* deprecate circuits/cpp
([#3421](AztecProtocol/aztec-packages#3421))
* call stack validation optimisation.
([#3387](AztecProtocol/aztec-packages#3387))

### Features

* Base rollup in noir
([#3257](AztecProtocol/aztec-packages#3257))
([4a1e9c3](AztecProtocol/aztec-packages@4a1e9c3))
* Call stack validation optimisation.
([#3387](AztecProtocol/aztec-packages#3387))
([d06d5db](AztecProtocol/aztec-packages@d06d5db))
* Goblin proof construction
([#3332](AztecProtocol/aztec-packages#3332))
([6a7ebb6](AztecProtocol/aztec-packages@6a7ebb6))
* More logs relevant for debugging failures of 2 pixies test
([#3370](AztecProtocol/aztec-packages#3370))
([683a0f3](AztecProtocol/aztec-packages@683a0f3))
* Noir subrepo.
([#3369](AztecProtocol/aztec-packages#3369))
([d94d88b](AztecProtocol/aztec-packages@d94d88b))
* Noir_wasm compilation of noir programs
([#3272](AztecProtocol/aztec-packages#3272))
([f9981d5](AztecProtocol/aztec-packages@f9981d5))
* Rollback public state changes on failure
([#3393](AztecProtocol/aztec-packages#3393))
([0e276fb](AztecProtocol/aztec-packages@0e276fb))


### Bug Fixes

* **docs:** Doc explaining noir debug_log
([#3322](AztecProtocol/aztec-packages#3322))
([eed023d](AztecProtocol/aztec-packages@eed023d))
* Naming inconsistency in private kernel
([#3384](AztecProtocol/aztec-packages#3384))
([4743486](AztecProtocol/aztec-packages@4743486))
* Race condition in `PXE.getTxReceipt(...)`
([#3411](AztecProtocol/aztec-packages#3411))
([9557a66](AztecProtocol/aztec-packages@9557a66))


### Miscellaneous

* Deprecate circuits/cpp
([#3421](AztecProtocol/aztec-packages#3421))
([4973cfb](AztecProtocol/aztec-packages@4973cfb))
* Deterministically deduplicate
`cached_partial_non_native_field_multiplication` across wasm32 and
native compilations
([#3425](AztecProtocol/aztec-packages#3425))
([5524933](AztecProtocol/aztec-packages@5524933))
* **docs:** Common patterns and anti patterns in aztec.nr
([#3413](AztecProtocol/aztec-packages#3413))
([65bd855](AztecProtocol/aztec-packages@65bd855))
* Fix and reenable e2e quick start
([#3403](AztecProtocol/aztec-packages#3403))
([112740e](AztecProtocol/aztec-packages@112740e)),
closes
[#3356](AztecProtocol/aztec-packages#3356)
* Fix intermittent failures for block-building e2e test
([#3404](AztecProtocol/aztec-packages#3404))
([e76e2d4](AztecProtocol/aztec-packages@e76e2d4)),
closes
[#3358](AztecProtocol/aztec-packages#3358)
* Formatted `noir-contracts` and `aztec-nr`
([a73c4aa](AztecProtocol/aztec-packages@a73c4aa))
* Initial clone of noir to subrepo
([#3409](AztecProtocol/aztec-packages#3409))
([8f1cb83](AztecProtocol/aztec-packages@8f1cb83))
* **noir-contracts:** Remove redundant return value of 1
([#3415](AztecProtocol/aztec-packages#3415))
([2001d47](AztecProtocol/aztec-packages@2001d47)),
closes
[#2615](AztecProtocol/aztec-packages#2615)
* Plumbs noir subrepo into yarn-project.
([#3420](AztecProtocol/aztec-packages#3420))
([63173c4](AztecProtocol/aztec-packages@63173c4))
* Remove pxe / node /p2p-bootstrap docker images
([#3396](AztecProtocol/aztec-packages#3396))
([c236143](AztecProtocol/aztec-packages@c236143))
* Skip artifacts for prettier
([#3399](AztecProtocol/aztec-packages#3399))
([98d9e04](AztecProtocol/aztec-packages@98d9e04))
* Update path to acir artifacts
([#3426](AztecProtocol/aztec-packages#3426))
([f56f88d](AztecProtocol/aztec-packages@f56f88d))
</details>

<details><summary>barretenberg.js: 0.16.0</summary>

##
[0.16.0](AztecProtocol/aztec-packages@barretenberg.js-v0.15.1...barretenberg.js-v0.16.0)
(2023-11-27)


### Miscellaneous

* Plumbs noir subrepo into yarn-project.
([#3420](AztecProtocol/aztec-packages#3420))
([63173c4](AztecProtocol/aztec-packages@63173c4))
</details>

<details><summary>barretenberg: 0.16.0</summary>

##
[0.16.0](AztecProtocol/aztec-packages@barretenberg-v0.15.1...barretenberg-v0.16.0)
(2023-11-27)


### Features

* Goblin proof construction
([#3332](AztecProtocol/aztec-packages#3332))
([6a7ebb6](AztecProtocol/aztec-packages@6a7ebb6))
* Noir subrepo.
([#3369](AztecProtocol/aztec-packages#3369))
([d94d88b](AztecProtocol/aztec-packages@d94d88b))


### Miscellaneous

* Deterministically deduplicate
`cached_partial_non_native_field_multiplication` across wasm32 and
native compilations
([#3425](AztecProtocol/aztec-packages#3425))
([5524933](AztecProtocol/aztec-packages@5524933))
* Plumbs noir subrepo into yarn-project.
([#3420](AztecProtocol/aztec-packages#3420))
([63173c4](AztecProtocol/aztec-packages@63173c4))
* Update path to acir artifacts
([#3426](AztecProtocol/aztec-packages#3426))
([f56f88d](AztecProtocol/aztec-packages@f56f88d))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants