Skip to content
This repository has been archived by the owner on Jul 5, 2024. It is now read-only.

Implement ADDMOD #564

Merged
merged 15 commits into from
Jun 21, 2022
Merged

Implement ADDMOD #564

merged 15 commits into from
Jun 21, 2022

Conversation

adria0
Copy link
Member

@adria0 adria0 commented Jun 10, 2022

Specs in https://github.com/privacy-scaling-explorations/zkevm-specs/blob/master/specs/opcode/08ADDMOD.md. It shares some code with MULMOD

@github-actions github-actions bot added the crate-zkevm-circuits Issues related to the zkevm-circuits workspace member label Jun 10, 2022
@adria0 adria0 force-pushed the feature/addmod branch 3 times, most recently from a55976b to b83316d Compare June 20, 2022 08:53
@CPerezz
Copy link
Member

CPerezz commented Jun 20, 2022

So for the curiuos ones:

After checking the issue with @ed255 we saw that adding codegen-units=1 as it is in the bench profile solves the issue.

That basically tells us that the error is on the compilation pipeline of rustc. Which leads to a machine code that tries to access a wrong memory section.

We've obtained LLDB and gdb traces and will submit an issue into rustlang/rust

What that means for us?

  • Meanwhile the issue is not solved, we need to compile our code with codegen-units=1 (which will cause the CI to be significantly slower).
    We do have the alternative which is #[ignore] all the bytecode tests that lead to this issue until the error in rust is solved and we have a new release for it.

We need to try one last thing which is try the latest nightly and stable toolchains and compile with them.

This might show that newer releases come with this issue solved and therefore we can ignore this (although we fill the issue anyway).

This shares a lot of similarities with rust-lang/rust#62896 and might have to do with a misscompilation that rustc is doing.

Copy link
Member

@ed255 ed255 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good!
Please take a look at my comments.

The current concern right now for this PR is resolving the segfault, which seems unrelated to this PR.

zkevm-circuits/src/evm_circuit/util/math_gadget.rs Outdated Show resolved Hide resolved
zkevm-circuits/src/evm_circuit/execution/addmod.rs Outdated Show resolved Hide resolved
zkevm-circuits/src/evm_circuit/execution/addmod.rs Outdated Show resolved Hide resolved
@ed255
Copy link
Member

ed255 commented Jun 20, 2022

In my current investigation of the segfault I made the hypothesis that the issue comes from a stack overflow.

Analyzing the segfault with gdb we see that it happens in the function probestack. The code in this function says:

// Our goal here is to touch each page between %rsp+8 and %rsp+8-%rax,
// ensuring that if any pages are unmapped we'll make a page fault.

I was puzzled because there was no report of stack overflow, only invalid memory. But maybe it's the case that a stack overflow is only reported for the main thread, and other threads just crash when accessing the memory outside of the stack?

To test this hypothesis I ran the test with an increased minimum stack for threads (default is 2MB, I use 16MB):

RUST_BACKTRACE=1 RUST_MIN_STACK=16777216 cargo test --release addmod_simple -- --nocapture

And the tests passes without crash!

@CPerezz
Copy link
Member

CPerezz commented Jun 20, 2022

In my current investigation of the segfault I made the hypothesis that the issue comes from a stack overflow.

Analyzing the segfault with gdb we see that it happens in the function probestack. The code in this function says:

// Our goal here is to touch each page between %rsp+8 and %rsp+8-%rax,
// ensuring that if any pages are unmapped we'll make a page fault.

I was puzzled because there was no report of stack overflow, only invalid memory. But maybe it's the case that a stack overflow is only reported for the main thread, and other threads just crash when accessing the memory outside of the stack?

To test this hypothesis I ran the test with an increased minimum stack for threads (default is 2MB, I use 16MB):

RUST_BACKTRACE=1 RUST_MIN_STACK=16777216 cargo test --release addmod_simple -- --nocapture

And the tests passes without crash!

Should we make a PR including this env var to .cargo/config.toml?

@adria0
Copy link
Member Author

adria0 commented Jun 20, 2022

And the tests passes without crash!

Should we make a PR including this env var to .cargo/config.toml?

Awesome @ed255! Confirmed that also works with M1 :)

self.muladd_d_n_r.assign(
region,
offset,
[d, n, a_reduced_plus_b, a_reduced_plus_b_overflow],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[d, n, a_reduced_plus_b, a_reduced_plus_b_overflow],
[d, n, a_reduced_plus_b_overflow. a_reduced_plus_b],

The overflow bit should come first

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, bad news, all tests passes both for

[d, n, a_reduced_plus_b, a_reduced_plus_b_overflow] and
[d, n, a_reduced_plus_b_overflow, a_reduced_plus_b]

🧐

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh... We need to figure out why! 🕵️‍♀️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a test case missing!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the missing test is not related...

@davidnevadoc, the MulAdd512WordGadgets is ok with the following witness

MulAddWords512Gadget([1,10,12,0], Some(2))

could you check it? Maybe I'm wrong somewhere but I am not able to find it

Copy link
Member

@ed255 ed255 Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found the reason why this works when passing swapped d and e in the assign function of MulAddWords512Gadget.
On one hand, MulAddWords512Gadget only assigns to its internal witness which are carry_0, carry_1 and carry_2. And those values are calculated from a, b, c, d, e. If I remember correctly, for ADDMOD we only needed 257 bit arithmetic, but we're using MulAddWords512Gadget for convenience. This means that the possible values we will encounter for the carry witnesses is more limited (I think they are always 0 or 1). And it seems that swapping d and e (using values from ADDMOD), the calculated carry witnesses stay the same! Or at least that's the case with the tests.

I'm not sure if there's a case where swapping d and e in the assign function for MulAddWords512Gadget could change the carry witnesses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see what @davidnevadoc says, @ed255 !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there's a case where swapping d and e in the assign function for MulAddWords512Gadget could change the carry witnesses.

I've tried to find some combination of inputs that calculates in different carry witnesses when swapping d and e, but I didn't find any. It's a bit tricky because at this point the inputs are derived from other calculations. So even after trying I'm not convinced that such inputs don't exist.

Anyway, I think we can merge this PR even if this is not yet resolved!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this is what @ed255 said plus the fact that we are using saturating_sub in assign.
Turns out that in this example the correct values for the 3 carry are 0. When d and e are swapped, the computation for these carries would result in an underflow and a panic. Instead of this, the use of saturating_sub gives a 0 (the correct value by chance).
The constraints pass because the rest of the cells (d and e included) are placed correctly.

@ed255
Copy link
Member

ed255 commented Jun 20, 2022

I've been debugging for a while the current code trying to figure out what is causing the high stack consumption that leads to the overflow (seen with the invalid memory segfault), and I found one of the culprits!
The type ExecutionConfig in src/evm_circuit/execution.rs has been growing with each gadget that we've added, and it currently uses 337 KiB of memory. This struct is nested in other structs that are used as return values in functions, which causes several functions in the call stack to use more than 337 KiB of stack memory, which easily adds up!

While increasing the RUST_MIN_STACK solves the issue, I think we should also be careful to not have big values in the stack, so I propose moving the ExecutionConfig to the heap like this:

--- a/zkevm-circuits/src/evm_circuit.rs
+++ b/zkevm-circuits/src/evm_circuit.rs
@@ -22,7 +22,7 @@ use witness::Block;
 pub struct EvmCircuit<F> {
     fixed_table: [Column<Fixed>; 4],
     byte_table: [Column<Fixed>; 1],
-    execution: ExecutionConfig<F>,
+    execution: Box<ExecutionConfig<F>>,
 }

 impl<F: Field> EvmCircuit<F> {
@@ -38,7 +38,7 @@ impl<F: Field> EvmCircuit<F> {
         let fixed_table = [(); 4].map(|_| meta.fixed_column());
         let byte_table = [(); 1].map(|_| meta.fixed_column());

-        let execution = ExecutionConfig::configure(
+        let execution = Box::new(ExecutionConfig::configure(
             meta,
             power_of_randomness,
             &fixed_table,
@@ -47,7 +47,7 @@ impl<F: Field> EvmCircuit<F> {
             rw_table,
             bytecode_table,
             block_table,
-        );
+        ));

         Self {
             fixed_table,

Copy link
Member

@ed255 ed255 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Reviewing this took longer than expected!

Copy link
Contributor

@davidnevadoc davidnevadoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Good work 👍

self.muladd_d_n_r.assign(
region,
offset,
[d, n, a_reduced_plus_b, a_reduced_plus_b_overflow],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this is what @ed255 said plus the fact that we are using saturating_sub in assign.
Turns out that in this example the correct values for the 3 carry are 0. When d and e are swapped, the computation for these carries would result in an underflow and a panic. Instead of this, the use of saturating_sub gives a 0 (the correct value by chance).
The constraints pass because the rest of the cells (d and e included) are placed correctly.

@adria0 adria0 merged commit 50c5194 into main Jun 21, 2022
@adria0 adria0 deleted the feature/addmod branch August 2, 2022 11:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
crate-zkevm-circuits Issues related to the zkevm-circuits workspace member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants