feat: Add INCR privileged instructions #734

Open: wants to merge 12 commits into base: develop

Conversation

Nashtare
Collaborator

@Nashtare Nashtare commented Oct 18, 2024

Add a series of 4 privileged INCR instructions (INCR1, INCR2, INCR3 and INCR4) that increment the Nth element of the stack by 1 in place (i.e. no PUSH / POP).
This is particularly helpful for incrementing accumulators, which previously required SWAPN PUSH 1 ADD SWAPN and now only requires INCRN.
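
To make the saving concrete, here is a minimal Rust sketch of a toy stack machine, not the actual CPU implementation. `ToyStack` and its methods are hypothetical names, words are modelled as wrapping `u64` values, and one instruction is counted as one cycle; all three are assumptions of the sketch. It uses the EVM convention where SWAPk exchanges the top with the element k positions below it, so incrementing the n-th element (n = 1 is the top) through swaps uses depth n - 1.

```rust
/// Toy stack machine: the top of the stack is the last element of `items`.
/// Words are modelled as `u64` with wrapping arithmetic (assumption of this sketch).
#[derive(Clone, Debug, PartialEq)]
pub struct ToyStack {
    pub items: Vec<u64>,
    /// Number of instructions executed so far (one instruction = one cycle here).
    pub cycles: usize,
}

impl ToyStack {
    pub fn new(items: Vec<u64>) -> Self {
        Self { items, cycles: 0 }
    }

    /// EVM-style SWAPk: exchange the top with the element k positions below it.
    pub fn swap(&mut self, k: usize) {
        let top = self.items.len() - 1;
        self.items.swap(top, top - k);
        self.cycles += 1;
    }

    /// PUSH: place a new value on top of the stack.
    pub fn push(&mut self, value: u64) {
        self.items.push(value);
        self.cycles += 1;
    }

    /// ADD: pop two values and push their (wrapping) sum.
    pub fn add(&mut self) {
        let a = self.items.pop().expect("stack underflow");
        let b = self.items.pop().expect("stack underflow");
        self.items.push(a.wrapping_add(b));
        self.cycles += 1;
    }

    /// INCRn: bump the n-th element from the top (n = 1 is the top) in place.
    pub fn incr(&mut self, n: usize) {
        let idx = self.items.len() - n;
        self.items[idx] = self.items[idx].wrapping_add(1);
        self.cycles += 1;
    }

    /// The older four-instruction pattern, for an element below the top (n >= 2).
    pub fn incr_via_swaps(&mut self, n: usize) {
        self.swap(n - 1);
        self.push(1);
        self.add();
        self.swap(n - 1);
    }
}
```

In this model both paths leave the stack in the same state, but `incr` costs one instruction where `incr_via_swaps` costs four.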

Although its overall impact would be smaller, a DECR variant may also be worth evaluating (we could add it at no cost by combining it with the INCR CPU column).

Removes 4% to 5% of CPU cycles on mainnet blocks.

Total CPU columns for vanilla type1: 86

MemBefore new initial size:

  • vanilla type1: 63199
  • type2: 62691

@Nashtare Nashtare added the performance Performance improvement related changes label Oct 18, 2024
@Nashtare Nashtare added this to the Performance Tuning milestone Oct 18, 2024
@Nashtare Nashtare self-assigned this Oct 18, 2024
@github-actions github-actions bot added crate: evm_arithmetization Anything related to the evm_arithmetization crate. specs labels Oct 18, 2024
Comment on lines +105 to +106
cat $TEST_OUT_PATH
echo "Failed to create proof witnesses. See $TEST_OUT_PATH for more details."
Collaborator Author

unrelated, but mentioned on slack

Comment on lines -23 to -34
# Circuit sizes only matter in non test_only mode.
if ! [[ $8 == "test_only" ]]; then
export ARITHMETIC_CIRCUIT_SIZE="16..21"
export BYTE_PACKING_CIRCUIT_SIZE="8..21"
export CPU_CIRCUIT_SIZE="8..21"
export KECCAK_CIRCUIT_SIZE="4..20"
export KECCAK_SPONGE_CIRCUIT_SIZE="8..17"
export LOGIC_CIRCUIT_SIZE="4..21"
export MEMORY_CIRCUIT_SIZE="17..24"
export MEMORY_BEFORE_CIRCUIT_SIZE="16..23"
export MEMORY_AFTER_CIRCUIT_SIZE="7..23"
fi
Collaborator Author

these match the default ones in .env

Contributor

@muursh muursh left a comment

Very nice

Contributor

@hratoanina hratoanina left a comment

Looks good, but there are some issues with the constraints. We should also be able to get rid of the memory operations for INCR1.

use super::dup_swap::{constrain_channel_ext_circuit, constrain_channel_packed};
use crate::cpu::columns::CpuColumnsView;

/// Evaluates the constraints for the DUP and SWAP opcodes.
Contributor

Update comment.


let n = lv.opcode_bits[0]
+ lv.opcode_bits[1] * P::Scalar::from_canonical_u64(2)
+ lv.opcode_bits[2] * P::Scalar::from_canonical_u64(4);
Contributor

I thought n was only two bits?

@@ -156,13 +156,12 @@ modmul_check_loop:
SWAP1
%decrement
// stack: n-1, base_addr, i, j, retdest
Contributor

Probably faster to do:

INCR3 INCR4
// stack: n-1, base_addr, i+1, j+1, retdest
%stack (n, addr) -> (n, addr, n)

BYTES 3 // 0xe2, INCR3
BYTES 4 // 0xe3, INCR4

%rep 12 // 0xe5-0xef, invalid
Contributor

Suggested change
- %rep 12 // 0xe5-0xef, invalid
+ %rep 12 // 0xe4-0xef, invalid
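
As a quick arithmetic check of the suggested range, here is a minimal Rust sketch; `opcode_count` is a hypothetical helper, not something from the codebase. With INCR4 at 0xe3, the next unassigned opcode is 0xe4, and the inclusive range 0xe4-0xef holds exactly the 12 entries that `%rep 12` emits, whereas 0xe5-0xef would hold only 11.

```rust
/// Number of opcodes in the inclusive range `first..=last`.
pub fn opcode_count(first: u8, last: u8) -> usize {
    (first..=last).count()
}
```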

@@ -119,9 +119,9 @@ buffer_update:
// stack: get, set, get , set , times , retdest
%mupdate_current_general
// stack: get , set , times , retdest
%increment
INCR1
INCR2
SWAP1
Contributor

Remove the two extra SWAP1.

@@ -597,6 +599,39 @@ pub(crate) fn generate_swap<F: RichField, T: Transition<F>>(
Ok(())
}

pub(crate) fn generate_incr<F: RichField, T: Transition<F>>(
Contributor

I'm actually not sure why it's working. For INCR2-4, there's no problem, but for INCR1 we are reading the stack (and writing) at address stack_len - 1. There is no guarantee that the current top of the stack has been written in memory, so I'm surprised the reads don't return a wrong value sometimes.

Contributor

Some constraints seem to be missing, and some seem to be unneeded.

The current set of constraints works as intended for INCR2-4, but for INCR1 we are not checking that the output channel is equal to the next top of the stack.
Moreover, the value read in the input channel is not constrained to match the current top of the stack (tests pass, so it seems to hold, but that looks like a coincidence to me).

I think the clean way to do it is to filter all of the current constraints with lv.opcode_bits[0] (with a new one making sure that the top of the stack doesn't change), and handle INCR1 separately with filter 1 - lv.opcode_bits[0] (you can even disable the memory channels to save some memory rows).
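
The split proposed here can be pictured with a minimal Rust sketch of a toy CPU state. It is only an illustration under stated assumptions, not the actual `generate_incr` or constraint code: `ToyCpu`, its fields and its methods are hypothetical names, and the model assumes the top of the stack is held in a register while memory slot `stack_len - 1` may never have been written. INCR1 then only touches the register, while INCR2-4 read and write memory at `stack_len - n`.

```rust
use std::collections::HashMap;

/// Toy CPU state: the top of the stack lives in a register, the rest in memory.
pub struct ToyCpu {
    pub stack_len: usize,
    pub top_register: u64,
    /// Stack segment of memory, keyed by stack address.
    pub memory: HashMap<usize, u64>,
    /// Number of memory reads and writes performed so far.
    pub memory_ops: usize,
}

impl ToyCpu {
    /// Build the state from a logical stack (last element = top).
    /// Slot `stack_len - 1` is deliberately NOT written to memory.
    pub fn from_stack(stack: &[u64]) -> Self {
        let (top, rest) = stack.split_last().expect("non-empty stack");
        let memory = rest.iter().copied().enumerate().collect();
        Self { stack_len: stack.len(), top_register: *top, memory, memory_ops: 0 }
    }

    /// INCRn in this model: n = 1 updates the register only,
    /// n in 2..=4 performs one memory read and one memory write.
    pub fn incr(&mut self, n: usize) {
        assert!((1..=4).contains(&n) && n <= self.stack_len);
        if n == 1 {
            self.top_register = self.top_register.wrapping_add(1);
        } else {
            let addr = self.stack_len - n;
            let value = *self.memory.get(&addr).expect("slot below the top is in memory");
            self.memory_ops += 1; // read
            self.memory.insert(addr, value.wrapping_add(1));
            self.memory_ops += 1; // write
        }
    }

    /// Reassemble the logical stack (last element = top) for inspection.
    pub fn logical_stack(&self) -> Vec<u64> {
        let mut out: Vec<u64> = (0..self.stack_len - 1).map(|a| self.memory[&a]).collect();
        out.push(self.top_register);
        out
    }
}
```

In this toy model, reading address `stack_len - 1` for INCR1 would hit a slot that was never written, which mirrors the concern raised about the memory reads.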

Contributor

@LindaGuiga LindaGuiga left a comment

I have the same concerns as Hamy regarding generate_incr and constraints for INCR1. For INCR2-4, it looks good to me besides some nits.

use super::dup_swap::{constrain_channel_ext_circuit, constrain_channel_packed};
use crate::cpu::columns::CpuColumnsView;

/// Evaluates the constraints for the DUP and SWAP opcodes.
Contributor

Suggested change
- /// Evaluates the constraints for the DUP and SWAP opcodes.
+ /// Evaluates the constraints for the INCR opcode.


let n = lv.opcode_bits[0]
+ lv.opcode_bits[1] * P::Scalar::from_canonical_u64(2)
+ lv.opcode_bits[2] * P::Scalar::from_canonical_u64(4);
Contributor

The opcodes go from 0xe0 to 0xe3, so I don't think you need the + lv.opcode_bits[2] * P::Scalar::from_canonical_u64(4) term.
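
A small Rust sketch of the bit arithmetic behind this remark, under the assumption that `opcode_bits[0]` is the least significant bit of the opcode byte; the helper names are hypothetical and plain integers stand in for field elements. For every opcode in 0xe0-0xe3, bit 2 is zero, so the three-bit and two-bit sums agree; they first differ at 0xe4.

```rust
/// Little-endian bit decomposition of an opcode byte: `bits[0]` is the
/// least significant bit (assumed to match the `opcode_bits` layout).
pub fn opcode_bits(opcode: u8) -> [u8; 8] {
    let mut bits = [0u8; 8];
    for (i, bit) in bits.iter_mut().enumerate() {
        *bit = (opcode >> i) & 1;
    }
    bits
}

/// Offset computed from three bits, as in the snippet under review.
pub fn offset_three_bits(opcode: u8) -> u8 {
    let b = opcode_bits(opcode);
    b[0] + 2 * b[1] + 4 * b[2]
}

/// Offset computed from the two low bits only.
pub fn offset_two_bits(opcode: u8) -> u8 {
    let b = opcode_bits(opcode);
    b[0] + 2 * b[1]
}
```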

Labels
crate: evm_arithmetization (Anything related to the evm_arithmetization crate.)
performance (Performance improvement related changes)
specs
Projects
Status: In Progress
4 participants