EVM modexp: solve DOS vectors #286

Merged: 9 commits merged on Oct 18, 2023

@mratsim (Owner) commented Oct 18, 2023

This follows up on #249.

A recurring pattern among the DOS vectors was a small base a and exponent e combined with a large modulus M in a^e (mod M), whether M was odd, even, or a power of 2.

There were several causes, visible in the current master (34baa74).

Metering one of the DOS vectors:
[image: metering breakdown of one DOS vector on master 34baa74]

  1. 25% of the time is spent converting to Montgomery residue form, i.e. this cost is not amortized over enough multiplications.
  2. 33% of the time is spent in the "prologue", which builds a precomputed table. That precomputation requires 2^window operations, and eth_evm_modexp sets the window to 4 by default. This is pure waste when the exponent is 7 or 16, since the total number of operations needed is smaller than the prologue itself.
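To make the prologue cost concrete, here is a rough multiplication count. It assumes a textbook fixed-window method that precomputes a^0 .. a^(2^window − 1) at one multiplication per non-trivial entry (so roughly 2^window operations), which may not match eth_evm_modexp exactly; the function names are hypothetical and Python is used only for illustration.

```python
def square_and_multiply_cost(e: int) -> int:
    """Modular multiplications (squarings included) used by left-to-right
    binary exponentiation, for an exponent e >= 1."""
    squarings = e.bit_length() - 1          # one per bit after the leading 1
    multiplies = bin(e).count("1") - 1      # one per additional set bit
    return squarings + multiplies


def fixed_window_cost(e: int, window: int = 4) -> int:
    """Modular multiplications used by a fixed-window exponentiation
    (assumed textbook variant): a prologue that precomputes
    a^0 .. a^(2^window - 1), then per window `window` squarings plus
    one multiplication by a table entry."""
    prologue = 2**window - 2                     # a^2 .. a^(2^window - 1)
    num_windows = -(-e.bit_length() // window)   # ceiling division
    # The first window is a plain table lookup; each later window costs
    # `window` squarings and one table multiplication.
    body = (num_windows - 1) * (window + 1)
    return prologue + body


for e in (7, 16, 2**256 - 1):
    print(e.bit_length(), square_and_multiply_cost(e), fixed_window_cost(e))
```

Under these assumptions, e = 7 costs 4 multiplications with square-and-multiply but 14 with a 4-bit window, and e = 16 costs 4 versus 19; only for large exponents (e.g. 2^256 − 1: 510 versus 329) does the prologue pay for itself.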

For 1:

  • The approach of computing the Montgomery magic constants R (mod p) and R² (mod p) by repeated doubling is very slow for R²:
    ```nim
    func r_powmod_vartime(r: var openArray[SecretWord], M: openArray[SecretWord], n: static int) =
      ## Returns the Montgomery domain magic constant for the input modulus:
      ##
      ##   R ≡ R (mod M) with R = (2^WordBitWidth)^numWords
      ## or
      ##   R² ≡ R² (mod M) with R = (2^WordBitWidth)^numWords
      ##
      ## Assuming a field modulus of size 256-bit with 63-bit words, we require 5 words
      ##   R² ≡ ((2^63)^5)^2 (mod M) = 2^630 (mod M)

      # Algorithm
      # Bos and Montgomery, Montgomery Arithmetic from a Software Perspective
      # https://eprint.iacr.org/2017/1057.pdf
      #
      # For R = r^n = 2^wn and 2^(wn − 1) ≤ M < 2^wn
      # r = 2^w with w = 63 on 64-bit, and n the number of words
      #
      # 1. C0 = 2^(wn - 1), the power of two immediately less than M
      # 2. for i in 1 ... wn+1
      #      Ci = C(i-1) + C(i-1) (mod M)
      #
      # Thus: C(wn+1) ≡ 2^(wn+1) C0 ≡ 2^(wn + 1) 2^(wn - 1) ≡ 2^(2wn) ≡ (2^wn)^2 ≡ R² (mod M)

      debug:
        doAssert bool(M[0] and One)
        doAssert BaseType(M[M.len-1]) != 0
        doAssert r.len == M.len

      let
        w = M.len
        msb = int log2_vartime(BaseType M[M.len-1])
        start = (w-1)*WordBitWidth + msb
        stop = n*WordBitWidth*w

      for i in 0 ..< r.len-1:
        r[i] = Zero
      r[r.len-1] = SecretWord(BaseType(1) shl msb) # C0 = 2^(wn-1), the power of 2 immediately less than the modulus
      for i in start ..< stop:
        r.doublemod_vartime(r, M)
    ```

    This should also be fixed for the fixed-precision / elliptic curve arithmetic, as it might noticeably improve compile time.
  • Using a variable-time reduction improves performance considerably. However, R² is not actually required: instead of computing R² (mod p) and then doing a montmul(a, R²) for conversion, it's faster to compute aR (mod p) directly, especially since R is a power of 2, so aR only needs a left shift followed by a single reduction.
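A minimal sketch of the two conversion strategies, assuming 64-bit words and an odd modulus. `r_powmod_by_doubling`, `montmul` and `to_mont_via_shift` are hypothetical helpers, not Constantine's API, and Python's big integers are neither constant-time nor representative of the library's speed. The point is only that montmul(a, R²) and a direct aR (mod M) agree, while the doubling route needs about 2·bits(R) − bits(M) modular doublings just to obtain R².

```python
def r_powmod_by_doubling(M: int, word_bits: int, num_words: int, n: int) -> int:
    """R^n mod M with R = 2^(word_bits * num_words), by repeated modular
    doubling (the strategy of r_powmod_vartime): start from C0, the largest
    power of two below M, and double mod M until reaching 2^(n * bits(R))."""
    assert M > 1 and M % 2 == 1, "Montgomery arithmetic needs an odd modulus"
    start = M.bit_length() - 1          # C0 = 2^start < M
    stop = n * word_bits * num_words    # exponent of R^n
    c = 1 << start
    for _ in range(start, stop):        # about n*bits(R) - log2(M) doublings
        c += c
        if c >= M:                      # c < 2M, so one subtraction suffices
            c -= M
    return c


def montmul(a: int, b: int, M: int, r_bits: int) -> int:
    """Montgomery product a*b*R^-1 mod M, R = 2^r_bits, odd M, a, b < M."""
    R = 1 << r_bits
    neg_m_inv = (-pow(M, -1, R)) % R    # -M^-1 mod R (Python 3.8+)
    t = a * b
    u = (t * neg_m_inv) % R             # makes t + u*M divisible by R
    res = (t + u * M) >> r_bits         # exact division by R, res < 2M
    return res - M if res >= M else res


def to_mont_via_shift(a: int, M: int, r_bits: int) -> int:
    """a*R mod M directly: R is a power of two, so a*R is a left shift
    followed by one (variable-time) reduction."""
    return (a << r_bits) % M


# Usage: a 4-word (256-bit) R and an odd 255-bit modulus.
M = (1 << 255) - 19
WORD_BITS, NUM_WORDS = 64, 4
R_BITS = WORD_BITS * NUM_WORDS
a = 12345678901234567890

r2 = r_powmod_by_doubling(M, WORD_BITS, NUM_WORDS, 2)
via_r2 = montmul(a, r2, M, R_BITS)      # a * R^2 * R^-1 = a*R (mod M)
via_shift = to_mont_via_shift(a, M, R_BITS)
assert via_r2 == via_shift
```

The shift-based conversion skips R² entirely, which matters most when M is large and only a handful of multiplications follow, as in the DOS vectors.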

The vartime division is faster than GMP for sizes <= 968 bits
[image: benchmark of the vartime division vs GMP]

For 2:

  • When the exponent is less than or equal to 255, the window optimization is not used.
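A Python sketch of that threshold, assuming a textbook left-to-right square-and-multiply for small exponents and a fixed 4-bit window otherwise. The name `modexp_dispatch` and its structure are illustrative, not eth_evm_modexp's actual code, and it assumes M ≥ 1.

```python
def modexp_dispatch(a: int, e: int, M: int, window: int = 4) -> int:
    """a^e mod M (M >= 1), skipping the window precomputation when e <= 255."""
    if M == 1:
        return 0
    a %= M
    if e <= 255:
        # Small exponent: left-to-right square-and-multiply, no table.
        result = 1
        for bit in bin(e)[2:]:
            result = result * result % M
            if bit == "1":
                result = result * a % M
        return result
    # Large exponent: precompute a^0 .. a^(2^window - 1) once (the "prologue").
    table = [1] * (1 << window)
    for i in range(1, 1 << window):
        table[i] = table[i - 1] * a % M
    result = 1
    num_windows = -(-e.bit_length() // window)   # ceiling division
    for w in reversed(range(num_windows)):
        for _ in range(window):                  # shift result by one window
            result = result * result % M
        digit = (e >> (w * window)) & ((1 << window) - 1)
        result = result * table[digit] % M       # multiply in this window's digit
    return result
```

The cut-off at 255 means exponents of at most 8 bits never pay the 2^window table cost, which is where the prologue was shown to dominate.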

The current master (34baa74) takes 1.1s and 1.9s on the identified DOS vectors.

[image: benchmark of master 34baa74 on the DOS vectors]

After f79b050, performance improved by 2.7x to 3.3x:

[image: benchmark after f79b050]

[image: benchmark after f79b050]

However, an Ethereum block full of these computations would still delay a node by 500ms, which is too long for networking.

Finally, the last insight is that reductions aren't actually needed there, which brings performance up by 10x to 20x.
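The sentence above does not say which reductions are skipped. One plausible reading, stated here purely as an assumption, is that with a small base and exponent and a large modulus, bit lengths alone can prove a^e < M, so the result needs no modular reduction at all. A Python sketch of such a fast path (`pow_without_reduction` is a hypothetical name):

```python
def pow_without_reduction(a: int, e: int, M: int):
    """Return a^e exactly when bit lengths prove a^e < M, else None.

    Since a < 2^bits(a), a^e < 2^(bits(a) * e). If bits(a) * e <= bits(M) - 1,
    then a^e < 2^(bits(M) - 1) <= M, so no modular reduction is needed.
    """
    if M <= 1:
        return None                      # leave degenerate moduli to the general path
    if a.bit_length() * e <= M.bit_length() - 1:
        return a ** e                    # plain multiplications on small operands
    return None


def modexp(a: int, e: int, M: int) -> int:
    """Hypothetical wrapper: try the reduction-free path, else fall back."""
    fast = pow_without_reduction(a, e, M)
    return fast if fast is not None else pow(a, e, M)
```

For a DOS-style input such as 3^7 modulo a 1025-bit M, the check succeeds immediately and the answer is just 2187, with no Montgomery conversion or reduction involved.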

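[image: benchmark after removing the unneeded reductions]
[image: benchmark after removing the unneeded reductions]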

A side benefit is that, with Clang, modular exponentiation is now decidedly faster than GMP, even without assembly:

[image: modexp vs GMP benchmark with Clang, after this PR]

whereas before it was slightly slower:

[image: modexp vs GMP benchmark, before this PR]

@mratsim merged commit 4ccd8aa into master on Oct 18, 2023
12 checks passed
@mratsim deleted the fuzz-6-modexp-slow-refactor branch on October 19, 2023 07:11
@mratsim mentioned this pull request on Oct 21, 2023
@mratsim added a commit that referenced this pull request on Oct 22, 2023:
* fix the new div2n1n_vartime on 32-bit - regression from #286

* remove unnecessary defensive programming

* reactivate 32-bit CI to check on #244

* 32-bit: centralize OS, ISA and env variable config

* enable assembly on x86 32-bit