EVM modexp: solve DOS vectors #286

mratsim · 2023-10-18T22:25:15Z

This follows up on #249.

One family of DOS vectors has a small base a and exponent e with a large modulus M in a^e (mod M), whether M is odd, even or a power of 2.

There were several causes, all visible in the current master 34baa74.

Metering one of the DOS vectors shows:

1. 25% of the time is spent converting to Montgomery residue form, i.e. the conversion is not amortized over enough multiplications.
2. 33% of the time is spent in the "prologue", which builds a precomputed table. That precomputation requires 2^window operations, and eth_evm_modexp sets the window to 4 by default. This is pure waste when the exponent is 7 or 16, as the whole exponentiation needs fewer operations than the prologue.
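
To make the second point concrete, here is a rough operation count, sketched in Python rather than Nim for brevity. The cost model is an assumption made for illustration (one multiplication per table entry beyond a^0 and a^1, and plain left-to-right square-and-multiply for the exponentiation itself); the helper names are hypothetical and are not part of Constantine.

```python
def binary_cost(e: int) -> int:
    """Squarings plus multiplications for left-to-right square-and-multiply."""
    if e <= 1:
        return 0
    squarings = e.bit_length() - 1          # one per bit after the leading bit
    multiplies = bin(e).count("1") - 1      # one per set bit after the leading bit
    return squarings + multiplies


def window_prologue_cost(window: int) -> int:
    """Multiplications needed to fill a table a^0 .. a^(2^window - 1).

    Assumed model: a^0 and a^1 are free, every other entry costs one
    multiplication, so the cost is on the order of 2^window.
    """
    return (1 << window) - 2


for e in (7, 16):
    print(e, binary_cost(e), window_prologue_cost(4))
```

Under this model, exponents 7 and 16 each need 4 modular operations in total, while the window-4 table alone costs 14 before the exponentiation even starts.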

For 1:

The approach of computing the Montgomery magic constants R (mod p) and R² (mod p) by repeated doubling is actually very slow for R²:

constantine/constantine/math_arbitrary_precision/arithmetic/limbs_montgomery.nim

Lines 29 to 68 in 34baa74

    
    func r_powmod_vartime(r: var openArray[SecretWord], M: openArray[SecretWord], n: static int) =
      ## Returns the Montgomery domain magic constant for the input modulus:
      ##
      ##   R ≡ R (mod M) with R = (2^WordBitWidth)^numWords
      ##   or
      ##   R² ≡ R² (mod M) with R = (2^WordBitWidth)^numWords
      ##
      ## Assuming a field modulus of size 256-bit with 63-bit words, we require 5 words
      ##   R² ≡ ((2^63)^5)^2 (mod M) = 2^630 (mod M)

      # Algorithm
      # Bos and Montgomery, Montgomery Arithmetic from a Software Perspective
      # https://eprint.iacr.org/2017/1057.pdf
      #
      # For R = r^n = 2^wn and 2^(wn − 1) ≤ N < 2^wn
      # r^n = 2^63 in on 64-bit and w the number of words
      #
      # 1. C0 = 2^(wn - 1), the power of two immediately less than N
      # 2. for i in 1 ... wn+1
      #      Ci = C(i-1) + C(i-1) (mod M)
      #
      # Thus: C(wn+1) ≡ 2^(wn+1) C0 ≡ 2^(wn + 1) 2^(wn - 1) ≡ 2^(2wn) ≡ (2^wn)^2 ≡ R² (mod M)

      debug:
        doAssert bool(M[0] and One)
        doAssert BaseType(M[M.len-1]) != 0
        doAssert r.len == M.len

      let
        w = M.len
        msb = int log2_vartime(BaseType M[M.len-1])
        start = (w-1)*WordBitWidth + msb
        stop = n*WordBitWidth*w

      for i in 0 ..< r.len-1:
        r[i] = Zero
      r[r.len-1] = SecretWord(BaseType(1) shl msb) # C0 = 2^(wn-1), the power of 2 immediatly less than the modulus

      for i in start ..< stop:
        r.doublemod_vartime(r, M)
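
As a rough illustration of why this is slow, the same repeated-doubling idea can be sketched in Python with built-in big integers. This is a simplified model, not the Constantine code: `r2_by_doubling` and the 64-bit word size are assumptions made for the example. It also returns how many full-width modular doublings were needed.

```python
def r2_by_doubling(M: int, word_bits: int = 64) -> tuple:
    """Compute R^2 mod M by repeated modular doubling, for odd M > 1.

    R = 2^(w * word_bits), where w is the number of words needed to hold M.
    Returns (R^2 mod M, number of doublings performed).
    """
    assert M > 1 and M & 1 == 1
    w = -(-M.bit_length() // word_bits)      # ceil(bits / word_bits)
    n_bits = w * word_bits
    start = M.bit_length() - 1
    c = 1 << start                           # largest power of two below M
    doublings = 0
    for _ in range(start, 2 * n_bits):
        c += c                               # c < M, so 2c < 2M ...
        if c >= M:
            c -= M                           # ... and one subtraction suffices
        doublings += 1
    return c, doublings


M = (1 << 255) - 19
r2, count = r2_by_doubling(M)
print(count, r2 == pow(2, 512, M))
```

The number of doublings grows linearly with the bit size of the modulus, and each doubling touches every word, so the total cost is roughly quadratic in the number of words for a value that a single wide reduction would also produce.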

This should also be fixed for the fixed-precision / elliptic-curve arithmetic, as it might noticeably improve compile times.

Using a vartime reduction improves performance considerably. However, R² is actually not required: instead of computing R² (mod p) and then doing a montmul(a, R²) for the conversion, it is faster to compute aR (mod p) directly, especially since R is a power of 2, so aR only needs a left shift followed by a reduction.
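
The equivalence of the two conversion routes can be checked with a small Python sketch. Both function names are hypothetical, and `montmul` is modelled mathematically as x·y·R⁻¹ (mod M) rather than with a word-level Montgomery reduction, so this only demonstrates that the results agree, not the speed difference.

```python
def to_montgomery_via_r2(a: int, M: int, n_bits: int) -> int:
    """Conversion through R^2: montmul(a, R^2) = a * R^2 * R^-1 (mod M)."""
    R = 1 << n_bits
    r2 = (R * R) % M                 # the constant the old path had to compute
    r_inv = pow(R, -1, M)            # modular inverse, needs odd M (Python 3.8+)
    return (a * r2 * r_inv) % M


def to_montgomery_via_shift(a: int, M: int, n_bits: int) -> int:
    """Direct conversion: a * R (mod M) is a left shift plus one reduction."""
    return (a << n_bits) % M


M = (1 << 255) - 19
a = 0x1234567890ABCDEF
print(to_montgomery_via_r2(a, M, 256) == to_montgomery_via_shift(a, M, 256))
```

The shift-based route skips the R² computation entirely, which matters when the conversion is only amortized over a handful of multiplications.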

The vartime division is faster than GMP for sizes <= 968 bits.

For 2:

When the exponent is less than or equal to 255, the window optimization is not used.

The current master 34baa74 takes 1.1s and 1.9s for the identified DOS vector.

After f79b050, performance improved by 2.7x to 3.3x.

However, an Ethereum block full of these computations would still delay a node by 500ms, which is too high for networking.

Finally, the last insight is that when the base and exponent are small compared to the modulus, reductions aren't actually needed at all, which improves performance by a further 10x to 20x.
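
A minimal sketch of such a no-reduction path, in Python. The bit-length test used here is an assumption chosen for illustration and the exact criterion in the PR may differ; `modexp_small` is a hypothetical name, and the modulus is assumed to be at least 1.

```python
def modexp_small(a: int, e: int, M: int) -> int:
    """Modular exponentiation with a fast path that skips all reductions.

    Since a < 2^bitlen(a), we have a^e < 2^(bitlen(a) * e). If that bound
    is at most 2^(bitlen(M) - 1) <= M, then a^e < M and a^e mod M == a^e.
    """
    if M == 1:
        return 0                        # everything is 0 modulo 1
    if a.bit_length() * e < M.bit_length():
        return a ** e                   # plain integer power, no reduction
    return pow(a, e, M)                 # general path


M = (1 << 1024) + 1
print(modexp_small(3, 7, M))            # takes the no-reduction path
print(modexp_small(3, 7, 1000))         # falls back to the general path
```

With a small base and exponent the plain power stays only a few words wide, so the cost no longer depends on the size of the modulus.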

A side-benefit is that now with Clang, modular exponentiation is decidedly faster than GMP even without assembly:

while before it was a little slower:


mratsim added 9 commits October 10, 2023 23:12

stash prep for Barret Reduction

30ccbb7

benches lost in rebase

982b1c7

fix vartime reduction

b065e9b

some improvement and fixes on reduce_vartime

c80baac

Fuse reductions when converting to Montgomery + use window=1 in powMo…

f79b050

…nt for small exponents. ~2.7x to 3.3x accel

modexp: Introduce a no-reduction path for small base+exponent compare…

9f3c638

…d to modulus. Fix DOS

optim for padded exponents

dc04589

remove commented out code [skip ci]

3c8a777

Missing noInline for allocStackArray

2e13866

mratsim merged commit 4ccd8aa into master Oct 18, 2023
12 checks passed

mratsim mentioned this pull request Oct 18, 2023

modexp: 2.5x accel on small exponent #268

Merged

mratsim deleted the fuzz-6-modexp-slow-refactor branch October 19, 2023 07:11

mratsim added the hacktoberfest-accepted label Oct 21, 2023

mratsim added a commit that referenced this pull request Oct 21, 2023

fix the new div2n1n_vartime on 32-bit - regression from #286

12b6ce5

mratsim mentioned this pull request Oct 21, 2023

32-bit fixes #288

Merged
