This follows up on #249.
A pattern of DOS vectors took the form of a small `a` and `e` and a large `M` in `a^e (mod M)`, whether `M` was odd, even, or a power of 2.
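To make the shape concrete, here is a sketch of that input pattern (the constants are illustrative, not the actual attack values):

```python
# DOS-shaped modexp input: tiny base and exponent, huge modulus.
# The constants here are illustrative only.
a, e = 3, 16
for M in (2**8192 - 1, 2**8192 - 2, 2**8192):  # odd, even, and power-of-2 moduli
    print(pow(a, e, M))  # the result is tiny, yet a naive path works on 8192-bit limbs
```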
There were several causes that can be seen in the current master 34baa74:

Using metering on one of the DOS vectors, `eth_evm_modexp` sets the window size to 4 by default. This is pure waste when the exponent is 7 or 16, as the total number of operations needed is fewer than what the windowing prologue alone costs (see the cost sketch below).
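A rough multiplication-count model (my own back-of-the-envelope accounting, not Constantine's actual metering) shows why: a window of 4 means precomputing a table of `a^2` through `a^15`, i.e. 14 multiplications, before the exponent is even looked at.

```python
def naive_cost(e: int) -> int:
    """Square-and-multiply: one squaring per bit, one multiplication per set bit."""
    return (e.bit_length() - 1) + (bin(e).count("1") - 1)

def windowed_cost(e: int, w: int) -> int:
    """Fixed 2^w-ary method: precompute a^2..a^(2^w - 1), then roughly one
    squaring per exponent bit plus one table multiplication per w-bit window."""
    precompute = (1 << w) - 2
    n = e.bit_length()
    return precompute + (n - 1) + -(-n // w)  # -(-n // w) is ceil(n / w)

for e in (7, 16, 65537):
    print(f"e={e}: naive={naive_cost(e)} muls, window-4={windowed_cost(e, 4)} muls")
```

For exponents like 7 or 16, square-and-multiply needs about 4 multiplications in total, so the 14-multiplication prologue dominates everything.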
For 1:

See `constantine/constantine/math_arbitrary_precision/arithmetic/limbs_montgomery.nim`, lines 29 to 68 at 34baa74.
Instead of computing `R² (mod p)` and then doing a `montmul(a, R²)` for the conversion to Montgomery form, it's faster to compute `aR (mod p)` directly, especially given that `R` is a power of 2, so `aR` only needs a left shift. The vartime division is faster than GMP for sizes <= 968 bits.

This actually should also be fixed for the fixed-precision / elliptic curve arithmetic, as it might considerably improve compile times.
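A minimal Python sketch of the two conversion routes, assuming an odd modulus and `R = 2^bits` (function names are mine, not Constantine's):

```python
def montmul(x: int, y: int, p: int, bits: int) -> int:
    """Montgomery multiplication (REDC): returns x*y*R^-1 mod p, for odd p and R = 2^bits."""
    R = 1 << bits
    p_inv = pow(-p, -1, R)          # -p^-1 (mod R), Python 3.8+
    t = x * y
    m = (t * p_inv) % R
    u = (t + m * p) >> bits         # exact division by R
    return u - p if u >= p else u

def to_mont_via_r2(a: int, p: int, bits: int) -> int:
    """Classic conversion: precompute R^2 mod p (expensive, notably at compile time),
    then montmul(a, R^2) = a*R mod p."""
    r2 = pow(1 << bits, 2, p)
    return montmul(a, r2, p, bits)

def to_mont_direct(a: int, p: int, bits: int) -> int:
    """Direct conversion: a*R mod p. R is a power of 2, so a*R is a left shift
    followed by a single (vartime) division for the reduction."""
    return (a << bits) % p

p, bits, a = 2**255 - 19, 256, 12345   # illustrative odd prime, word-aligned R
assert to_mont_via_r2(a, p, bits) == to_mont_direct(a, p, bits)
```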
For 2:
The current master 34baa74 takes 1.1s and 1.9s for the identified DOS vector.
After f79b050, perf has been improved by 2.7x to 3.3x
However, an Ethereum block full of these computations would still delay a node by 500 ms, which is too high for networking.

Finally, the last insight is that the reductions aren't actually needed there, bringing performance up by 10x to 20x.
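My reading of that insight, from the shape of the inputs (small `a` and `e`, large `M`): the intermediate values never grow past `M`, so every modular reduction is a no-op that still pays for an `M`-sized division. A hedged sketch of the size argument (not necessarily the exact check Constantine performs):

```python
def modexp_skip_reduction(a: int, e: int, M: int) -> int:
    """If a^e provably fits below M, exponentiate without any modular reduction.
    Since a < 2^k with k = a.bit_length(), we have a^e < 2^(k*e)."""
    if a.bit_length() * e < M.bit_length():
        return a ** e       # plain exponentiation; every mod-M reduction would be a no-op
    return pow(a, e, M)     # general case: fall back to modular exponentiation

# DOS-shaped input: the fast path triggers and no M-sized division ever runs.
assert modexp_skip_reduction(3, 16, 2**8192 - 1) == pow(3, 16, 2**8192 - 1)
```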
A side benefit is that with Clang, modular exponentiation is now decidedly faster than GMP even without assembly, while before it was a little slower.