Setup bench framework to run experiments #1109

Merged 4 commits into stellar:main on Oct 13, 2023
Conversation

@jayz22 (Contributor) commented on Oct 13, 2023

What

Setup bench framework to run experiments

  • Experimental bench code lives in the experimental directories (under benches/common and src/cost_runner)
  • Aggregate ContractCostType, WasmInsnType and a new ExperimentalCostType into a top-level CostType (see the sketch after this list)

Resolves #1030

  • Bring back EdwardsPointCurve25519ScalarMul as an experimental cost type and run a variation histogram on it.
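
For context, here is a minimal sketch of what the aggregated CostType could look like. The stub inner enums and their variants are illustrative assumptions, not the PR's exact definitions.

```rust
// Hypothetical sketch, not the PR's actual code. The three inner enums stand
// in for the real ContractCostType, WasmInsnType and ExperimentalCostType;
// only a few illustrative variants are shown.

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ContractCostType {
    WasmInsnExec,
    VerifyEd25519Sig,
    // ...
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum WasmInsnType {
    I64Add,
    // ...
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ExperimentalCostType {
    EdwardsPointCurve25519ScalarMul,
}

/// Top-level cost type that the bench framework iterates over.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum CostType {
    /// Protocol-defined cost types used in production metering.
    Contract(ContractCostType),
    /// Per-instruction wasm cost types used for calibration only.
    Wasm(WasmInsnType),
    /// Cost types that exist only for experiments.
    Experimental(ExperimentalCostType),
}
```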

Why

[TODO: Why this change is being made. Include any context required to understand the why.]

Known limitations

[TODO or N/A]

@jayz22 requested review from graydon, sisuresh and a team as code owners on October 13, 2023 00:01
@jayz22 (Contributor, Author) commented on Oct 13, 2023

Calibration results show that both EdwardsPointCurve25519ScalarMul and VerifyEd25519Sig have a large variation (max/min = 14 and 6 respectively, see screenshot), which confirms @graydon's concern in #1030.

[Screenshot: calibration variation results, 2023-10-12]

Will next figure out how to address it.

@jayz22 (Contributor, Author) commented on Oct 13, 2023

Upon closer examination, I'm changing my conclusion above: the worst case is actually pretty close to the average case.

I realized the above experiment isn't really testing what we are looking for. We are trying to answer the question "is our calibration, which is based on the average case, underestimating the worst case?"
Here are some tweaks to the setup in order to better answer that question:

  1. Throw away the "best" case. For EdwardsPointCurve25519ScalarMul (it doesn't matter much for VerifyEd25519Sig), the best case is significantly faster than the average, but it's rare and doesn't contribute to the problem statement above.
  2. Fix the input size. For VerifyEd25519Sig that is the message size. The variation analysis should fix the input size and examine the effect of random bytes in the input. (A longer message costs more to verify, but that's not what we are interested in discovering.)
  3. Increase the iteration count from 100 to 1000 (see the sketch after this list).
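
For illustration, here is a rough standalone sketch of the tweaked VerifyEd25519Sig measurement (fixed message size, random input bytes, 1000 iterations). It calls ed25519-dalek directly and is not the actual cost_runner harness; the crate versions and constants are assumptions.

```rust
// Hypothetical standalone sketch, not the PR's bench code.
// Assumes ed25519-dalek 2.x (with the "rand_core" feature) and rand 0.8.
use ed25519_dalek::{Signer, SigningKey, Verifier};
use rand::{rngs::OsRng, RngCore};
use std::time::Instant;

fn main() {
    const MSG_LEN: usize = 64; // tweak 2: fixed input (message) size
    const ITERS: usize = 1000; // tweak 3: 1000 iterations

    let mut csprng = OsRng;
    let mut timings = Vec::with_capacity(ITERS);

    for _ in 0..ITERS {
        // Random key and random message bytes at a fixed length.
        let mut msg = [0u8; MSG_LEN];
        csprng.fill_bytes(&mut msg);
        let signing_key = SigningKey::generate(&mut csprng);
        let sig = signing_key.sign(&msg);
        let verifying_key = signing_key.verifying_key();

        // Time only the verification, which is what the cost type meters.
        let start = Instant::now();
        verifying_key.verify(&msg, &sig).unwrap();
        timings.push(start.elapsed().as_nanos());
    }

    let max = *timings.iter().max().unwrap() as f64;
    let min = *timings.iter().min().unwrap() as f64;
    println!("max/min ratio over {} samples: {:.3}", ITERS, max / min);
}
```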

With the above tweaks, here are the results:
[Screenshot: variation results after the tweaks, 2023-10-13]

The ratio between the worst and best case for EdwardsPointCurve25519ScalarMul is 1.34.
For VerifyEd25519Sig there is no trivial way to set up the worst case, so everything is random. With 1000 sample points, the max/min ratio is just 1.008.

So I conclude the worst case of ed25519 is not significantly worse than the average, and thus there is no need to tweak our calibration setup or numbers.

@graydon (Contributor) left a comment

Looks good! Thanks for investigating

@graydon added this pull request to the merge queue on Oct 13, 2023
Merged via the queue into stellar:main with commit 930ba53 on Oct 13, 2023
9 checks passed
Development

Successfully merging this pull request may close these issues:

confirm worst-case calibration of ed25519 operations