Setup bench framework to run experiments #1109

Merged 4 commits into stellar:main on Oct 13, 2023
Conversation

@jayz22 (Contributor) commented on Oct 13, 2023

What

Setup bench framework to run experiments

  • Experimental bench code lives in the experimental directories (under benches/common and src/cost_runner)
  • Aggregate ContractCostType, WasmInsnType and a new ExperimentalCostType into a top-level CostType (see the sketch after this list)

Resolves #1030

  • Bring back EdwardsPointCurve25519ScalarMul as an experimental cost type and run a variation histogram on it.
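
For context, here is a minimal sketch of what the aggregated CostType could look like. The stub inner enums and their variants are illustrative assumptions, not the PR's exact definitions.

```rust
// Hypothetical sketch, not the PR's actual code. The three inner enums stand
// in for the real ContractCostType, WasmInsnType and ExperimentalCostType;
// only a few illustrative variants are shown.

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ContractCostType {
    WasmInsnExec,
    VerifyEd25519Sig,
    // ...
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum WasmInsnType {
    I64Add,
    // ...
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ExperimentalCostType {
    EdwardsPointCurve25519ScalarMul,
}

/// Top-level cost type that the bench framework iterates over.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum CostType {
    /// Protocol-defined cost types used in production metering.
    Contract(ContractCostType),
    /// Per-instruction wasm cost types used for calibration only.
    Wasm(WasmInsnType),
    /// Cost types that exist only for experiments.
    Experimental(ExperimentalCostType),
}
```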

Why

[TODO: Why this change is being made. Include any context required to understand the why.]

Known limitations

[TODO or N/A]

@jayz22 requested review from graydon, sisuresh and a team as code owners on October 13, 2023 00:01
@jayz22 (Contributor, Author) commented on Oct 13, 2023

Calibration results show that both EdwardsPointCurve25519ScalarMul and VerifyEd25519Sig have a large variation (max/min = 14 and 6 respectively, see screenshot), which confirms @graydon's concern in #1030.

[Screenshot: calibration variation results, 2023-10-12]

Will next figure out how to address it.

@jayz22 (Contributor, Author) commented on Oct 13, 2023

Upon closer examination, I'm changing my conclusion above: the worst case is actually pretty close to the average case.

I realized the above experiment isn't really testing what we are looking for. We are trying to answer the question "is our calibration, which is based on the average case, underestimating the worst case?"
Here are some tweaks to the setup in order to better answer that question:

  1. Throw away the "best" case. For EdwardsPointCurve25519ScalarMul (it doesn't matter much for VerifyEd25519Sig), the best case is significantly faster than the average, but it's rare and doesn't contribute to the problem statement above.
  2. Fix the input size. For VerifyEd25519Sig that is the message size. The variation analysis should fix the input size and examine the effect of random bytes in the input. (A longer message costs more to verify, but that's not what we are interested in discovering.)
  3. Increase the iteration count from 100 to 1000 (see the sketch after this list).
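
For illustration, here is a rough standalone sketch of the tweaked VerifyEd25519Sig measurement (fixed message size, random input bytes, 1000 iterations). It calls ed25519-dalek directly and is not the actual cost_runner harness; the crate versions and constants are assumptions.

```rust
// Hypothetical standalone sketch, not the PR's bench code.
// Assumes ed25519-dalek 2.x (with the "rand_core" feature) and rand 0.8.
use ed25519_dalek::{Signer, SigningKey, Verifier};
use rand::{rngs::OsRng, RngCore};
use std::time::Instant;

fn main() {
    const MSG_LEN: usize = 64; // tweak 2: fixed input (message) size
    const ITERS: usize = 1000; // tweak 3: 1000 iterations

    let mut csprng = OsRng;
    let mut timings = Vec::with_capacity(ITERS);

    for _ in 0..ITERS {
        // Random key and random message bytes at a fixed length.
        let mut msg = [0u8; MSG_LEN];
        csprng.fill_bytes(&mut msg);
        let signing_key = SigningKey::generate(&mut csprng);
        let sig = signing_key.sign(&msg);
        let verifying_key = signing_key.verifying_key();

        // Time only the verification, which is what the cost type meters.
        let start = Instant::now();
        verifying_key.verify(&msg, &sig).unwrap();
        timings.push(start.elapsed().as_nanos());
    }

    let max = *timings.iter().max().unwrap() as f64;
    let min = *timings.iter().min().unwrap() as f64;
    println!("max/min ratio over {} samples: {:.3}", ITERS, max / min);
}
```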

With the above tweaks, here are the results:
[Screenshot: variation results after the tweaks, 2023-10-13]

The ratio between the worst and best case for EdwardsPointCurve25519ScalarMul is 1.34.
For VerifyEd25519Sig there is no trivial way to set up the worst case, so everything is random. With 1000 sample points, the max/min ratio is just 1.008.

So I conclude the worst case of ed25519 is not significantly worse than the average, and thus there is no need to tweak our calibration setup or numbers.

@graydon (Contributor) left a comment

Looks good! Thanks for investigating

@graydon added this pull request to the merge queue on Oct 13, 2023
Merged via the queue into stellar:main with commit 930ba53 on Oct 13, 2023
9 checks passed
Development

Successfully merging this pull request may close these issues:

confirm worst-case calibration of ed25519 operations