
Add CI for running benchmarks #326

Merged: 6 commits into kivikakk:main, Jul 11, 2023

Conversation

@YJDoc2 (Contributor) commented Jun 24, 2023

Solves #324

As discussed, I have added a CI workflow that runs the benchmarks each time a PR is opened, and subsequently whenever anyone comments /run-bench on the PR. It runs the benchmarks and posts the result table as a comment on the PR. On subsequent runs, the same comment is edited and updated with the new results.

Because GitHub needs the workflow file to be present on the default branch before it will run, the bench CI won't run for this PR.

The CI is failing here because the comment-posting action needs write permission on PRs, which the default GitHub token provided to CI in this repo does not appear to have. I have opened a dummy PR on my fork; you can see the posted results there, as well as try out the comment-based invocation of the benchmarks.
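
For illustration, the overall shape of the workflow is roughly the following. This is a simplified sketch rather than the actual benchmarks.yml in this PR; the make target, the results path, and the sticky-comment action shown here are stand-ins for whatever the real workflow uses.

name: benchmarks
on:
  pull_request:
  issue_comment:
    types: [created]

# The comment-posting step needs PR write permission; the default token in
# this repo does not seem to grant it, hence the failure mentioned above.
permissions:
  pull-requests: write

jobs:
  bench:
    # Run when a PR is opened, or when a PR comment containing /run-bench is created.
    if: >
      github.event_name == 'pull_request' ||
      (github.event.issue.pull_request && contains(github.event.comment.body, '/run-bench'))
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # For comment-triggered runs, a real workflow also has to check out the
      # PR head rather than the default branch.
      - name: Run benchmarks
        run: make bench  # hypothetical target writing benches/results.md
      - name: Post or update the results comment
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          path: benches/results.md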

I have also changed the benchmarking method as discussed. We used to simply clone cmark-gfm and run the bench from there; now I have copied the benchmark-running script from there and use hyperfine to run it. We also now run the benchmarks on pulldown-cmark and cmark-gfm. markdown-rs was also considered, but it is a library-only repo, so we would probably need to set up a Cargo project with a simple binary that reads from stdin and runs the markdown-rs parser on it in order to benchmark it. That is not done in this PR.
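
For what it's worth, such a wrapper binary could be as small as the sketch below (not part of this PR). It assumes the markdown crate (markdown-rs) as the only dependency and that bench.sh pipes the input file to the binary on stdin.

// main.rs: read Markdown from stdin and write rendered HTML to stdout, so the
// binary can be driven by the same './bench.sh <binary>' harness as the others.
use std::io::{self, Read};

fn main() {
    let mut input = String::new();
    io::stdin()
        .read_to_string(&mut input)
        .expect("failed to read stdin");
    // markdown::to_html renders the CommonMark input to an HTML string.
    print!("{}", markdown::to_html(&input));
}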

Some issues I have noticed:

  • The benchmark CI takes ~5 minutes total to build and run the benchmarks on all repos. Most of that time goes into building the binaries; the benchmarks themselves take only ~1 minute to run.
  • The measured times vary considerably between CI runs. I'm not sure whether this is a GitHub runner issue, because when I run this locally the measurements are consistent across several runs. This means we cannot compare measurements from a previous CI run with the current one, only values within a single run, so the relative section of the results is the most useful way to understand differences and changes in speed. You can see the comment history of the measurement comment on my PR (linked above) for what I mean.

Resolved review threads: benches/bench.sh, Makefile, .github/workflows/benchmarks.yml
@YJDoc2 (Contributor, Author) commented Jun 26, 2023

@kivikakk @digitalmoksha Please take a look when possible :)

@digitalmoksha (Collaborator) left a comment

@YJDoc2 this looks pretty good from my angle! Just had a few comments.

Makefile (outdated):
bench-comrak: build-comrak-branch
	git clone https://github.com/progit/progit.git ${ROOT}/vendor/progit || true > /dev/null
	cd benches && \
	hyperfine --prepare 'sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --warmup 3 --min-runs ${MIN_RUNS} -L binary comrak-${COMMIT} './bench.sh ./{binary}'
@digitalmoksha (Collaborator):

Hmm, do we really need to use sudo? I tried make bench-comrak locally, and hyperfine just hung during warmup with no indication why. It turns out sudo was asking for a password. I don't really feel comfortable giving a benchmark program sudo access.

Resolved review threads: .github/workflows/benchmarks.yml, Makefile
@kivikakk (Owner):

This is looking amazing! Thank you so much.

I echo @digitalmoksha's comments about the use of sudo — is there usually a measurable difference from clearing the caches, for our purposes? I imagine the warmup runs will remove a lot of the startup variation even without doing so.

  • The measured times vary considerably between CI runs. I'm not sure whether this is a GitHub runner issue, because when I run this locally the measurements are consistent across several runs. This means we cannot compare measurements from a previous CI run with the current one, only values within a single run, so the relative section of the results is the most useful way to understand differences and changes in speed. You can see the comment history of the measurement comment on my PR (linked above) for what I mean.

Yes, that makes a lot of sense. I'm hopeful the relative data itself is accurate enough, often enough — the inter-run differences are probably caused by being scheduled onto hosts with vastly different loads at that time, but there's no predicting when that might happen during a run.

@YJDoc2 (Contributor, Author) commented Jun 27, 2023

Hey @kivikakk @digitalmoksha , thanks for the review :)

Hmm, do we really need to use sudo? I tried make bench-comrak locally, and hyperfine just hung during warmup with no indication why. It turns out sudo was asking for a password. I don't really feel comfortable giving a benchmark program sudo access.

I echo @digitalmoksha's comments about the use of sudo — is there usually a measurable difference from clearing the caches, for our purposes? I imagine the warmup runs will remove a lot of the startup variation even without doing so.

I added the cache-clearing step because otherwise hyperfine warned that "the first run is an outlier, which might be an effect of cache". Also, when I tried without cache clearing, the measurements were considerably different (consistent, but different). That said, I don't like the sudo use either, especially for benchmarks, so I'll try to figure out a workaround; maybe increasing the number of warmup runs will solve the issue. Will get back after updating this.
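
For illustration, the sudo-free invocation would look roughly like the one below; apart from comrak-${COMMIT} (already used in the Makefile), the binary names in the -L list are placeholders.

# Lean on extra warmup runs to stabilise the page cache instead of dropping it.
hyperfine \
  --warmup 10 \
  --min-runs ${MIN_RUNS} \
  -L binary comrak-${COMMIT},pulldown-cmark,cmark-gfm \
  './bench.sh ./{binary}'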

Thanks :)

@YJDoc2 (Contributor, Author) commented Jul 4, 2023

@kivikakk @digitalmoksha I have removed sudo and added markdown-it rs to the comparisons. The benchmark values change without the cache-dropping step, but they are consistent within a run, so it should be fine. I have also added a timestamp to the comment. Please take a look.

I'll see why the CI is failing...

Resolved review threads (outdated): README.md ×3
@digitalmoksha (Collaborator):

@YJDoc2 this looks good to me, nice job.

Co-authored-by: digitalMoksha <brett@digitalmoksha.com>
@kivikakk (Owner):

I'll look into the CI failure this afternoon with a view to merging this.

@YJDoc2 (Contributor, Author) commented Jul 10, 2023

Hey, sorry I wasn't able to update on this last week.

I'll look into the CI failure this afternoon with a view to merging this.

Thanks a lot. Is it OK if I rebase it on main and force-push? Maybe the CI is expecting something new on main that isn't pulled into this branch.

@YJDoc2 (Contributor, Author) commented Jul 10, 2023

@kivikakk I have updated the proc-macro2 version, as that was causing the build failure on nightly.
Related issue: dtolnay/proc-macro2#356.
I haven't updated anything else in Cargo.lock.
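
For reference, this kind of single-package bump leaves the rest of the lockfile untouched; it can be done with a single cargo update invocation, e.g.:

# Update only proc-macro2 in Cargo.lock, leaving all other pinned versions as-is.
cargo update -p proc-macro2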

@kivikakk (Owner) left a comment

Looks excellent. Tested all locally — thank you so much for your effort! 🤍

@kivikakk linked an issue on Jul 11, 2023 that may be closed by this pull request.
@kivikakk merged commit 216848f into kivikakk:main on Jul 11, 2023.
Linked issue: [RFC] CI for running Benchmarks