Add CI for running benchmarks #326
Conversation
@kivikakk @digitalmoksha Please take a look when possible :)
@YJDoc2 this looks pretty good from my angle! Just had a few comments.
Makefile (outdated):

```makefile
bench-comrak: build-comrak-branch
	git clone https://github.com/progit/progit.git ${ROOT}/vendor/progit || true > /dev/null
	cd benches && \
	hyperfine --prepare 'sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --warmup 3 --min-runs ${MIN_RUNS} -L binary comrak-${COMMIT} './bench.sh ./{binary}'
```
Hmm, do we really need to use `sudo`? I tried `make bench-comrak` locally, and hyperfine just hung during warmup with no indication why; it turned out `sudo` was asking for a password. I don't really feel comfortable giving a benchmark program `sudo` access.
This is looking amazing! Thank you so much. I echo @digitalmoksha's comments about the use of `sudo`.
Yes, that makes a lot of sense. I'm hopeful the relative data itself is accurate enough, often enough; the inter-run differences are probably caused by being scheduled onto hosts with vastly different loads at the time, but there's no predicting when that might happen during a run.
Hey @kivikakk @digitalmoksha, thanks for the review :)

I added the cache-clearing step because otherwise hyperfine gave a warning that the first run was an outlier, possibly an effect of the cache. Also, when I tried without cache clearing, the measurements were considerably different (consistent, but different). That said, I don't like the `sudo` use either, especially for benchmarks, so I'll try to figure out a workaround; maybe increasing the number of warmup runs will solve the issue. Will get back after updating this. Thanks :)
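For reference, a minimal sketch of what a sudo-free invocation could look like, leaning on extra warmup runs instead of dropping the page cache. The warmup count is illustrative, not a value settled on in this PR:

```sh
# Hypothetical sudo-free variant: skip the drop_caches --prepare step and
# let extra warmup runs bring the filesystem cache to a steady state.
cd benches && \
hyperfine --warmup 10 --min-runs "${MIN_RUNS}" \
    -L binary "comrak-${COMMIT}" './bench.sh ./{binary}'
```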
@kivikakk @digitalmoksha I have removed `sudo` and added markdown-it (Rust) to the comparisons. The benchmark values change now that the cache is no longer dropped, but they are consistent with each other, so it should be fine. I have also added a timestamp to the comment. Please take a look. I'll see why the CI is failing...
@YJDoc2 this looks good to me, nice job.
Co-authored-by: digitalMoksha <brett@digitalmoksha.com>
I'll look into the CI failure this afternoon with a view to merging this.
Hey, sorry I wasn't able to update on this last week.
Thanks a lot. Is it OK if I rebase it onto main and force-push? Maybe the CI is expecting some new things in master that aren't pulled into this branch.
@kivikakk I have updated the proc-macro2 version, as that was causing the build failure on nightly.
Looks excellent. Tested all locally — thank you so much for your effort! 🤍
Solves #324
As discussed, I have added CI for running benchmarks each time a PR is opened, and subsequently whenever anyone comments `/run-bench` on the PR. This runs the benchmarks and posts the result table as a comment on the PR; on subsequent runs, the same comment is edited and updated with the new results. A rough sketch of how such a trigger can be wired up is shown below.

Because GitHub needs the CI file to be present in the default branch before it will run, the bench CI won't run for this PR itself. The CI here is failing because the comment-posting action needs write permission on PRs, which the default GitHub token provided to CI in this repo doesn't have (?). I have opened a dummy PR here on my fork; you can see the posted results there, as well as try out the comment-based invocation of the benchmarks.
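For readers unfamiliar with comment-triggered workflows, this is roughly how such a trigger is commonly wired up in GitHub Actions. The job and step names are illustrative, and this is not the exact workflow from the PR:

```yaml
name: benchmark

on:
  pull_request:
  issue_comment:
    types: [created]

jobs:
  bench:
    # Plain pull_request events always run; for issue_comment events, only
    # react to comments made on a PR whose body is exactly /run-bench.
    if: >-
      github.event_name == 'pull_request' ||
      (github.event.issue.pull_request &&
       github.event.comment.body == '/run-bench')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make bench-comrak
```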
I have also changed the benchmarking method as discussed. We used to simply clone cmark-gfm and run the bench from there; now I have copied the benchmark-running script from it and use hyperfine to run the benchmarks. We now also benchmark pulldown-cmark and cmark-gfm. markdown-rs was considered too, but it is a library-only repo, so we would need to set up a Cargo project with a simple binary that reads from stdin and runs the markdown-rs parser on it. That is not done in this PR; a sketch of what it might look like follows.
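As a rough illustration, the stdin-reading wrapper mentioned above could be as small as this. It is a sketch only, using the `to_html` entry point of the `markdown` crate (markdown-rs), and none of it is part of this PR:

```rust
// Hypothetical wrapper binary so markdown-rs can be driven by bench.sh
// the same way as the other parsers: read a document on stdin, print HTML.
use std::io::Read;

fn main() {
    let mut input = String::new();
    std::io::stdin()
        .read_to_string(&mut input)
        .expect("failed to read stdin");
    // markdown::to_html parses CommonMark and renders HTML in one call.
    print!("{}", markdown::to_html(&input));
}
```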
Some issues I have noticed:

- The `relative` section of the measurement is the most sensible for understanding differences and changes in speed. You can see the comment history of the measurement comment on my PR (linked above) to see what I mean by all this.