bench: add run outputs

This makes it easy to link to benchmarks when someone asks, but also serves as a good way to archive benchmark data at defined points for comparison later. We also make a (feeble) attempt at putting a "pretty" version of a subset of benchmarks in the README of each run directory.
BurntSushi · Apr 30, 2021 · 317075f
1 parent 54824bf
commit 317075f
Showing 9 changed files with 77,242 additions and 5 deletions.
diff --git a/Cargo.toml b/Cargo.toml
@@ -9,7 +9,7 @@ repository = "https://github.com/BurntSushi/rust-memchr"
 readme = "README.md"
 keywords = ["memchr", "char", "scan", "strchr", "string"]
 license = "Unlicense/MIT"
-exclude = ["/ci/*", "/.travis.yml", "/Makefile", "/appveyor.yml"]
+exclude = ["/bench", "/.github", "/fuzz"]
 edition = "2018"
 
 [workspace]

diff --git a/bench/README.md b/bench/README.md
@@ -0,0 +1,44 @@
+This directory defines a large suite of benchmarks for both the memchr and
+memmem APIs in this crate. A selection of "competitor" implementations is
+chosen for comparison. In general, benchmarks are meant to be a tool for
+optimization. That's why there are so many: we want enough coverage that our
+benchmarks approximate real-world usage. When some benchmarks look a bit slower
+than we expect (for one reason or another), we can use profiling tools to look
+at codegen and attempt to improve that case.
+
+Because there are so many benchmarks, if you run all of them, you might want to
+step away for a cup of coffee (or two). Therefore, the typical way to run them
+is to select a subset. For example,
+
+```
+$ cargo bench -- 'memmem/krate/.*never.*'
+```
+
+runs all benchmarks for the memmem implementation in this crate with searches
+that never produce any matches. This will still take a bit, but perhaps only a
+few minutes.
+
+Running a specific benchmark can be useful for profiling. For example, if you
+want to see where `memmem/krate/prebuiltiter/huge-en/common-one-space` is
+spending all of its time, you would first want to run it (to make sure the code
+is compiled):
+
+```
+$ cargo bench -- memmem/krate/prebuiltiter/huge-en/common-one-space
+```
+
+And then run it under your profiling tool (I use `perf` on Linux):
+
+```
+$ perfr --callgraph cargo bench -- memmem/krate/prebuiltiter/huge-en/common-one-space --profile-time 3
+```
+
+Where
+[`perfr` is my own wrapper around `perf`](https://github.com/BurntSushi/dotfiles/blob/master/bin/perfr),
+and the `--profile-time 3` flag means, "just run the code for 3 seconds, but
+don't do anything else." This makes the benchmark harness get out of the way,
+which lets the profile focus as much as possible on the code being measured.
+
+See the README in the `runs` directory for a bit more info on how to use
+`critcmp` to look at benchmark data in a way that makes it easy to do
+comparisons.
diff --git a/bench/data/README.md b/bench/data/README.md
@@ -0,0 +1,2 @@
+This directory contains benchmark corpora. Each sub-directory contains a README
+documenting the corpus a bit more.
diff --git a/bench/data/code/README.md b/bench/data/code/README.md
@@ -0,0 +1,12 @@
+This directory contains corpora generated from source code. These sorts of
+corpora are important because code is something that is frequently searched.
+
+This corpus was generated by running
+
+```
+$ find ./library/alloc -name '*.rs' -print0 \
+    | xargs -0 cat > .../memchr/bench/data/code/rust-library.rs
+```
+
+in a checkout of the https://github.com/rust-lang/rust repository at commit
+78c963945aa35a76703bf62e024af2d85b2796e2.
diff --git a/bench/runs/2021-04-30_initial/README.md b/bench/runs/2021-04-30_initial/README.md
@@ -0,0 +1,2 @@
+This directory contains benchmark corpora. Each sub-directory contains a README
+documenting the corpus a bit more.