Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ignore: Use a crossbeam deque instead of an Arc<Mutex<Vec<_>>> #2591

Closed
wants to merge 1 commit into from

Conversation

tavianator
Copy link
Contributor

This gives a significant scalability improvement for fd:

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-before -uj1 '^$' ~/code/bfs/bench/corpus/chromium 577.6 ± 9.1 558.1 586.9 5.46 ± 1.29
./fd-before -uj6 '^$' ~/code/bfs/bench/corpus/chromium 553.7 ± 17.4 522.1 574.7 5.24 ± 1.25
./fd-before -uj12 '^$' ~/code/bfs/bench/corpus/chromium 581.7 ± 14.2 556.0 600.0 5.50 ± 1.30
./fd-before -uj24 '^$' ~/code/bfs/bench/corpus/chromium 587.7 ± 25.6 547.9 614.3 5.56 ± 1.33
./fd-after -uj1 '^$' ~/code/bfs/bench/corpus/chromium 586.8 ± 11.3 569.5 597.5 5.55 ± 1.31
./fd-after -uj6 '^$' ~/code/bfs/bench/corpus/chromium 171.9 ± 7.2 157.4 187.8 1.63 ± 0.39
./fd-after -uj12 '^$' ~/code/bfs/bench/corpus/chromium 128.7 ± 10.7 111.0 143.4 1.22 ± 0.30
./fd-after -uj24 '^$' ~/code/bfs/bench/corpus/chromium 105.7 ± 24.9 84.5 159.7 1.00

There is a minor (~1.5%) single-threaded performance regression in this benchmark. If you want, I can try to mitigate that, but it seems okay to me.

@tavianator
Copy link
Contributor Author

Ripgrep-specific benchmarks:

Command Mean [ms] Min [ms] Max [ms] Relative
./rg-before -uuuj1 --files ~/code/bfs/bench/corpus/chromium 579.4 ± 12.8 561.5 599.9 3.54 ± 0.14
./rg-before -uuuj6 --files ~/code/bfs/bench/corpus/chromium 507.0 ± 13.0 475.2 516.5 3.10 ± 0.13
./rg-before -uuuj12 --files ~/code/bfs/bench/corpus/chromium 565.6 ± 4.4 557.1 571.1 3.46 ± 0.12
./rg-before -uuuj24 --files ~/code/bfs/bench/corpus/chromium 548.5 ± 3.2 543.1 553.3 3.36 ± 0.11
./rg-after -uuuj1 --files ~/code/bfs/bench/corpus/chromium 575.2 ± 16.3 557.9 605.6 3.52 ± 0.15
./rg-after -uuuj6 --files ~/code/bfs/bench/corpus/chromium 229.7 ± 9.2 216.1 249.1 1.41 ± 0.07
./rg-after -uuuj12 --files ~/code/bfs/bench/corpus/chromium 192.7 ± 7.0 179.8 205.9 1.18 ± 0.06
./rg-after -uuuj24 --files ~/code/bfs/bench/corpus/chromium 163.5 ± 5.4 155.8 178.2 1.00

@BurntSushi
Copy link
Owner

Thank you for this! I just wanted to acknowledge it and thank you for the work. I'll plan to dig into this before the ripgrep 14 release. (But I'm taking a detour to do some improvements to the aho-corasick crate first.)

@BurntSushi
Copy link
Owner

BurntSushi commented Sep 18, 2023

OK, so I've read through this patch and it looks good to me! It's very well written, thank you. I've also run a number of ad hoc benchmarks (shown below) and I'd describe the overall result as follows:

  1. Never meaningfully slower. Sometimes meaningfully faster.
  2. Sometimes uses more memory, but not an obscene amount.
  3. It passes my "insane" test of searching an extremely wide directory of all crates from crates.io (from some time ago). Basically, it completes in roughly the same time as master and uses roughly the same amount of memory.

Some ad hoc benchmarks are below. Unless otherwise noted, file caches are warm.

On a checkout of the Linux kernel:

$ git remote -v
origin  git@github.com:torvalds/linux (fetch)
origin  git@github.com:torvalds/linux (push)

$ git rev-parse HEAD
f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6

$ time rg-before-deque -c BURNTSUSHI

real    0.084
user    0.309
sys     0.608
maxmem  14 MB
faults  0

$ time rg-after-deque -c BURNTSUSHI

real    0.079
user    0.258
sys     0.583
maxmem  13 MB
faults  0

$ hyperfine -i -w3 "rg-before-deque -c BURNTSUSHI" "rg-after-deque -c BURNTSUSHI"
Benchmark 1: rg-before-deque -c BURNTSUSHI
  Time (mean ± σ):      85.3 ms ±   1.0 ms    [User: 304.5 ms, System: 603.8 ms]
  Range (min … max):    83.3 ms …  87.3 ms    34 runs

Benchmark 2: rg-after-deque -c BURNTSUSHI
  Time (mean ± σ):      79.4 ms ±   1.2 ms    [User: 251.1 ms, System: 566.4 ms]
  Range (min … max):    76.7 ms …  81.7 ms    36 runs

Summary
  'rg-after-deque -c BURNTSUSHI' ran
    1.07 ± 0.02 times faster than 'rg-before-deque -c BURNTSUSHI'

On a checkout of chromium (nice result):

$ git remote -v
origin  git@github.com:nwjs/chromium.src (fetch)
origin  git@github.com:nwjs/chromium.src (push)

$ git rev-parse HEAD
e25f26f9aef0fd5958b980c3da4fdc2703ea4ccc

$ time rg-before-deque -c BURNTSUSHI

real    0.327
user    1.384
sys     2.279
maxmem  101 MB
faults  0

$ time rg-after-deque -c BURNTSUSHI

real    0.285
user    1.091
sys     1.977
maxmem  75 MB
faults  0

$ hyperfine -i -w3 "rg-before-deque -c BURNTSUSHI" "rg-after-deque -c BURNTSUSHI"
Benchmark 1: rg-before-deque -c BURNTSUSHI
  Time (mean ± σ):     331.8 ms ±   2.7 ms    [User: 1528.5 ms, System: 2190.0 ms]
  Range (min … max):   327.5 ms … 335.2 ms    10 runs

Benchmark 2: rg-after-deque -c BURNTSUSHI
  Time (mean ± σ):     286.5 ms ±   4.3 ms    [User: 1175.8 ms, System: 1973.8 ms]
  Range (min … max):   281.1 ms … 293.5 ms    10 runs

Summary
  'rg-after-deque -c BURNTSUSHI' ran
    1.16 ± 0.02 times faster than 'rg-before-deque -c BURNTSUSHI'

On a checkout of some Harry Potter fan-fiction (an amazing result):

$ git remote -v
origin  git@github.com:NightMachinary/r_HPfanfiction (fetch)
origin  git@github.com:NightMachinary/r_HPfanfiction (push)

$ git rev-parse HEAD
b373f0c3c03cdeed261646038170a2286fdcb4b8

$ time rg-before-deque -c BURNTSUSHI

real    0.803
user    4.006
sys     4.561
maxmem  157 MB
faults  0

$ time rg-after-deque -c BURNTSUSHI

real    0.524
user    2.242
sys     3.686
maxmem  34 MB
faults  0

$ hyperfine -i -w3 "rg-before-deque -c BURNTSUSHI" "rg-after-deque -c BURNTSUSHI"
Benchmark 1: rg-before-deque -c BURNTSUSHI
  Time (mean ± σ):     797.3 ms ±  18.2 ms    [User: 3810.4 ms, System: 4735.8 ms]
  Range (min … max):   747.8 ms … 809.7 ms    10 runs

Benchmark 2: rg-after-deque -c BURNTSUSHI
  Time (mean ± σ):     525.2 ms ±   2.7 ms    [User: 2267.6 ms, System: 3692.9 ms]
  Range (min … max):   521.6 ms … 529.5 ms    10 runs

Summary
  'rg-after-deque -c BURNTSUSHI' ran
    1.52 ± 0.04 times faster than 'rg-before-deque -c BURNTSUSHI'

And now my "torture" test. In this case, I clear the file cache before each search because the entire corpus doesn't fit into memory on the machine I have it on. So this seemed like the most fair test.

$ sudo clear-file-cache

$ time rg-before-deque BURNTSUSHI

real    1:40.15
user    31.787
sys     1:05.12
maxmem  476 MB
faults  67

$ sudo clear-file-cache

$ time rg-after-deque BURNTSUSHI

real    1:51.20
user    34.253
sys     1:11.48
maxmem  364 MB
faults  69

While there is a big difference here, I ran this test multiple times and there is a ton of noise. However, each test run is all within roughly the same ballpark. If I run with ripgrep 11, it didn't complete after 15 minutes and >1.5GB of memory used, so I killed it. Definitely a different ballpark. Bottom line is that as far as I can tell, this PR does not appear to regress on my torture test.

I'll merge this PR in my rollup branch with a few minor style changes. Thank you! :-)

I think the only downside of this change is that it brings in more dependencies. But they look to be good quality libraries and I feel like they are carrying some significant weight with the work-stealing queue.

@BurntSushi BurntSushi added the rollup A PR that has been merged with many others in a rollup. label Sep 18, 2023
BurntSushi pushed a commit that referenced this pull request Sep 20, 2023
This represents yet another iteration on how `ignore` enqueues and
distributes work in parallel. The original implementation used a
multi-producer/multi-consumer thread safe queue from crossbeam. At some
point, I migrated to a simple `Arc<Mutex<Vec<_>>>` and treated it as a
stack so that we did depth first traversal. This helped with memory
usage in very wide directories.

But it turns out that a naive stack-behind-a-mutex can be quite a bit
slower than something that's a little smarter, such as a work-stealing
stack used in this commit. My hypothesis for why this helps is that
without the stealing component, work distribution can get stuck in
sub-optimal configurations that depend on which directory entries get
assigned to a particular worker. It's likely that this can result in
some workers getting "more" work than others, just by chance, and thus
remain idle. But the work-stealing approach heads that off.

This does re-introduce a dependency on parts of crossbeam which is kind
of a bummer, but it's carrying its weight for now.

Closes #1823, Closes #2591
Ref sharkdp/fd#28
@tavianator tavianator deleted the crossbeam-deque branch September 20, 2023 21:35
ink-splatters pushed a commit to ink-splatters/ripgrep that referenced this pull request Oct 25, 2023
This represents yet another iteration on how `ignore` enqueues and
distributes work in parallel. The original implementation used a
multi-producer/multi-consumer thread safe queue from crossbeam. At some
point, I migrated to a simple `Arc<Mutex<Vec<_>>>` and treated it as a
stack so that we did depth first traversal. This helped with memory
usage in very wide directories.

But it turns out that a naive stack-behind-a-mutex can be quite a bit
slower than something that's a little smarter, such as a work-stealing
stack used in this commit. My hypothesis for why this helps is that
without the stealing component, work distribution can get stuck in
sub-optimal configurations that depend on which directory entries get
assigned to a particular worker. It's likely that this can result in
some workers getting "more" work than others, just by chance, and thus
remain idle. But the work-stealing approach heads that off.

This does re-introduce a dependency on parts of crossbeam which is kind
of a bummer, but it's carrying its weight for now.

Closes BurntSushi#1823, Closes BurntSushi#2591
Ref sharkdp/fd#28
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Nov 28, 2023
14.0.2 (2023-11-27)
===================
This is a patch release with a few small bug fixes.

Bug fixes:

* [BUG #2654](BurntSushi/ripgrep#2654):
  Fix `deb` release sha256 sum file.
* [BUG #2658](BurntSushi/ripgrep#2658):
  Fix partial regression in the behavior of `--null-data --line-regexp`.
* [BUG #2659](BurntSushi/ripgrep#2659):
  Fix Fish shell completions.
* [BUG #2662](BurntSushi/ripgrep#2662):
  Fix typo in documentation for `-i/--ignore-case`.


14.0.1 (2023-11-26)
===================
This a patch release meant to fix `cargo install ripgrep` on Windows.

Bug fixes:

* [BUG #2653](BurntSushi/ripgrep#2653):
  Include `pkg/windows/Manifest.xml` in crate package.


14.0.0 (2023-11-26)
===================
ripgrep 14 is a new major version release of ripgrep that has some new
features, performance improvements and a lot of bug fixes.

The headlining feature in this release is hyperlink support. In this release,
they are an opt-in feature but may change to an opt-out feature in the future.
To enable them, try passing `--hyperlink-format default`. If you use [VS Code],
then try passing `--hyperlink-format vscode`. Please [report your experience
with hyperlinks][report-hyperlinks], positive or negative.

[VS Code]: https://code.visualstudio.com/
[report-hyperlinks]: BurntSushi/ripgrep#2611

Another headlining development in this release is that it contains a rewrite
of its regex engine. You generally shouldn't notice any changes, except for
some searches may get faster. You can read more about the [regex engine rewrite
on my blog][regex-internals]. Please [report your performance improvements or
regressions that you notice][report-perf].

[report-perf]: BurntSushi/ripgrep#2652

Finally, ripgrep switched the library it uses for argument parsing. Users
should not notice a difference in most cases (error messages have changed
somewhat), but flag overrides should generally be more consistent. For example,
things like `--no-ignore --ignore-vcs` work as one would expect (disables all
filtering related to ignore rules except for rules found in version control
systems such as `git`).

[regex-internals]: https://blog.burntsushi.net/regex-internals/

**BREAKING CHANGES**:

* `rg -C1 -A2` used to be equivalent to `rg -A2`, but now it is equivalent to
  `rg -B1 -A2`. That is, `-A` and `-B` no longer completely override `-C`.
  Instead, they only partially override `-C`.

Build process changes:

* ripgrep's shell completions and man page are now created by running ripgrep
with a new `--generate` flag. For example, `rg --generate man` will write a
man page in `roff` format on stdout. The release archives have not changed.
* The optional build dependency on `asciidoc` or `asciidoctor` has been
dropped. Previously, it was used to produce ripgrep's man page. ripgrep now
owns this process itself by writing `roff` directly.

Performance improvements:

* [PERF #1746](BurntSushi/ripgrep#1746):
  Make some cases with inner literals faster.
* [PERF #1760](BurntSushi/ripgrep#1760):
  Make most searches with `\b` look-arounds (among others) much faster.
* [PERF #2591](BurntSushi/ripgrep#2591):
  Parallel directory traversal now uses work stealing for faster searches.
* [PERF #2642](BurntSushi/ripgrep#2642):
  Parallel directory traversal has some contention reduced.

Feature enhancements:

* Added or improved file type filtering for Ada, DITA, Elixir, Fuchsia, Gentoo,
  Gradle, GraphQL, Markdown, Prolog, Raku, TypeScript, USD, V
* [FEATURE #665](BurntSushi/ripgrep#665):
  Add a new `--hyperlink-format` flag that turns file paths into hyperlinks.
* [FEATURE #1709](BurntSushi/ripgrep#1709):
  Improve documentation of ripgrep's behavior when stdout is a tty.
* [FEATURE #1737](BurntSushi/ripgrep#1737):
  Provide binaries for Apple silicon.
* [FEATURE #1790](BurntSushi/ripgrep#1790):
  Add new `--stop-on-nonmatch` flag.
* [FEATURE #1814](BurntSushi/ripgrep#1814):
  Flags are now categorized in `-h/--help` output and ripgrep's man page.
* [FEATURE #1838](BurntSushi/ripgrep#1838):
  An error is shown when searching for NUL bytes with binary detection enabled.
* [FEATURE #2195](BurntSushi/ripgrep#2195):
  When `extra-verbose` mode is enabled in zsh, show extra file type info.
* [FEATURE #2298](BurntSushi/ripgrep#2298):
  Add instructions for installing ripgrep using `cargo binstall`.
* [FEATURE #2409](BurntSushi/ripgrep#2409):
  Added installation instructions for `winget`.
* [FEATURE #2425](BurntSushi/ripgrep#2425):
  Shell completions (and man page) can be created via `rg --generate`.
* [FEATURE #2524](BurntSushi/ripgrep#2524):
  The `--debug` flag now indicates whether stdin or `./` is being searched.
* [FEATURE #2643](BurntSushi/ripgrep#2643):
  Make `-d` a short flag for `--max-depth`.
* [FEATURE #2645](BurntSushi/ripgrep#2645):
  The `--version` output will now also contain PCRE2 availability information.

Bug fixes:

* [BUG #884](BurntSushi/ripgrep#884):
  Don't error when `-v/--invert-match` is used multiple times.
* [BUG #1275](BurntSushi/ripgrep#1275):
  Fix bug with `\b` assertion in the regex engine.
* [BUG #1376](BurntSushi/ripgrep#1376):
  Using `--no-ignore --ignore-vcs` now works as one would expect.
* [BUG #1622](BurntSushi/ripgrep#1622):
  Add note about error messages to `-z/--search-zip` documentation.
* [BUG #1648](BurntSushi/ripgrep#1648):
  Fix bug where sometimes short flags with values, e.g., `-M 900`, would fail.
* [BUG #1701](BurntSushi/ripgrep#1701):
  Fix bug where some flags could not be repeated.
* [BUG #1757](BurntSushi/ripgrep#1757):
  Fix bug when searching a sub-directory didn't have ignores applied correctly.
* [BUG #1891](BurntSushi/ripgrep#1891):
  Fix bug when using `-w` with a regex that can match the empty string.
* [BUG #1911](BurntSushi/ripgrep#1911):
  Disable mmap searching in all non-64-bit environments.
* [BUG #1966](BurntSushi/ripgrep#1966):
  Fix bug where ripgrep can panic when printing to stderr.
* [BUG #2046](BurntSushi/ripgrep#2046):
  Clarify that `--pre` can accept any kind of path in the documentation.
* [BUG #2108](BurntSushi/ripgrep#2108):
  Improve docs for `-r/--replace` syntax.
* [BUG #2198](BurntSushi/ripgrep#2198):
  Fix bug where `--no-ignore-dot` would not ignore `.rgignore`.
* [BUG #2201](BurntSushi/ripgrep#2201):
  Improve docs for `-r/--replace` flag.
* [BUG #2288](BurntSushi/ripgrep#2288):
  `-A` and `-B` now only each partially override `-C`.
* [BUG #2236](BurntSushi/ripgrep#2236):
  Fix gitignore parsing bug where a trailing `\/` resulted in an error.
* [BUG #2243](BurntSushi/ripgrep#2243):
  Fix `--sort` flag for values other than `path`.
* [BUG #2246](BurntSushi/ripgrep#2246):
  Add note in `--debug` logs when binary files are ignored.
* [BUG #2337](BurntSushi/ripgrep#2337):
  Improve docs to mention that `--stats` is always implied by `--json`.
* [BUG #2381](BurntSushi/ripgrep#2381):
  Make `-p/--pretty` override flags like `--no-line-number`.
* [BUG #2392](BurntSushi/ripgrep#2392):
  Improve global git config parsing of the `excludesFile` field.
* [BUG #2418](BurntSushi/ripgrep#2418):
  Clarify sorting semantics of `--sort=path`.
* [BUG #2458](BurntSushi/ripgrep#2458):
  Make `--trim` run before `-M/--max-columns` takes effect.
* [BUG #2479](BurntSushi/ripgrep#2479):
  Add documentation about `.ignore`/`.rgignore` files in parent directories.
* [BUG #2480](BurntSushi/ripgrep#2480):
  Fix bug when using inline regex flags with `-e/--regexp`.
* [BUG #2505](BurntSushi/ripgrep#2505):
  Improve docs for `--vimgrep` by mentioning footguns and some work-arounds.
* [BUG #2519](BurntSushi/ripgrep#2519):
  Fix incorrect default value in documentation for `--field-match-separator`.
* [BUG #2523](BurntSushi/ripgrep#2523):
  Make executable searching take `.com` into account on Windows.
* [BUG #2574](BurntSushi/ripgrep#2574):
  Fix bug in `-w/--word-regexp` that would result in incorrect match offsets.
* [BUG #2623](BurntSushi/ripgrep#2623):
  Fix a number of bugs with the `-w/--word-regexp` flag.
* [BUG #2636](BurntSushi/ripgrep#2636):
  Strip release binaries for macOS.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rollup A PR that has been merged with many others in a rollup.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants