Reduce startup time in loop-through mode by 80%-90% #1747

Enselic · 2021-07-22T09:13:26Z

HighlightingAssets::get_syntax_set() is never called when e.g. piping the bat output to a file (see Config::loop_through), so by loading the SyntaxSet only when needed, we radically improve startup time when it is not needed.

On my low-end machine, I get the following numbers when doing
time bat tests/examples/multiline.txt > /tmp/output.txt:

git master

0,09s user 0,00s system 93% cpu 0,103 total

git master + this PR

0,01s user 0,00s system 63% cpu 0,026 total

We can also benchmark with hyperfine , in which case we don't need to redirect the output manually, because the internal redirection of hyperfine is enough to trigger loop-through mode:
hyperfine --export-markdown /dev/tty 'bat tests/examples/multiline.txt'

git master

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bat tests/examples/multiline.txt`	96.1 ± 2.2	93.3	104.3	1.00

git master + this PR

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bat tests/examples/multiline.txt`	11.2 ± 1.1	9.7	15.9	1.00

So on my system, the speedup amounts to (1-11.2/96.1)*100 ~ 88%.

Lazy-loading was one of the key aspects of my prototype on how to improve the startup speed of bat, and this PR is one step on the journey to improve startup speed in general.

I have done sanity checking of this change, and all regression tests pass, so this PR should not be far off production-quality code. But don't hesitate to give criticism when doing code review :) And if you think my approach is completely off track, I would gladly like to know!

I also want to clarify that I think this is a good move long term, even if we start to load partial syntax sets for improved performance. The ability to load the full syntax set is still useful, both for fallback purposes, and for implementing things like —list-languages

Cargo.toml

src/assets.rs

Enselic · 2021-07-22T09:32:55Z

Also starting to lazy-load theme_set should also improve startup speed, although not as dramatic in absolute numbers since it is much smaller.

I plan on looking into that once this PR has landed.

Edit: I couldn't resist, so I did a quick copy-paste job to see what difference it would make, and it reduces startup time to 5.8 ms on my system. Now the bottleneck is constructing SyntaxMapping, and skipping that takes the time down to 2.8 ms. But let's take one step at a time.

keith-hall

Nice achievement, great job!

sharkdp · 2021-07-25T11:15:45Z

This looks great - thank you very much!

Just for my understanding: The use of LazyCell (instead of a simple Option, for example) is needed in order to make the get_syntax_set method &self (and not &mut self), right?

I can confirm the benchmark results.

hyperfine \
  --warmup 5 \
  -L bat bat-master,bat-feature \
  --export-markdown results-loopthrough.md \
  './{bat} tests/examples/multiline.txt'

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./bat-master tests/examples/multiline.txt`	48.5 ± 1.2	47.1	52.6	7.61 ± 1.04
`./bat-feature tests/examples/multiline.txt`	6.4 ± 0.9	5.3	12.8	1.00

I thought it would also be good to make sure that normal syntax set loading is not (significantly) slower, even if I wouldn't expect it to be. This can also be confirmed:

hyperfine \
  --warmup 5 \
  -L bat bat-master,bat-feature \
   --export-markdown results-normal.md \
  './{bat} -f src/less.rs'

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./bat-master -f src/less.rs`	64.6 ± 0.9	63.4	67.9	1.00
`./bat-feature -f src/less.rs`	64.8 ± 1.0	63.5	69.0	1.00 ± 0.02

Enselic · 2021-07-25T17:55:49Z

Just for my understanding: The use of LazyCell (instead of a simple Option, for example) is needed in order to make the get_syntax_set method &self (and not &mut self), right?

Yes, the use of LazyCell is to enable get_syntax_set() on &self, which is a recommended pattern (with some variation) to implement "hidden" caching, or lazy-loading. But that is not the full story. I also wanted to avoid breaking the public bat-as-a-library API. In particular HighlightingAssets::syntaxes(). That method assume things can never go wrong, so there would be no sensible way to implement it if get_syntax_set() would return, say, an Option.

When it is time to look over the public API and make breaking API changes, we could perhaps make it necessary for clients to handle failure to load the cache. Not sure it's worth the complexity though... I suggest we wait and see how much a problem my .expect("cache corrupt ... code is in practice.

Thanks for the code reviews. I will try to merge this shortly. I just need some more time to explore how the code behaves when the cache is corrupt.

Enselic · 2021-07-26T05:39:48Z

So, the PR in its current form introduces a behavioral change when syntaxes.bin is missing from the cache, and I can't make up my mind about if this is a problem or not. I would love to get a second opinion on it.

Step-by-step

Build a cache. One easy way is to do this:
bat cache --build --blank --source ~/src/bat/assets
Remove only syntaxes.bin from the cache:
rm ~/.cache/bat/syntaxes.bin
Ask bat to highlight a file:
bat examples/simple.rs

git master behavior:

Bat silently falls back to using assets (both themes and syntaxes) from the binary.

PR behavior:

Bat panics:

% bat-pr examples/simple.rs 
thread 'main' panicked at 'cache corrupt, consider rebuilding or clearing, see https://github.com/sharkdp/bat#adding-new-syntaxes--language-definitions on how: Error(Msg("Could not load cached syntax set '/home/martin/.cache/bat/syntaxes.bin'"), State { next_error: Some(Os { code: 2, kind: NotFound, message: "No such file or directory" }), backtrace: InternalBacktrace })', src/assets.rs:324:55
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Alternative approach

It is straightforward to also make bat in this PR fallback to binary assets in the scenario above. We just need to (try to) read the file during startup, and only defer parsing (which is the slow part) of it. The code for that looks like this: Enselic@4197558a35

In terms of performance with a fully valid cache and with no cache, it doesn't matter much what route we take. Here are the numbers, where I compare "this PR" with "this PR + above commit", with and without a cache, and with and without loop-through mode:

% bat cache --build --blank --source ~/src/bat/assets
% hyperfine --export-markdown /dev/tty -L bat bat-pr,bat-pr-prime -L color never,always '{bat} --color {color} examples/simple.rs'
% mv ~/.cache/bat ~/.cache/bat-disabled
% hyperfine --export-markdown /dev/tty -L bat bat-pr,bat-pr-prime -L color never,always '{bat} --color {color} examples/simple.rs'

With cache:

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bat-pr --color never examples/simple.rs`	16.8 ± 1.1	15.0	21.8	1.00
`bat-pr-prime --color never examples/simple.rs`	17.5 ± 1.2	15.7	22.5	1.04 ± 0.10
`bat-pr --color always examples/simple.rs`	113.5 ± 1.7	109.1	117.7	6.74 ± 0.47
`bat-pr-prime --color always examples/simple.rs`	115.2 ± 2.3	111.4	122.2	6.84 ± 0.49

Without cache:

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bat-pr --color never examples/simple.rs`	11.8 ± 1.4	10.1	21.8	1.00 ± 0.16
`bat-pr-prime --color never examples/simple.rs`	11.8 ± 1.2	10.2	17.1	1.00
`bat-pr --color always examples/simple.rs`	109.0 ± 2.5	105.5	117.3	9.24 ± 0.97
`bat-pr-prime --color always examples/simple.rs`	112.9 ± 2.4	109.1	121.1	9.57 ± 1.01

I'm starting to love hyperfine more and more btw, such a great tool! :D You really have a talent for coming up with useful tools, David :)

Anyway, the question is basically: Do you think we need to worry about people poking around in the cache like this? (I currently lean more towards “no, we don’t”.)

sharkdp · 2021-07-26T18:02:38Z

Hm. I usually try to invest a lot of time to get rid of possible panics because I don't want users to see these messages. It would really turn me away from a tool. This sounds like a weird edge case where it might be okay for us to show a panic message to the user, but it does look rather scary for the average user ("panicked?", "corrupt?", "what the hell is a backtrace"?). If we can be sure that this is the only scenario where it will be shown, I'm okay with it. Otherwise, I would prefer your alternative solution.

sharkdp · 2021-07-26T18:05:44Z

I'm starting to love hyperfine more and more btw, such a great tool! :D You really have a talent for coming up with useful tools, David :)

Thank you for the feedback! If you are interested, I encourage you to learn about Warmup runs and prepare commands which can be really useful to compare warm disk cache with cold disk cache. In this scenario here, it could actually be really useful to compare the two, as we are actually loading additional files from disk. And they might not be in the (disk) cache, if a user runs bat from time to time only.

Enselic · 2021-07-26T19:31:11Z

As far as I know, the panic will only happen in the scenario I described. But I do understand your concerns, so I have decided to go the extra mile and incorporate the extra change into this PR. Thanks for pushing me in the right direction 👍 (That will also be a good opportunity for me to play around with --prepare etc of hyperfine to explore performance numbers with cold file caches.)

Btw, I have noticed you seem to prefer hyperfine --export-markdown results.md .... I was curious if you picked up on my trick you use hyperfine --export-markdown /dev/tty ... instead? That way the results ends up directly in the terminal for immediate copy-paste. In your case, don't you need an extra step to access the contents of results.md?

sharkdp · 2021-07-26T20:00:21Z

Btw, I have noticed you seem to prefer hyperfine --export-markdown results.md .... I was curious if you picked up on my trick you use hyperfine --export-markdown /dev/tty ... instead? That way the results ends up directly in the terminal for immediate copy-paste. In your case, don't you need an extra step to access the contents of results.md?

I have used this in the past as well. I didn't know /dev/tty, so I used /dev/stdout. I didn't use it here, as it doesn't work anymore with the latest master version of hyperfine(!), which I just noticed when I commented here. This is probably due to sharkdp/hyperfine#339 which is also very useful. Will have to think about whether to restore that old export-to-stdout "feature" somehow 😄

sharkdp · 2021-07-26T20:03:25Z

That way the results ends up directly in the terminal for immediate copy-paste.

For bonus points:

hyperfine 'sleep 0.1' 'sleep 0.2' --export-markdown >(xclip)

To enable robust and user-friendly support for lazy-loading, we need variants of get_syntax_set() and syntaxes() that can fail (see discussion about panics in sharkdp#1747). This commit deprecates old public syntaxes() and introduces a failable version called get_syntaxes().

Or rather, introduce new versions of these methods and deprecate the old ones. This is preparation to enable robust and user-friendly support for lazy-loading. With lazy-loading, we don't know if the SyntaxSet is valid until after we try to use it, so wherever we try to use it, we need to return a Result. See discussion about panics in sharkdp#1747.

Or rather, introduce new versions of these methods and deprecate the old ones. This is preparation to enable robust and user-friendly support for lazy-loading. With lazy-loading, we don't know if the SyntaxSet is valid until after we try to use it, so wherever we try to use it, we need to return a Result. See discussion about panics in #1747.

They are just a way to get access to data embedded in the binary, so they don't conceptually belong inside HighlightingAssets. This has the nice side effect of getting HighlightingAssets::from_cache() and ::from_binary(), that are highly related, next to each other.

Enselic · 2021-07-29T07:26:31Z

I have now rebased this PR on top of the merged PRs #1758, #1755 and #1755, and here is updated performance numbers and behavior.

First, let's compare startup time between current git master bat and bat from this PR:
hyperfine --export-markdown /dev/tty -L bat bat-new,bat -L color never,always '{bat} --color {color} examples/simple.rs'

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bat-new --color never examples/simple.rs`	11.1 ± 0.9	9.8	14.7	1.00
`bat --color never examples/simple.rs`	95.9 ± 1.2	93.9	98.6	8.62 ± 0.74
`bat-new --color always examples/simple.rs`	107.6 ± 1.4	105.3	110.8	9.67 ± 0.83
`bat --color always examples/simple.rs`	108.0 ± 1.6	105.7	112.1	9.71 ± 0.84

And we can confirm that the PR bat is still significantly faster in loop-through mode, and same as before in normal mode. Now let's use the new bat in this PR to compare performance with and without custom assets (that are otherwise byte-by-byte identical):
hyperfine --export-markdown /dev/tty -L bat bat-new -L assets --no-custom-assets, -L color never,always '{bat} --color {color} examples/simple.rs {assets}'

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`bat-new --color never examples/simple.rs --no-custom-assets`	11.7 ± 0.9	10.3	14.3	1.01 ± 0.11
`bat-new --color never examples/simple.rs`	11.6 ± 0.8	10.3	14.1	1.00
`bat-new --color always examples/simple.rs --no-custom-assets`	110.5 ± 1.3	107.8	113.8	9.53 ± 0.68
`bat-new --color always examples/simple.rs`	108.9 ± 1.4	106.5	112.2	9.39 ± 0.67

And as expected, it practically does not matter if custom or integrated assets are used.

Finally, let's now see what happens if the cache is corrupt. First, let's keep syntaxes.bin around but put garbage in it:

% echo garbage >! ~/.cache/bat/syntaxes.bin
% bat-new examples/simple.rs
[bat error]: Could not parse cached syntax set

Instead of panicing, bat now explains that the cached syntax set could not be parsed. And if we remove the file completely:

% rm ~/.cache/bat/syntaxes.bin
% bat-new examples/simple.rs
[bat error]: Could not load cached syntax set
'/home/martin/.cache/bat/syntaxes.bin'

bat now says that the file can not be found. Note that this is changed behavior compared to before, where we silently used integrated assets if the cached assets were corrupt. But I think the new behavior is better. If the cache is corrupt, the user probably wants to know.

If we remove the cache completely, then of course integrated assets are used:

% rm -rf ~/.cache/bat
% bat-new examples/simple.rs
───────┬─────────────────────────────
       │ File: examples/simple.rs
───────┼─────────────────────────────
[...]

Would be great if you could take a renewed look on this PR before I merge.

sharkdp

This looks great -thank you!

Note that this is changed behavior compared to before, where we silently used integrated assets if the cached assets were corrupt. But I think the new behavior is better. If the cache is corrupt, the user probably wants to know.

I completely agree 👍

Instead of 100 ms - 50 ms, startup takes 10 ms - 5 ms. HighlightingAssets::get_syntax_set() is never called when e.g. piping the bat output to a file (see Config::loop_through), so by loading the SyntaxSet only when needed, we radically improve startup time when it is not needed.

Enselic · 2021-07-29T18:36:41Z

CI problems are unrelated. Merged. (:tada: )

CHANGELOG.md

Enselic commented Jul 22, 2021

View reviewed changes

Cargo.toml Show resolved Hide resolved

Enselic commented Jul 22, 2021

View reviewed changes

src/assets.rs Outdated Show resolved Hide resolved

keith-hall approved these changes Jul 22, 2021

View reviewed changes

sharkdp approved these changes Jul 25, 2021

View reviewed changes

sharkdp mentioned this pull request Jul 25, 2021

bat release focus: performance #1751

Closed

11 tasks

This was referenced Jul 26, 2021

Make loading of cached assets closer in performance to integrated assets #1754

Merged

Improve bat startup speed #951

Closed

Enselic mentioned this pull request Jul 27, 2021

HighlightingAssets: Make .syntaxes() and .syntax_for_file_name() failable #1755

Merged

Enselic added the needs-work label Jul 27, 2021

Enselic changed the title ~~Improve loop-through startup speed by 80%-90%~~ Reduce startup time in loop-through mode with 80%-90% Jul 29, 2021

Enselic force-pushed the lazy-load-syntaxes.bin branch from 6797854 to dabff4e Compare July 29, 2021 07:19

Enselic force-pushed the lazy-load-syntaxes.bin branch from dabff4e to 2582b99 Compare July 29, 2021 07:27

Enselic removed the needs-work label Jul 29, 2021

Enselic changed the title ~~Reduce startup time in loop-through mode with 80%-90%~~ Reduce startup time in loop-through mode by 80%-90% Jul 29, 2021

Enselic requested a review from sharkdp July 29, 2021 17:02

sharkdp approved these changes Jul 29, 2021

View reviewed changes

Enselic force-pushed the lazy-load-syntaxes.bin branch from 2582b99 to 0386c01 Compare July 29, 2021 18:25

Enselic merged commit 6acec2c into sharkdp:master Jul 29, 2021

Enselic deleted the lazy-load-syntaxes.bin branch July 29, 2021 18:36

Enselic commented Jul 29, 2021

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

Enselic mentioned this pull request Sep 8, 2021

Load independent and minimal syntax sets when using --language #1787

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce startup time in loop-through mode by 80%-90% #1747

Reduce startup time in loop-through mode by 80%-90% #1747

Enselic commented Jul 22, 2021 •

edited

Loading

Enselic commented Jul 22, 2021 •

edited

Loading

keith-hall left a comment

sharkdp commented Jul 25, 2021 •

edited

Loading

Enselic commented Jul 25, 2021

Enselic commented Jul 26, 2021 •

edited

Loading

sharkdp commented Jul 26, 2021

sharkdp commented Jul 26, 2021 •

edited

Loading

Enselic commented Jul 26, 2021

sharkdp commented Jul 26, 2021

sharkdp commented Jul 26, 2021 •

edited

Loading

Enselic commented Jul 29, 2021 •

edited

Loading

sharkdp left a comment

Enselic commented Jul 29, 2021 •

edited

Loading

Reduce startup time in loop-through mode by 80%-90% #1747

Reduce startup time in loop-through mode by 80%-90% #1747

Conversation

Enselic commented Jul 22, 2021 • edited Loading

Enselic commented Jul 22, 2021 • edited Loading

keith-hall left a comment

Choose a reason for hiding this comment

sharkdp commented Jul 25, 2021 • edited Loading

Enselic commented Jul 25, 2021

Enselic commented Jul 26, 2021 • edited Loading

Step-by-step

git master behavior:

PR behavior:

Alternative approach

sharkdp commented Jul 26, 2021

sharkdp commented Jul 26, 2021 • edited Loading

Enselic commented Jul 26, 2021

sharkdp commented Jul 26, 2021

sharkdp commented Jul 26, 2021 • edited Loading

Enselic commented Jul 29, 2021 • edited Loading

sharkdp left a comment

Choose a reason for hiding this comment

Enselic commented Jul 29, 2021 • edited Loading

Enselic commented Jul 22, 2021 •

edited

Loading

Enselic commented Jul 22, 2021 •

edited

Loading

sharkdp commented Jul 25, 2021 •

edited

Loading

Enselic commented Jul 26, 2021 •

edited

Loading

sharkdp commented Jul 26, 2021 •

edited

Loading

sharkdp commented Jul 26, 2021 •

edited

Loading

Enselic commented Jul 29, 2021 •

edited

Loading

Enselic commented Jul 29, 2021 •

edited

Loading