Improved ANSI passthrough. #1596

eth-p · 2021-03-23T01:50:04Z

This pull request overhauls how bat interprets and re-emits ANSI escape sequences.
Prior to these changes, bat used a naive heuristic that re-emitted every encountered sequence at the start of each new line. The accumulated sequences were only cleared when the exact literal \x1B[0m was encountered.

In 60aad68, I added a simple ANSI sequence parser that only tracks the most recently-encountered ANSI SGR (color, style, etc.) sequence for each attribute. I haven't benchmarked the difference in CPU usage, but it significantly increases the performance of the pager (the cause of #1481).
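As a rough illustration of "tracking only the most recent SGR value for each attribute", here is a minimal Rust sketch. It is not the code from 60aad68: the `SgrState` type, its fields, and the subset of SGR codes it understands are all assumptions made for the example.

```rust
/// Hypothetical tracker (not bat's actual type): remembers only the latest
/// SGR value for each attribute instead of every sequence ever seen.
#[derive(Default, Debug, Clone, PartialEq)]
struct SgrState {
    foreground: Option<u16>,
    background: Option<u16>,
    bold: bool,
    underline: bool,
}

impl SgrState {
    /// Applies the parameters of one `ESC [ <params> m` sequence, e.g. "1;33".
    fn apply(&mut self, params: &str) {
        let mut it = params.split(';');
        while let Some(p) = it.next() {
            // An empty parameter (as in `ESC[m`) means 0, i.e. reset.
            let code = if p.is_empty() { Some(0) } else { p.parse::<u16>().ok() };
            match code {
                Some(0) => *self = SgrState::default(),
                Some(1) => self.bold = true,
                Some(22) => self.bold = false,
                Some(4) => self.underline = true,
                Some(24) => self.underline = false,
                Some(n @ 30..=37) | Some(n @ 90..=97) => self.foreground = Some(n),
                Some(39) => self.foreground = None,
                Some(n @ 40..=47) | Some(n @ 100..=107) => self.background = Some(n),
                Some(49) => self.background = None,
                Some(38) | Some(48) => {
                    // Extended colors are not tracked in this sketch; consume
                    // their arguments so they are not misread as attributes.
                    match it.next() {
                        Some("5") => { it.next(); }
                        Some("2") => { it.next(); it.next(); it.next(); }
                        _ => {}
                    }
                }
                _ => {} // anything else is ignored here
            }
        }
    }

    /// Builds the single sequence needed to restore this state on a new line.
    fn to_sequence(&self) -> String {
        let mut parts: Vec<String> = Vec::new();
        if self.bold { parts.push("1".to_string()); }
        if self.underline { parts.push("4".to_string()); }
        if let Some(fg) = self.foreground { parts.push(fg.to_string()); }
        if let Some(bg) = self.background { parts.push(bg.to_string()); }
        if parts.is_empty() { String::new() } else { format!("\x1B[{}m", parts.join(";")) }
    }
}

fn main() {
    let mut state = SgrState::default();
    for params in ["33", "1", "31"] {
        state.apply(params);
    }
    // Red replaced yellow, so only one foreground code is re-emitted.
    assert_eq!(state.to_sequence(), "\x1B[1;31m");
    state.apply("");
    assert_eq!(state.to_sequence(), "");
}
```

With this shape, the amount of state re-emitted per line is bounded by the number of attributes rather than by the number of sequences seen so far.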

Memory usage: (Top: before, Bottom: after)

In 5aa94e8, I fixed ANSI passthrough support when wrapping is disabled. The non-wrapping branch didn't have code to handle ANSI, which led to inconsistencies across lines.

Example: printf '\x1B[33mYellow\nShould be yellow.\x1B[m' | bat --wrap=never
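To make the expected result of that example concrete, here is a deliberately simplified Rust sketch of carrying a style over a line break. It is an assumption-laden illustration, not the code in 5aa94e8: the function name is made up, and it remembers only the last non-reset SGR sequence rather than one value per attribute.

```rust
/// Hypothetical helper: returns the lines of `input`, each prefixed with the
/// SGR sequence that was still active at the end of the previous line.
fn carry_style_across_lines(input: &str) -> Vec<String> {
    let mut active = String::new(); // last non-reset SGR sequence seen so far
    let mut out = Vec::new();
    for line in input.lines() {
        // Re-emit whatever style was active before this line started.
        let mut rendered = active.clone();
        rendered.push_str(line);

        // Scan the line for `ESC [ <params> <final byte>` sequences.
        let mut rest = line;
        while let Some(start) = rest.find("\x1B[") {
            let after = &rest[start + 2..];
            // A CSI sequence ends at its first byte in the range '@'..='~'.
            match after.find(|c: char| ('@'..='~').contains(&c)) {
                Some(end) => {
                    let params = &after[..end];
                    // Only sequences ending in 'm' (SGR) change the style.
                    if after[end..].starts_with('m') {
                        if params.is_empty() || params == "0" {
                            active.clear();
                        } else {
                            active = format!("\x1B[{}m", params);
                        }
                    }
                    rest = &after[end + 1..];
                }
                None => break, // unterminated sequence: stop scanning
            }
        }
        out.push(rendered);
    }
    out
}

fn main() {
    let lines = carry_style_across_lines("\x1B[33mYellow\nShould be yellow.\x1B[m");
    assert_eq!(lines[0], "\x1B[33mYellow");
    // The second line starts with the yellow sequence carried over from line one.
    assert_eq!(lines[1], "\x1B[33mShould be yellow.\x1B[m");
}
```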

I did my best to manually check for any regressions, but I may have missed some. If anyone finds any issues with how ANSI is handled now, please let me know.

Additionally, if/when console-rs/console#95 is merged, we should update it. Without the fix, strings such as \x1B(0lqk will be incorrectly interpreted as [\x1B(0l, qk] instead of [\x1B(0, lqk].
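The intended split can be sketched in Rust as follows. This is not the console crate's parser; the function name is hypothetical, and it only handles escape sequences that carry at least one intermediate byte (such as the `(` in a character set designation), leaving CSI sequences like `\x1B[...m` out of scope.

```rust
/// Hypothetical helper: splits `s` into a leading escape sequence of the form
/// ESC, one or more intermediate bytes (0x20..=0x2F), one final byte
/// (0x30..=0x7E), and the remaining text. Returns ("", s) if there is no such
/// sequence at the start.
fn split_charset_escape(s: &str) -> (&str, &str) {
    let bytes = s.as_bytes();
    if bytes.first() != Some(&0x1B) {
        return ("", s);
    }
    let mut i = 1;
    // Skip intermediate bytes, e.g. '(' in `ESC ( 0`.
    while i < bytes.len() && (0x20..=0x2F).contains(&bytes[i]) {
        i += 1;
    }
    // Require at least one intermediate byte followed by exactly one final byte.
    if i == 1 || i >= bytes.len() || !(0x30..=0x7E).contains(&bytes[i]) {
        return ("", s);
    }
    // Every consumed byte is ASCII, so `i + 1` is a valid char boundary.
    s.split_at(i + 1)
}

fn main() {
    // The designation sequence is three bytes long; "lqk" is ordinary text.
    assert_eq!(split_charset_escape("\x1B(0lqk"), ("\x1B(0", "lqk"));
    assert_eq!(split_charset_escape("plain"), ("", "plain"));
}
```

The key point is that the final byte (`0` here) terminates the sequence, so the `l` that follows belongs to the text and not to the escape.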

Edit by @Enselic: We included the fix in bat a month ago in #1934 🎉

keith-hall · 2021-03-23T04:33:39Z

Nice work! 🎆

I did my best to manually check for any regressions, but I may have missed some.

Do we not have any tests that cover the non-wrapping behavior? Otherwise I'd expect to see some failures due to some differences in our highlighted test files for example.
Also, if we have tests covering input containing ANSI escape codes, that could improve our confidence that all is good ;) (I don't have chance to check atm)

eth-p · 2021-03-23T05:44:28Z

I don't think we have any tests specifically covering input with ANSI sequences. I definitely would've broken snapshot tests with this fix, but everything passed without an issue.

sharkdp · 2021-03-27T10:33:37Z

Awesome! Excited to get this integrated soon. It might take me a few more days until I find the time to review it though.

It would be great to see actual (before and after) benchmark results. Maybe with some of the "highlighted" files in tests/syntax-tests (which doesn't seem to work currently)?

eth-p · 2021-03-28T02:47:47Z

It would be great to see actual (before and after) benchmark results.

I unfortunately don't have a machine that's reliable enough to accurately benchmark bat's wall time. It seems my laptop can introduce noise (thermal throttling? background CPU usage?) that may vary results by up to 40% per run, and I don't want to provide any unreliable benchmarks.

That being said... I can still quantify the difference in how many bytes bat is writing to stdout or the pager. In some cases, it's a small but meaningful 9% difference.

Using the pre-highlighted Rust syntax test:

Cols	Syntax	Style	OLD	NEW	Diff%
80	text	Plain	92016	83801	91.07%
120	Rust	Plain	99073	91131	91.98%
120	Rust	Full	692732	634222	91.55%

And in more extreme cases such as the output from the command in #1481:

Cols	Syntax	Style	OLD	NEW	Diff%
120	text	Full	908560976	1082842	0.1191%
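For reference, the Diff% column is NEW expressed as a percentage of OLD, so roughly 91% corresponds to the 9% reduction mentioned above. A short Rust sketch of that arithmetic, using the byte counts from the tables (the function name and row labels are just for illustration):

```rust
/// NEW as a percentage of OLD, which is what the Diff% column reports.
fn new_as_percent_of_old(old: u64, new: u64) -> f64 {
    new as f64 / old as f64 * 100.0
}

fn main() {
    // (description, OLD bytes, NEW bytes), copied from the tables above.
    let rows = [
        ("80 cols, text, Plain", 92_016u64, 83_801u64),
        ("120 cols, Rust, Plain", 99_073, 91_131),
        ("120 cols, Rust, Full", 692_732, 634_222),
        ("120 cols, text, Full (#1481)", 908_560_976, 1_082_842),
    ];
    for (label, old, new) in rows {
        let pct = new_as_percent_of_old(old, new);
        println!("{}: {:.4}% of old size, {:.4}% fewer bytes", label, pct, 100.0 - pct);
    }
}
```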

As far as the changes in 5aa94e8 are concerned though, they will have a negative impact on unhighlighted text with --wrap=never. Prior to that commit, --wrap=never was faster than it should have been, since ANSI sequences were not actually being interpreted.

Maybe with some of the "highlighted" files in tests/syntax-tests (which doesn't seem to work currently)?

They do work with -l txt. I suspect the reason they don't work is because Syntect is highlighting parts of the ANSI color sequences as their own highlight blocks, and bat doesn't try to look for sequences across blocks.

A smarter implementation would likely be for bat to just disable all highlighting when it encounters its first ANSI escape sequence, but I feel like that would be a breaking change that requires some discussion around it first.
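Purely as a sketch of that idea (it is not implemented in this PR, and the `HighlightGate` type is hypothetical), the switch could look something like this in Rust:

```rust
/// Hypothetical gate: highlighting stays on until the first line containing
/// an ESC byte is seen, and stays off for the rest of the input.
struct HighlightGate {
    enabled: bool,
}

impl HighlightGate {
    fn new() -> Self {
        HighlightGate { enabled: true }
    }

    /// Returns whether `line` should still be syntax-highlighted.
    fn should_highlight(&mut self, line: &str) -> bool {
        if line.contains('\x1B') {
            self.enabled = false;
        }
        self.enabled
    }
}

fn main() {
    let mut gate = HighlightGate::new();
    assert!(gate.should_highlight("fn main() {}"));
    assert!(!gate.should_highlight("\x1B[33mYellow"));
    // Once disabled, later plain lines are passed through unhighlighted too.
    assert!(!gate.should_highlight("plain again"));
}
```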

sharkdp · 2021-04-03T19:27:08Z

I unfortunately don't have a machine that's reliable enough to accurately benchmark bat's wall time. It seems my laptop can introduce noise (thermal throttling? background CPU usage?) that may vary results by up to 40% per run, and I don't want to provide any unreliable benchmarks.

I wrote hyperfine to be able to deal with effects like these when benchmarking command line programs. I'm not asking you to use it, but if you are interested, you could give it a try 😄

hyperfine [options] './bat-old …' './bat-new'

There is a built-in outlier detection that will tell you if background processes completely break your benchmark. Obviously, you should still always try to run benchmarks on a quiet PC. Shutting down things like Spotify, Dropbox, browsers, … can have a huge impact.
Hyperfine performs a series of benchmarks and performs a statistical analysis. You will get an estimate for the noise in your benchmark results. If you are still unsure, you can run a t-test on the results that will tell you if the speedup/slowdown is statistically significant.
There are options to use warmup runs before the actual benchmark. This can help with disk caching effects (which are likely to play a role here as well) and possibly even with thermal throttling.

Concerning the last point (thermal throttling): On Linux, there is a way to temporarily disable any frequency scaling, which can help with benchmarking.

eth-p · 2021-04-03T20:46:30Z

@sharkdp That worked really well, even on my Mac.

Okay, so... as expected, this pull request does introduce some cost at runtime. In the worst case (e.g. #1481), it takes 27% longer when not printing to a terminal:

And in the best case (no ANSI), it doesn't really change anything:

Now when it actually prints something to the screen... let's just say that the old one couldn't be benchmarked.

hyperfine -r10 --show-output './bat-old -ltxt --decorations=always --color=always --paging=never lots-of-ansi.txt'

The fixed version worked a lot better:

eth-p · 2021-10-02T21:01:13Z

Updated this PR to be merge-able into master.

God-damnit-all · 2021-11-18T22:50:45Z

I was wondering why color was getting stripped when piping to bat, I was considering opening an issue but I found this PR. I really hope this gets merged soon.

Enselic · 2021-11-19T06:14:58Z

@ImportTaste Does passing --color=never --wrap=never preserve colors? See https://github.com/sharkdp/bat#garbled-output (you did not miss that, it was added minutes ago)

Enselic · 2021-12-08T11:18:35Z

I merged the code with origin/master and fixed some lints.

I have confirmed that the added integration test fails on master. In other words, it tests that the fix in this PR is necessary and works. Which is a great test to have.

I do think the code could use some de-duplication, but that was the case even before this PR, so I don't think we need to do that as part of this PR.

Unless I'm mistaken, I think that #1976 allows us to do comprehensive benchmarking of this PR. Here are the results on my low-end desktop:

The grep-output-ansi-sequences.txt test is much faster, from 912 ms to 177 ms with --wrap=character.
Performance for other files with --wrap=character is unchanged.
As already mentioned by Ethan, performance does worsen quite a bit with --wrap=never for grep-output-ansi-sequences.txt, going from 129 ms to 157 ms. But, as explained, that's because we cheated before.
For other files with --wrap=never (that do not contain ANSI escape sequences) there is very little change in performance.

IMHO we should go ahead and merge this PR now, because it is an overall improvement. And we can always keep working on the code.

Here is the full benchmark comparison if you want to look. Left is git master e250da8. Right is the latest commit in this PR, namely 4c044f5. (I realize now that 9f36470 is not in the benchmark, but that's just lint fixes and does not make a difference):

sharkdp · 2021-12-08T11:44:51Z

Thank you for looking into this. I haven't had the time to fully review it. From a high-level view, it looks like the vscreen module could use some unit tests, but I am okay with merging this.

Enselic · 2021-12-08T15:05:32Z

I haven't done a full review either, in the sense that I understand how the code works. But it looks like completely reasonable code, and I'm confident enough in the code to believe it's going to be easy to understand how it works when the need arises. I'm confident enough of that to set Approved on the code at least, and be willing to merge it.

All the evidence so far indicates that it is an improvement over what we have at master right now, without any regressions. And we can always keep working on the code even if we merge it.

Since you too are OK with merging it, I will merge it now.

Enselic · 2021-12-08T15:33:03Z

Closed #1481

eth-p added 2 commits March 22, 2021 17:45

Improve handling of ANSI passthrough

60aad68

Fix ANSI passthrough for --wrap=never

5aa94e8

eth-p added the performance label Mar 23, 2021

eth-p requested a review from sharkdp March 23, 2021 01:50

eth-p self-assigned this Mar 23, 2021

eth-p mentioned this pull request Mar 23, 2021

Pipelining text with ANSI color code to bat consumes huge CPU and memory. #1481

Closed

eth-p added 2 commits March 23, 2021 16:41

Add test for ANSI passthrough

f455891

Update CHANGELOG.md for sharkdp#1596

8fa928c

keith-hall approved these changes Mar 24, 2021

sharkdp mentioned this pull request Jul 25, 2021

bat release focus: performance #1751

Closed

Enselic reviewed Aug 7, 2021

Enselic added the needs-work label Oct 2, 2021

Merge remote-tracking branch 'origin/master' into improved-ansi-performance

d4e8bb8

sharkdp mentioned this pull request Nov 22, 2021

Improved benchmark suite #1953

Merged

sharkdp added this to the v0.19 milestone Nov 22, 2021

Enselic added 2 commits December 8, 2021 08:49

Merge remote-tracking branch 'origin/master' into improved-ansi-performance

4c044f5

Run cargo clippy --fix --all-targets --all-features

9f36470

Enselic mentioned this pull request Dec 8, 2021

run-benchmarks: Benchmark both --wrap=character and --wrap=never #1976

Merged

Enselic approved these changes Dec 8, 2021

Enselic removed the needs-work label Dec 8, 2021

Enselic merged commit 63ad538 into sharkdp:master Dec 8, 2021

keith-hall mentioned this pull request May 6, 2022

Issue handling text with escape sequences already embedded #2185

Closed
