Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat OSC ANSI Sequences as Invisible Text & Add OSC 8 Support #2544

Merged
merged 9 commits into from
Feb 10, 2024

Conversation

eth-p
Copy link
Collaborator

@eth-p eth-p commented Apr 17, 2023

Fixes #2541 by removing OSC sequences from lines before starting to print them. no longer considering ANSI OSC sequences as visible text.

image
(Left: Fixed, Right: Current version)

Prior to this change, we don't interpret OSC sequences in any special way. They aren't caught by console::AnsiCodeIterator, so the OSC sequences were being treated as visible text.

  • When the OSC is printed in full, the terminal would see a valid sequence and hide it. As a consequence, lines would b e wrapped too early.

  • When the OSC is broken for line wrapping, the terminal wouldn't see a valid sequence. In this case, it prints both halves of the text on different lines and appears to be junk.


Edit: I had some extra time. A couple of new commits gave special handling for the OSC 8 sequence, allowing bat to disable the hyperlink at the end of the line and then re-emit it at the start of the next line.

image

@eth-p eth-p requested review from sharkdp and Enselic April 17, 2023 05:15
@eth-p eth-p force-pushed the fix-2541 branch 3 times, most recently from a34f6f4 to bca3a3e Compare April 17, 2023 23:12
@eth-p eth-p changed the title Strip OSC Ansi Sequences Before Printing Treat OSC ANSI Sequences as Invisible Text Apr 18, 2023
@eth-p
Copy link
Collaborator Author

eth-p commented Apr 18, 2023

Made some changes so that we don't need to strip the sequences anymore. Instead, they'll be emitted as-is and just ignored for line wrapping purposes.

@eth-p
Copy link
Collaborator Author

eth-p commented Apr 18, 2023

I also found some possible optimizations while doing this:

  • The syntax highlighting style is being emitted for every single text chunk (rather than highlight region).

    Essentially, for a string like Text^[33mMore Text^[1mEven More Text what we do is:

    1. For Text
      • Emit highlight style: ^[38;2;100;100;100m
      • Emit text: Text
    2. For ^[33m
      • Passthrough detected ANSI sequence: ^[33m
    3. For More Text
      • Emit highlight style: ^[38;2;100;100;100m
      • Re-emit detected ANSI attributes as sequence: ^[33m
      • Emit text: More Text
    4. For ^[1m
      • Passthrough detected ANSI sequence: ^[1m
    5. For Even More Text
      • Emit highlight style: ^[38;2;100;100;100m
      • Re-emit detected ANSI attributes as sequence: ^[33m^[1m
      • Emit text: Even More Text

    This gives us: ^[38;2;100;100;100mText^[33m^[38;2;100;100;100m^[33mMore Text^[1m^[38;2;100;100;100m^[33m^[1mEven More Text
    Which could be simplified to: ^[38;2;100;100;100mText^[33mMore Text^[1mEven More Text

    For the plain text syntax we don't actually have a style to emit (i.e. no ^[38;2;100;100;100m everywhere), but it still acts as though there is one and double-prints the passed through styles.

  • When wrapping is enabled, we could probably be reusing a single String as a line_buf.
    As it is right now, we call String::with_capacity multiple times per line. A better approach would be to store it on the InteractivePrinter struct and reuse it across lines so we don't have to allocate as much.

    I tested this, and it reduced performance on release builds. I guess reusing the variable prevents LLVM from performing a couple of optimizations.

@eth-p eth-p changed the title Treat OSC ANSI Sequences as Invisible Text Treat OSC ANSI Sequences as Invisible Text & Add OSC 8 Support Apr 18, 2023
@eth-p
Copy link
Collaborator Author

eth-p commented Apr 18, 2023

I figured with such a significant change to a hot loop, it might be worth benchmarking bat before and after my changes.

I no longer have an x86_64 system running MacOS to test with, but I can happily say that these changes actually result in a small performance gain under aarch64 :)

image

Notes

  • Both executables were compiled with cargo build --release --locked using rustc version 1.65.0.
  • bat-master is commit e828d78.
  • bat-with-fix is commit d899f82.
  • The lots-of-ansi.txt file was created using bat src/**/*.rs --decorations=always --wrap=never --paging=never --color=always > lots-of-ansi.txt, so it basically feeds bat's output back to itself.
  • Both versions output the same text, verified by running sha1sum against the the executables' output with the flags --terminal-width=80 --wrap=character --decorations=always --color=always lots-of-ansi.txt --paging=never

@eth-p
Copy link
Collaborator Author

eth-p commented Feb 8, 2024

Rebased to latest master and re-ran benchmarks.

Benchmarks: aarch64

Apple ARM M1 architecture, running Darwin 23.0.0 kernel.

With ANSI content in the input, these changes result in a minor performance increase.

image

With non-ANSI inputs, there were no performance regressions:

image

Benchmarks: amd64

Intel i7-4790k Haswell architecture, running Linux 6.1 kernel with glibc 2.38.

With ANSI content in the input, these changes result in a minor performance increase.

image

With non-ANSI inputs, there were no obvious performance regressions. Differences between the two builds was within margin of error.

image

If you would like benchmarks for a more modern architecture (AMD Zen 4) or for Windows, I can install some development tools on my other PC.

Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much and sorry for leaving this hanging for so long.

Admittedly, I have not reviewed the very core of this (the escape sequence parser/iterator) in detail, but it looks very reasonable and is well covered by tests.

Also, thank you for performing the benchmarks!

@eth-p
Copy link
Collaborator Author

eth-p commented Feb 10, 2024

The failed audit check is related to libgit2, which isn't changed by this pull request. Merging this PR.

This can be used to extract a subset of ANSI escape sequences from a
string of text. I have big plans for this eventually, but for now, it'll
be used to strip OSC before printing.
This commit strips OSC (Operating System Command) sequences before
printing lines. Eventually when time permits, I want to add back
support for printing OSC sequences (and improve it to treat hyperlinks
like an attribute).

Until then, this should help prevent garbled output :)
More specifically, the test ensures that OSC sequences don't end up
wrapping the line.
This is an iterator for escape sequences, using
EscapeSequenceOffsetsIterator for the underlying parsing of individual
escape sequences.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Erratic output when OSC8 sequences are present
3 participants