Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tab expansion support. #302

Merged
merged 17 commits into from
Sep 12, 2018
Merged

Tab expansion support. #302

merged 17 commits into from
Sep 12, 2018

Conversation

eth-p
Copy link
Collaborator

@eth-p eth-p commented Sep 11, 2018

This pull request adds support for expanding tabs into column-aligned spaces (similar to expand(1)). Issues #166 and #184 are fixed when using this feature.

By default, tab width is set to 8 characters. This can be overridden with --tabs=[n] or export BAT_TABS=[n]. It can also be turned off entirely by setting the value to 0.

To allow for lossless copy-pasting, when neither the pager nor the decorations are being used, the default tab width will be set to 0 (passthrough).

Implementation

The function expand (src/preprocessor.rs) is run for every line inside InteractivePrinter.print_line, after the syntax highlighting is processed. ANSI characters are directly passed through while tabs are interpreted and replaced with spaces. The number of spaces depends on the calculated cursor position, and will always try to align the columns.

Additional Information

Three new tests have been created for this feature:

  • tabs_4
  • tabs_8
  • tabs_passthrough

A couple of small changes were made to the tester scripts and generate_snapshots.py was fixed to use the interactive printer for generating snapshots.

Example

Before:

sidebar│         |       |       |         <-- Column Alignment
  24   │ // Tab alignment:
  25   │ /*
  26   │        Indent                     <-- Misaligned Indentation
  27   │        1       2       3       4
  28   │ 1      ?
  29   │ 22     ?
  30   │ 333    ?
  31   │ 4444   ?
  32   │ 55555  ?
  33   │ 666666 ?
  34   │ 7777777        ?                  <-- Misaligned Columns
  35   │ 88888888       ?
  36   │ */

After:

sidebar│         |       |       |
  20   │ // Tab alignment:
  21   │ /*
  22   │         Indent
  23   │         1       2       3       4
  24   │ 1       ?
  25   │ 22      ?
  26   │ 333     ?
  27   │ 4444    ?
  28   │ 55555   ?
  29   │ 666666  ?
  30   │ 7777777 ?
  31   │ 88888888        ?
  32   │ */

This changes how the files are named (to allow for snapshots that aren't
directly related to the --style argument) and fixes the
generate_snapshots.py script to work with the latest version of bat.

Three new tests are also introduced:
- tabs_4 - Tab expansion with a width of 4.
- tabs_8 - Tab expansion with a width of 8.
- tabs_passthrough - No tab expansion.
@sharkdp
Copy link
Owner

sharkdp commented Sep 11, 2018

@eth-p Thank you very much for this contribution!!

My only concern is what I mentioned in #166:

One thing to keep in mind: Using a pre-processing step to expand the tabs could be problematic for languages with a syntax that distinguishes between tabs and spaces (i.e. Makefile).

To be honest, I didn't manage to construct an example that show-cases this issue because the Sublime text syntaxes seem to be rather forgiving, but in principle we could get syntax highlighting problems due to the replacement of tabs with spaces. As a stupid example, if we wanted to add a syntax for the Whitespace programming language, this approach would certainly be problematic 😄

image

Edit: I think I'm okay if we just blindly ignore this issue.

@eth-p
Copy link
Collaborator Author

eth-p commented Sep 11, 2018

@sharkdp Ah, I see what you mean now. That could be problematic.

It might incur a bit of a performance penalty, but I moved the tab expansion step to happen after the syntax is highlighted, so this should no longer be a problem.

Edit: There are a couple inconsistencies with tabs right now. I'm currently fixing them.

@sharkdp
Copy link
Owner

sharkdp commented Sep 11, 2018

It might incur a bit of a performance penalty, but I moved the tab expansion step to happen after the syntax is highlighted, so this should no longer be a problem.

Fantastic! Let me know if this is ready for a full review.

@eth-p
Copy link
Collaborator Author

eth-p commented Sep 11, 2018

@sharkdp Everything's fixed. It should be ready for a review now.

Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thank you very much for all your work!

Instead of adding more snapshot tests for the tab-expansion alone (which are hard to maintain), I think it would be great if we could use normal unit tests to make sure that the tab-expand-functionality works as expected. What do you think?

src/app.rs Outdated Show resolved Hide resolved
src/app.rs Outdated
.matches
.value_of("tabs")
.and_then(|w| w.parse().ok())
.or_else(|| env::var("BAT_TABS").ok().and_then(|w| w.parse().ok()))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could the .and_then(|w| w.parse().ok()) part be used only once? Or are we dealing with different string types here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's implemented this way so that an invalid value for --tabs would fall back on the environment variable. Should it be done differently?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fail with an error message if the value for --tabs can not be parsed (instead of silently falling back on the environment-variable value).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d404139

src/app.rs Outdated
@@ -41,6 +41,9 @@ pub struct Config<'a> {
/// The character width of the terminal
pub term_width: usize,

/// The width of tab characters.
pub tab_width: usize,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is "no tab expansion" represented? tab_width==0? Could this be Option<usize> instead?

Copy link
Collaborator Author

@eth-p eth-p Sep 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be, but it might complicate the argument parsing a little bit.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would favor the Option version but I'm also fine with a short comment for now ("A value of zero indicates that no expansion should be performed").

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment in d404139

@@ -10,7 +10,7 @@ fn main() {
"The perimeter of the rectangle is {} pixels.",
perimeter(&rect1)
);
println!(r#"This line contains invalid utf8: "�"#;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on in this changeset?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My IDE probably did something to that when I re-saved the file after adding the tabs section. I'll attempt to fix that.

@eth-p
Copy link
Collaborator Author

eth-p commented Sep 11, 2018

@sharkdp I could make a unit test for the tab expansion function itself, but I don't think it's worth it. The expansion function is pretty simple and straightforward, but the way it's used in the printer isn't. Changes to the printer are the most likely cause of it ever breaking in the future, and I don't think unit tests could cover that as well as the current snapshot tests.

@sharkdp
Copy link
Owner

sharkdp commented Sep 12, 2018

@sharkdp I could make a unit test for the tab expansion function itself, but I don't think it's worth it. The expansion function is pretty simple and straightforward, but the way it's used in the printer isn't. Changes to the printer are the most likely cause of it ever breaking in the future, and I don't think unit tests could cover that as well as the current snapshot tests.

Ok 👍

@@ -10,7 +10,7 @@ fn main() {
"The perimeter of the rectangle is {} pixels.",
perimeter(&rect1)
);
println!(r#"This line contains invalid utf8: "�"#;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some changes in this line (which is why the snapshots also show modifications in this line).

The øˆ€€€ part is just the "interpolated" version of what comes out if you try to decode this as utf-8. You actually need to put the exact same bytes back in place (or tell me if you need any help).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might need a bit of help with that, sorry haha...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it can be pretty irritating when viewing the diffs in GitHub or some editors, as they will show the interpolated øˆ€€€ string in both cases (if you actually put the øˆ€€€ string in the file or if you put the invalid utf-8 sequence in the file)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll look into this later today.

grid_header_numbers: "grid,header,numbers" => [wrap: false, tabs: 8],
changes_grid_header_numbers: "changes,grid,header,numbers" => [wrap: false, tabs: 8],
full: "full" => [wrap: false, tabs: 8],
plain: "plain" => [wrap: false, tabs: 0],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eth-p I had to change this to tabs: 0 because generate_snapshots.py produced a file that had tabs in them for --style=plain. I'm not sure why this was different for you. I tried to add --paging=never and --color=never to make this more robust.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharkdp That would be because it doesn't process tabs unless it needs to for alignment reasons (e.g. the pager or decorations are used). On second thought, though, that might be a bit confusing when people inconsistently end up with tabs sometimes.

Should I make a PR to remove that?

Copy link
Owner

@sharkdp sharkdp Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current behavior is fine. I just wondered why I got a different version of the file than you when I ran the generate_.. script.

@sharkdp sharkdp merged commit 52d0d6c into sharkdp:master Sep 12, 2018
@sharkdp
Copy link
Owner

sharkdp commented Sep 12, 2018

@eth-p Thank you very much for all of this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants