Make whitespace (' ', \t, \r, \n) always visible for "changed" lines #485
" u 2",
" u 3",
" -⇥·leading·space␊",
" -trailing·space⇥·␊",
@nedtwigg , what do you think?
For now, I make all the whitespace visible for However, when lines are changed (for any reason), then I make all the whitespace visible just in case. Later it would make sense to add a per-line diff; then the formatter could visualize the whitespace only where there are whitespace issues.
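A minimal sketch of the whitespace visualization discussed here, not the code from this PR. The class and method names are hypothetical. The markers for space, tab, and line feed follow the diff sample above; the carriage-return marker is an assumption added for illustration.

```java
public class VisibleWhitespace {
    // Hypothetical helper: replaces whitespace with visible marker characters.
    // Space, tab, and line-feed markers match the sample diff above;
    // the carriage-return marker (U+240D) is an assumption.
    static String visibleWhitespace(String line) {
        StringBuilder out = new StringBuilder(line.length());
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            switch (c) {
                case ' ':  out.append('\u00B7'); break; // middle dot
                case '\t': out.append('\u21E5'); break; // rightwards arrow to bar
                case '\r': out.append('\u240D'); break; // symbol for carriage return
                case '\n': out.append('\u240A'); break; // symbol for line feed
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A changed line with a leading tab and space, as in the sample diff.
        System.out.println(visibleWhitespace("\t leading space\n"));
    }
}
```

Applying such a helper only to changed lines keeps context, added-only, and removed-only lines readable, which matches the behavior described in the PR.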
This looks good, but it adds a bug related to charset encoding
In the old approach, this was how charsets were managed within the diff:
- we take in Strings (unicode with unspecified internal binary representation)
- we convert them to UTF-8-encoded binary, which lets us look for the '\n' byte as a line-break, and also lets us put UTF-8-encoded binary into the diff
- we convert the UTF8-encoded diff into a Unicode String, which can then be converted to whatever binary encoding is required by the console
In the new approach, the Strings are converted to their native, on-disk binary encoding. The problem with that is we can no longer rely on the byte \n as a line-break. For example, in UTF-16BE or UTF-16LE, the byte 0x0A can show up as one half of a two-byte code unit that belongs to a different character.
If we return to the charset handling of the previous version, then this LGTM.
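To illustrate the concern, here is a small sketch (the class and method names are hypothetical, not part of Spotless) that counts bytes equal to 0x0A in the same text under different encodings. The sample text has exactly one real line break, plus the character U+010A, whose UTF-16 code unit contains the byte 0x0A.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NewlineByteDemo {
    /** Counts bytes equal to 0x0A, which is what a byte-level line splitter would look for. */
    static int countNewlineBytes(String text, Charset charset) {
        int count = 0;
        for (byte b : text.getBytes(charset)) {
            if (b == 0x0A) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One real line break, plus U+010A whose UTF-16 code unit is 0x010A.
        String text = "a\u010Ab\n";
        System.out.println(countNewlineBytes(text, StandardCharsets.UTF_8));    // 1
        System.out.println(countNewlineBytes(text, StandardCharsets.UTF_16BE)); // 2
        System.out.println(countNewlineBytes(text, StandardCharsets.UTF_16LE)); // 2
    }
}
```

In UTF-8 the byte 0x0A only ever encodes a line feed, so splitting on it is safe; in UTF-16 the extra match would split a character in half.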
I remember this comment about bad encodings and the console. It's important to remember that both the console and the disk are binary interfaces; they don't care about Strings. For the console or a file to be a meaningful String, you have to encode the String to a binary representation, and there's nothing that forces some random file to have the same encoding as some random shell displaying stdout.
It is possible there is an encoding bug somewhere in Spotless' error message reporting, but it isn't in any of the code changed by this particular PR.
Are you sure everything could be converted to UTF-8?
ok. DiffMessageFormatter is used for reporting purposes only, so it should do no harm even if some data is lost when converting file bytes to UTF-8. I've added a guard to ensure all the fancy characters are convertible to In practice, "middle dot" is convertible to ISO-8859-1 just fine.
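A guard of this kind could look roughly like the sketch below. This is an illustration only, not the guard added in the PR: the class name, method name, and fallback characters are assumptions. It relies on the standard `CharsetEncoder.canEncode(char)` check.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class MarkerGuard {
    /** Returns the fancy marker if the charset can encode it, otherwise the fallback. */
    static char markerFor(Charset charset, char fancy, char fallback) {
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(fancy) ? fancy : fallback;
    }

    public static void main(String[] args) {
        // Middle dot (U+00B7) exists in ISO-8859-1, so it is kept.
        System.out.println(markerFor(StandardCharsets.ISO_8859_1, '\u00B7', '.'));
        // The tab marker (U+21E5) is outside ISO-8859-1, so the fallback is used.
        System.out.println(markerFor(StandardCharsets.ISO_8859_1, '\u21E5', '>'));
    }
}
```

Note that such a check only covers the charset used for encoding; whether a given console actually renders the character is a separate question.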
I love it! Useful and clever. I'm gonna push a couple API changes and then merge.
Ahh, sorry. No changelog entry yet, so I'm gonna push the API change back onto you ;-) Once these are all checked, I'll merge and publish a release:
My quibble is with the String to byte[] conversion. The reason for the change was a teensy-tiny performance improvement. The downside is that we're now passing |
context lines, added-only lines, and removed-only lines are shown as usual in the diffs. fixes diffplug#465
please review.
In my opinion having Strings does not help since |
Note: OLD was diffWhitespaceLineEndings (11 lines) + visibleWhitespaceLineEndings (5 lines), so Strings made sense. However, the new diffWhitespaceLineEndings is 6 lines, so Strings there hardly make any difference.
Looks great, thanks! JGit and byte[] are implementation details. It's fine for implementation details to end up in the API of private methods, but if it's easy to hide them, why not!
Thanks for the improvement! For future reference:
Released in |
It looks like middle dots are not safe: GitHub Actions / Windows
Travis: https://travis-ci.org/apache/calcite/jobs/610889368#L450
Then there's a bug where the diff shows too many lines of context. It combines everything into a single diff fragment, while it should show three individual ones. Apparently I'm afraid I'm not confident enough to submit a fix, so please feel free to revert the whole thing.
Reverted in 0868063 (but I kept the |
Please make sure that your PR allows edits from maintainers. Sometimes it's faster for us to just fix something than it is to describe how to fix it.
After creating the PR, please add a commit that adds a bullet-point under the -SNAPSHOT section of CHANGES.md, plugin-gradle/CHANGES.md, and plugin-maven/CHANGES.md which includes:
If your change only affects a build plugin, and not the lib, then you only need to update the CHANGES.md for that plugin.
If your change affects lib in an end-user-visible way (fixing a bug, updating a version) then you need to update CHANGES.md for both the lib and the build plugins. Users of a build plugin shouldn't have to refer to lib to see changes that affect them.
This makes it easier for the maintainers to quickly release your changes :)