Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truncate long lines #290

Merged
merged 1 commit into from
Aug 14, 2020
Merged

Truncate long lines #290

merged 1 commit into from
Aug 14, 2020

Conversation

dandavison
Copy link
Owner

Fixes #154, #288

@dandavison dandavison merged commit d4441fe into master Aug 14, 2020
@dandavison dandavison deleted the 154-truncate-long-lines branch August 14, 2020 16:10
@gasche
Copy link

gasche commented Jun 17, 2024

In my experience this default setting makes it inconvenient to use delta on text documents without hard-wrapping, as is a de-facto standard for say markdown documents and some LaTeX writing styles. This makes the delta default settings unsuitable for git repositories versioning such documents (rather than code).

For example, calling delta --help | delta results in truncated output:

Options --line-numbers-left-format and --line-numbers-right-format allow you to change the contents of the line number columns. Their values are arbitrary format strings, which are allowed to contain the placeholders {nm} for the line number associated with the old version of the file and {np} for the line number associated with the new version of the file. The placeholders support a subset of the string formatting syntax documented here: https://doc.rust-lang.org/std/fmt/#formatting-parameters. Specifical→

It is not a great for a pager to hide output by default.

I think that it would be better if:

  • the default could be made much larger; delta --help | wc -L reports a maximum line length of 701, maybe a default maximum of say 2048 would be safe for most markdown documents?
  • or if you found a different way to avoid computation-time blowup that does not require truncating line. Maybe the whatever processing that is done per-line could be disabled for long lines, but the line content would still be shown.

@dandavison
Copy link
Owner Author

Hi @gasche, I haven't had too many complaints about this and it was done very early on in the project. The main thing that caused me to do it was minified css and js files on a single line -- delta attempted to syntax highlight them and you can imagine what the performance was like.

This makes the delta default settings unsuitable for git repositories versioning such documents (rather than code).

You probably know this, but in such repos this problem can be fixed permanently by adding

[delta]
    max-line-length = 0

to the repo-specific git config (.git/config)

the default could be made much larger; delta --help | wc -L reports a maximum line length of 701, maybe a default maximum of say 2048 would be safe for most markdown documents?

It seems to me that if these are paragraphs of prose then we can't predict an upper limit

Maybe the whatever processing that is done per-line could be disabled for long lines, but the line content would still be shown.

Possibly. But I think that I like diffs not showing entire minified js/css files, with or without syntax highlighting.

So, I'm not quite sold personally on the need to change. But I could definitely believe that we could do a better job of letting people working in repos of plain text prose know that they should set max-line-length = 0.

@gasche
Copy link

gasche commented Jun 17, 2024

It seems to me that if these are paragraphs of prose then we can't predict an upper limit

Non-hard-wrapped text typically uses one line per paragraph, and paragraph tend to not be super-long for readability reasons (and because block structure such as lists, code or math elements require breaking the text paragraph). For delta --help, the maximum line length is 701. In my own documents I have found 999 to be the maximum line length. Finally, for physical books the standard character-per-page counts are reported to be between 1800 and 3700 characters -- some books have paragraphs running on several pages, but this is highly unusual.

To me this suggests that bumping the limit to 1024 or 2048 would be a reasonable choice.

In contrast, searching the web for "typical" css file sizes suggests that people aim for CSS files between 10Kio and 25Kio, so the minified CSS one-line files would typically start at around 10,000 characters. (People are generally discouraged from committing their minified CSS files in the version control system, but oh well.) Javascript file sizes are reported to be much larger than that.

@gasche
Copy link

gasche commented Jun 17, 2024

(The delta --help output contains four paragraphs over the current 512 limit.)

@injust
Copy link
Contributor

injust commented Jul 9, 2024

I got bit by this -- for a few days, I thought there was a bug in delta because changed lines were being truncated, hiding the actual change in the line. This default was surprising because my primary use case of delta is to highlight the smaller changes within longer lines.

In contrast, searching the web for "typical" css file sizes suggests that people aim for CSS files between 10Kio and 25Kio, so the minified CSS one-line files would typically start at around 10,000 characters.

+1, 500 characters seems too low for a default. Also, 8GB of minifed JS takes ~10 seconds to diff (from #154), so there should be room to increase the default before performance becomes a factor.

Maybe the whatever processing that is done per-line could be disabled for long lines, but the line content would still be shown.

+1

Possibly. But I think that I like diffs not showing entire minified js/css files, with or without syntax highlighting.

That purpose seems a bit orthogonal to delta's design (but you'd know about that more than me 😉). .gitignore could do the same thing in a much cleaner way.

You probably know this, but in such repos this problem can be fixed permanently by adding

[delta]
    max-line-length = 0

to the repo-specific git config (.git/config)

I think we should document this in the README, or even better, add the default to the config so it's obvious that this functionality is in play.

@gasche
Copy link

gasche commented Jul 9, 2024

@injust thanks for the extra feedback. Do you have estimates of maximal line lengths for the documents you happen to be working with? (Note: wc -L returns the maximal line length.)

@injust
Copy link
Contributor

injust commented Jul 9, 2024

@injust thanks for the extra feedback. Do you have estimates of maximal line lengths for the documents you happen to be working with? (Note: wc -L returns the maximal line length.)

The diff that got truncated for me was a uBlock Origin backup, which has very long lines because it converts multi-line strings to JSON. That's not a good example of how delta should be used and it's a pretty bad heuristic for setting the truncation threshold.

I don't have a horse in this race because I immediately set delta.max-line-length = 0 after I read this thread 😄, and I'm more interested in seeing max-line-length documented in the README or in the default config.

But my take is that if syntax highlighting performance was the motivation behind this functionality, then "How many characters of minified code (which is probably the worst case for performance here?) does it take to cause noticeable performance degradation?" is a good place to start.

@injust
Copy link
Contributor

injust commented Jul 9, 2024

For reference, my uBlock Origin backup file has a max line length of ~12k. git show takes about half a second, which definitely feels slow.

It's also a .txt file, so I don't think delta is trying to syntax highlight it?

@th1000s
Copy link
Collaborator

th1000s commented Jul 10, 2024

When looking at delta performance (a while ago), syntax highlighting was the most expensive part.

It might make sense to just limit syntax highlighting and not line length (within reason). In #1746 these are now separated, e.g.

--max-syntax-highlighting-length=500 --max-line-length=5000

A more complex solution would give delta "syntax fuel", which accumulates over normal lines and and can be spent on one or two overly long lines, but once it runs into a large block only the first 500 characters are highlighted.

@gasche
Copy link

gasche commented Jul 10, 2024

Indeed, I think that #1746 would certainly satisfy my use-case. Thanks!

@th1000s
Copy link
Collaborator

th1000s commented Jul 21, 2024

While --max-syntax-highlighting-length can speed up the output, there is a good chance that the following lines have incorrect highlighting until the next hunk resets the highlighter state machine, but that as also the case with --max-line-length.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐛 High memory usage when fed diff of minimized files
4 participants