Diff doesn't accurately minimize the number of diff hunks #11683

mgerner · 2016-09-08T10:50:24Z

For some changes, diff hunks (groups of changed lines) can be computed in more than one way. VS Code sometimes splits a change into separate hunks when it arguably shouldn't, which makes diffs (slightly) harder to read.

VSCode Version: 1.4.0
OS Version: Debian stretch

Steps to Reproduce:

Create a file with the contents (note the two blank lines) and commit it, e.g. using git.

In the middle of the file, add the content:

21

22

The diff is now computed as shown below (this looks right):

Add a blank line to the beginning of the file. The first blank line now shows up as its own hunk (correctly), but the section added in the middle is split into two hunks. We have added two sections to the file, but the diff unnecessarily shows three added sections:

Below is an actual screenshot from my own code (there are other changes above it as well). Here it would help to see at a glance that there is one large change; instead, at a glance it looks like two changes, which is unnecessary.


alexdima · 2016-09-12T08:27:26Z

Technically both diffs are minimal w.r.t. the number of changed lines (from an LCS point of view), but not w.r.t. the number of distinct diff regions. We could post-process the diff results to cover cases like this.
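To make the distinction concrete, here is a minimal Python sketch, assuming a diff is represented as a list of `(op, line)` pairs (a representation chosen for illustration, not VS Code's internal one). The file contents are hypothetical: both scripts insert the same three lines and so are equally minimal in changed lines, but one uses a single hunk and the other uses two.

```python
from typing import List, Tuple

# An edit script is a list of (op, line) pairs, where op is
# "=" (line kept), "-" (line deleted) or "+" (line inserted).
Edit = Tuple[str, str]


def changed_lines(script: List[Edit]) -> int:
    """Number of inserted plus deleted lines (what an LCS-based diff minimizes)."""
    return sum(1 for op, _ in script if op != "=")


def hunks(script: List[Edit]) -> int:
    """Number of maximal runs of consecutive non-"=" operations."""
    count = 0
    in_hunk = False
    for op, _ in script:
        if op != "=" and not in_hunk:
            count += 1
        in_hunk = op != "="
    return count


def rebuild(script: List[Edit]) -> Tuple[List[str], List[str]]:
    """Recover (old, new) from a script, to check two scripts describe the same change."""
    old = [line for op, line in script if op != "+"]
    new = [line for op, line in script if op != "-"]
    return old, new


# Hypothetical file: "21", "", "22" is inserted just before an existing blank line.
one_hunk = [("=", "a"), ("+", "21"), ("+", ""), ("+", "22"), ("=", ""), ("=", "c")]
two_hunks = [("=", "a"), ("+", "21"), ("=", ""), ("+", "22"), ("+", ""), ("=", "c")]

assert rebuild(one_hunk) == rebuild(two_hunks)
print(changed_lines(one_hunk), hunks(one_hunk))    # 3 1
print(changed_lines(two_hunks), hunks(two_hunks))  # 3 2
```

The only difference between the two scripts is which new blank line the old blank line is matched with; an LCS length cannot tell them apart.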

mgerner · 2016-09-12T09:12:00Z

Yes, that was my idea as well - introduce a very small cost associated with each distinct diff region, as you call them. That will ensure that the optimization (which I'm guessing is something based on dynamic programming?) will select the "better" diff.
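A sketch of the per-region cost idea, assuming a plain O(n·m) dynamic program over line pairs rather than whatever algorithm VS Code actually uses. Instead of picking a concrete "very small" constant, it compares `(changed_lines, hunks)` tuples lexicographically, so the number of changed lines is still minimized first and the hunk count only breaks ties. The function name `diff_min_hunks` and the example data are hypothetical.

```python
from typing import List, Tuple

Edit = Tuple[str, str]  # ("=" | "-" | "+", line)
INF = (float("inf"), float("inf"))


def diff_min_hunks(old: List[str], new: List[str]) -> List[Edit]:
    """Return an edit script that first minimizes changed lines, then hunks.

    State s is 1 if the previous operation was an insert/delete (so another
    edit extends the current hunk) and 0 otherwise (so an edit opens a hunk).
    """
    n, m = len(old), len(new)
    # cost[i][j][s]: best (changes, hunks) after consuming old[:i] and new[:j]
    cost = [[[INF, INF] for _ in range(m + 1)] for _ in range(n + 1)]
    # back[i][j][s]: (prev_i, prev_j, prev_s, op, line) for reconstruction
    back = [[[None, None] for _ in range(m + 1)] for _ in range(n + 1)]
    cost[0][0][0] = (0, 0)

    def relax(i: int, j: int, s: int, c: tuple, prev: tuple) -> None:
        if c < cost[i][j][s]:
            cost[i][j][s] = c
            back[i][j][s] = prev

    # Row-major order visits every predecessor of (i, j) before (i, j) itself.
    for i in range(n + 1):
        for j in range(m + 1):
            for s in (0, 1):
                c = cost[i][j][s]
                if c == INF:
                    continue
                # An edit costs one changed line, plus one hunk if it opens a new hunk.
                edit_cost = (c[0] + 1, c[1] + (1 if s == 0 else 0))
                if i < n and j < m and old[i] == new[j]:
                    relax(i + 1, j + 1, 0, c, (i, j, s, "=", old[i]))
                if i < n:
                    relax(i + 1, j, 1, edit_cost, (i, j, s, "-", old[i]))
                if j < m:
                    relax(i, j + 1, 1, edit_cost, (i, j, s, "+", new[j]))

    # Walk the back-pointers from the cheaper end state to the start.
    s = min((0, 1), key=lambda k: cost[n][m][k])
    i, j = n, m
    script: List[Edit] = []
    while (i, j) != (0, 0):
        i, j, s, op, line = back[i][j][s]
        script.append((op, line))
    script.reverse()
    return script


old = ["a", "", "c"]
new = ["a", "21", "", "22", "", "c"]
print(diff_min_hunks(old, new))
# [('=', 'a'), ('+', '21'), ('+', ''), ('+', '22'), ('=', ''), ('=', 'c')]
```

Because lexicographic order on the tuples is compatible with addition, the usual shortest-path argument for the DP still holds. The quadratic table is fine for a sketch, but production diff tools typically use faster algorithms such as Myers' O(ND) algorithm.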

mgerner · 2016-09-12T09:14:31Z

(By the way, I really wouldn't call this a feature request. It's clear that the current behaviour is incorrect, isn't it? Changing the top of a file shouldn't change how diffs for completely unrelated sections are calculated. It'd be better characterized as a bug -- and an easy-to-fix bug, at that.)

mgerner · 2016-09-12T09:33:42Z

Here is another, even clearer example: I added a new method to a Python file. This is as straightforward as it gets. What does the diff look like?

The method I added has been broken up into three different diff sections, and the diff suggests that I added my new method in the middle of an old method that was already present above it. This looks bad, it's annoying, and it makes reading these diffs take longer than it should.

(I have to add though, that other than this, I absolutely love VS Code -- and the feeling in my office among those who use it is that it gives a good image of Microsoft.)

alexdima · 2016-09-12T10:30:52Z

I agree this is perceived as a bug, but technically it is a feature request :). At its core, the diffing algorithm solves multiple LCS (longest common subsequence) problems. The algorithm correctly finds a longest common subsequence; it is just that in some cases there are several longest common subsequences, and some feel "better" than others.

What I mean is that in your last example, return res is matched with a return res (so the algorithm identifies a valid longest common subsequence); it is just not the most "human-appeasing" match.

To get to the "best" longest common subsequence, we need to write new code that takes the LCS result and uses heuristics to "massage" it into a "better" one.

See also #11657
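One possible "massaging" heuristic, sketched in Python under a hypothetical `(op, line)` edit-script representation: a hunk that consists only of insertions (or only of deletions) can be shifted up by one line whenever the unchanged line just above it equals the hunk's last line, because the old and new texts stay exactly the same. Repeating this until nothing moves lets hunks that were separated only by such a repeated line merge. This is similar in spirit to how git slides change groups, but it is not VS Code's implementation; the name `slide_hunks_up` is made up for illustration.

```python
from typing import List, Tuple

Edit = Tuple[str, str]  # ("=" | "-" | "+", line)


def slide_hunks_up(script: List[Edit]) -> List[Edit]:
    """Shift pure-insert / pure-delete hunks upward where the text allows it,
    so that hunks separated only by a repeated line can merge."""
    s = list(script)
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(s):
            if s[i][0] == "=":
                i += 1
                continue
            # s[i:j] is one hunk: a maximal run of non-"=" operations.
            j = i
            while j < len(s) and s[j][0] != "=":
                j += 1
            pure = len({op for op, _ in s[i:j]}) == 1
            # If the kept line just above equals the hunk's last line, move the
            # hunk up one line: mark the line above as changed, keep the last one.
            if pure and i > 0 and s[i - 1][0] == "=" and s[i - 1][1] == s[j - 1][1]:
                op = s[i][0]
                s[i - 1] = (op, s[i - 1][1])
                s[j - 1] = ("=", s[j - 1][1])
                changed = True
            i = j
    return s


# Hypothetical two-hunk diff where the old blank line was matched "too early".
two_hunks = [("=", "a"), ("+", "21"), ("=", ""), ("+", "22"), ("+", ""), ("=", "c")]
print(slide_hunks_up(two_hunks))
# [('=', 'a'), ('+', '21'), ('+', ''), ('+', '22'), ('=', ''), ('=', 'c')]
```

Each slide keeps the old and new texts unchanged and can only merge hunks, never split them. This greedy version only slides upward, though, so it can settle on a position a human would not pick; a fuller post-processor would also try sliding down and score the candidate positions (for example by indentation or blank lines).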

Tyriar · 2016-09-23T07:19:05Z

I just ran into this myself, and it bugged me enough to check whether there was an issue open. I don't typically use the full diff editor, but I do use the blue/green/red diff indicators in the regular editor. Being familiar with git, I expect one of the \t\t} lines not to be part of the diff.

Related: #10007

alexdima · 2017-07-04T14:13:39Z

I believe this is improved significantly via a2c47ca (for #30087)

ramya-rao-a added the diff-editor Diff editor issues label Sep 8, 2016

ramya-rao-a assigned alexdima Sep 8, 2016

alexdima added the feature-request Request for new features or functionality label Sep 12, 2016

alexdima added this to the Backlog milestone Sep 12, 2016

alexdima closed this as completed Jul 4, 2017

alexdima modified the milestones: July 2017, Backlog Jul 4, 2017

weinand added the on-testplan label Jul 27, 2017

vscodebot bot locked and limited conversation to collaborators Nov 17, 2017
