Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diff doesn't accurately minimize the number of diff hunks #11683

Closed
mgerner opened this issue Sep 8, 2016 · 7 comments
Closed

Diff doesn't accurately minimize the number of diff hunks #11683

mgerner opened this issue Sep 8, 2016 · 7 comments
Assignees
Labels
diff-editor Diff editor issues feature-request Request for new features or functionality on-testplan
Milestone

Comments

@mgerner
Copy link

mgerner commented Sep 8, 2016

For some changes, diff hunks (groups of changed lines) can be computed in different ways. VS Code sometimes separates changes into different hunks when it arguably shouldn't, making viewing of diffs (slightly) more difficult.

  • VSCode Version: 1.4.0
  • OS Version: Debian stretch

Steps to Reproduce:

Create a file with the contents (note two blank lines) and commit it using e.g. git.

1


2

In the middle of the file, add the content

21

22

The diff will now be calculated as shown below (looking good):
image

Add a blank line to the beginning of the file. The first blank line will now show up as its own hunk (correctly), but the section that was added in the middle will be split into two hunks. Note that we have added two sections to our file, but the diff unnecessarily shows it as three sections having been added:

image

You can see an actual screenshot from code of mine below (there are other changes above it as well). In this code, it would be very good if we at a glance could see that there was one big change, but instead it (at a glance) looks like two changes. This is unnecessary.

image

@ramya-rao-a ramya-rao-a added the diff-editor Diff editor issues label Sep 8, 2016
@alexdima
Copy link
Member

Technically both diffs are minimal w.r.t. the number of changed lines (from a LCS point of view), but not w.r.t. number of distinct diff regions. We could post-process the diff results to cover cases like this.

@alexdima alexdima added the feature-request Request for new features or functionality label Sep 12, 2016
@alexdima alexdima added this to the Backlog milestone Sep 12, 2016
@mgerner
Copy link
Author

mgerner commented Sep 12, 2016

Yes, that was my idea as well - introduce a very small cost associated with each distinct diff region, as you call them. That will ensure that the optimization (which I'm guessing is something based on dynamic programming?) will select the "better" diff.

@mgerner
Copy link
Author

mgerner commented Sep 12, 2016

(By the way, I really wouldn't call this a feature request. It's clear that the current behaviour is incorrect, isn't it? Changing the top of a file shouldn't change how diffs for completely unrelated sections are calculated. It'd be better characterized as a bug -- and an easy-to-fix bug, at that.)

@mgerner
Copy link
Author

mgerner commented Sep 12, 2016

Yet another example, that is even clearer: I added a new method to a Python file. This is as straight-forward as can be. What does the diff result look like?

image

The method I added have been broken up into three different diff sections, and the diff believes that I added my new method in the middle of an old method that was already present above the new one. This just looks bad, it's annoying, and it makes reading these diffs take longer than it should.

(I have to add though, that other than this, I absolutely love VS Code -- and the feeling in my office among those who use it is that it gives a good image of Microsoft.)

@alexdima
Copy link
Member

alexdima commented Sep 12, 2016

I agree this is perceived as a bug, but technically it is a feature request :). The diffing algorithm is based at its core on solving multiple LCS (longest common substring) problems. The algorithm correctly finds a longest common substring sequence, it is just that in some cases there are multiple longest common substring sequences and some would feel "better" than others.

What I mean by that is that in your last example, return res is matched with a return res (so the algorithm identifies a valid longest common substring sequence), it is just not the most "human appeasing" match.

To get to the "best" longest common substring sequence, we need to write some new code that takes the LCS result and tries to "massage" it with some heuristics to transform it into a "better" LCS.

See also #11657

@Tyriar
Copy link
Member

Tyriar commented Sep 23, 2016

I just ran into this myself and ti bugged me enough to see if there was an issue open. I don't typically use the full diff editor but I do make use of the blue/green/red diff indicators in the regular editor. Being familiar with git I expect one of the \t\t} lines to not be part of the diff.

image

Related: #10007

@alexdima
Copy link
Member

alexdima commented Jul 4, 2017

I believe this is improved significantly via a2c47ca (for #30087)

@alexdima alexdima closed this as completed Jul 4, 2017
@alexdima alexdima modified the milestones: July 2017, Backlog Jul 4, 2017
@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 17, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
diff-editor Diff editor issues feature-request Request for new features or functionality on-testplan
Projects
None yet
Development

No branches or pull requests

5 participants