Diff doesn't accurately minimize the number of diff hunks #11683
Comments
Technically both diffs are minimal w.r.t. the number of changed lines (from an LCS point of view), but not w.r.t. the number of distinct diff regions. We could post-process the diff results to cover cases like this.
Yes, that was my idea as well - introduce a very small cost associated with each distinct diff region, as you call them. That will ensure that the optimization (which I'm guessing is something based on dynamic programming?) will select the "better" diff.
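To make the "small cost per diff region" idea concrete, here is a minimal sketch in Python. It is not VS Code's implementation; the function names and the sample file contents are made up for illustration. It minimizes the pair (changed lines, hunks) lexicographically, which has the same effect as adding a tiny penalty per hunk to the usual edit cost.

```python
from math import inf


def count_hunks(ops):
    """Count maximal runs of changes in an edit script such as '==+=+='.

    '=' keeps a line, '+' inserts one, '-' deletes one.
    """
    hunks, in_hunk = 0, False
    for op in ops:
        changed = op != "="
        if changed and not in_hunk:
            hunks += 1
        in_hunk = changed
    return hunks


def min_changes_then_hunks(old, new):
    """Return (changed_lines, hunks) of an edit script that first minimizes
    the number of changed lines and, among those, the number of hunks."""
    n, m = len(old), len(new)
    unreachable = (inf, inf)
    # best[i][j][s] is the cheapest (changes, hunks) turning old[:i] into
    # new[:j]; s == 1 if the last operation was an insert or delete, else 0.
    best = [[[unreachable, unreachable] for _ in range(m + 1)]
            for _ in range(n + 1)]
    best[0][0][0] = (0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            # Keep a line that is equal on both sides; this closes any hunk.
            if i and j and old[i - 1] == new[j - 1]:
                best[i][j][0] = min(best[i - 1][j - 1])
            # Delete old[i-1] (come from i-1, j) or insert new[j-1] (i, j-1).
            candidates = []
            for pi, pj in ((i - 1, j), (i, j - 1)):
                if pi < 0 or pj < 0:
                    continue
                after_keep, after_change = best[pi][pj]
                # Starting a change right after a kept line opens a new hunk.
                candidates.append((after_keep[0] + 1, after_keep[1] + 1))
                # Extending an existing hunk costs only the changed line.
                candidates.append((after_change[0] + 1, after_change[1]))
            if candidates:
                best[i][j][1] = min(candidates)
    return min(best[n][m])


# Hypothetical file: two lines are added between two blank lines.
old = ["first", "", "", "last"]
new = ["first", "", "added", "", "", "last"]

# Both scripts insert exactly two lines, but they differ in hunk count.
print(count_hunks("==+=+="))             # 2
print(count_hunks("==++=="))             # 1
print(min_changes_then_hunks(old, new))  # (2, 1)
```

Comparing tuples keeps the hunk count from ever outweighing a changed line, which is what "a very small cost" is meant to guarantee. The table is quadratic in the file length, so this only illustrates the objective and is not a practical replacement for a real diff algorithm.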
(By the way, I really wouldn't call this a feature request. It's clear that the current behaviour is incorrect, isn't it? Changing the top of a file shouldn't change how diffs for completely unrelated sections are calculated. It'd be better characterized as a bug -- and an easy-to-fix bug, at that.)
Yet another example that is even clearer: I added a new method to a Python file. This is as straightforward as can be. What does the diff result look like? The method I added has been broken up into three different diff sections, and the diff believes that I added my new method in the middle of an old method that was already present above the new one. This just looks bad, it's annoying, and it makes reading these diffs take longer than it should. (I have to add, though, that other than this I absolutely love VS Code -- and the feeling in my office among those who use it is that it gives a good image of Microsoft.)
I agree this is perceived as a bug, but technically it is a feature request :). At its core, the diffing algorithm solves multiple LCS (longest common subsequence) problems. The algorithm correctly finds a longest common subsequence; it is just that in some cases there are multiple longest common subsequences, and some feel "better" than others. What I mean by that is that in your last example, To get to the "best" longest common subsequence, we need to write some new code that takes the LCS result and tries to "massage" it with some heuristics to transform it into a "better" LCS. See also #11657
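As a rough illustration of what such a "massaging" step could look like, here is a minimal Python sketch of one possible heuristic: slide a hunk of inserted lines up or down over identical lines and merge it with a neighbouring hunk once the two touch. This is an assumption about one workable approach, not the heuristic VS Code uses; the function name and sample data are hypothetical, and only pure insertions are handled.

```python
def merge_sliding_insertions(new, hunks):
    """Merge insertion hunks that can be slid next to each other.

    `new` is the list of lines in the modified file. `hunks` is a sorted
    list of non-overlapping, non-touching (start, end) ranges, end
    exclusive, marking the lines of `new` that the diff reports as inserted.
    """
    merged = []
    for start, end in hunks:
        if merged:
            prev_start, prev_end = merged[-1]
            # Slide this hunk up: allowed while the kept line just above it
            # equals the last line of the hunk.
            s, e = start, end
            while s > prev_end and new[s - 1] == new[e - 1]:
                s, e = s - 1, e - 1
            if s == prev_end:
                merged[-1] = (prev_start, e)
                continue
            # Otherwise slide the previous hunk down: allowed while its first
            # line equals the kept line just below it.
            s, e = prev_start, prev_end
            while e < start and new[s] == new[e]:
                s, e = s + 1, e + 1
            if e == start:
                merged[-1] = (s, end)
                continue
        merged.append((start, end))
    return merged


# Hypothetical result of an LCS pass that split one insertion in two:
# "added" (index 2) and a blank line (index 4) are reported separately.
new = ["first", "", "added", "", "", "last"]
print(merge_sliding_insertions(new, [(2, 3), (4, 5)]))  # [(2, 4)]
```

Sliding never changes the number of inserted lines, so the result is still a longest common subsequence; only the choice among equally long ones changes. A fuller version would also need to handle deletions and mixed hunks.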
I just ran into this myself and it bugged me enough to see if there was an issue open. I don't typically use the full diff editor, but I do make use of the blue/green/red diff indicators in the regular editor. Being familiar with git I expect one of the
Related: #10007
For some changes, diff hunks (groups of changed lines) can be computed in different ways. VS Code sometimes separates changes into different hunks when it arguably shouldn't, making viewing of diffs (slightly) more difficult.
Steps to Reproduce:
Create a file with the contents (note two blank lines) and commit it using e.g. git.
In the middle of the file, add the content
The diff will now be calculated as shown below (looking good):
Add a blank line to the beginning of the file. The first blank line will now show up as its own hunk (correctly), but the section that was added in the middle will be split into two hunks. Note that we have added two sections to our file, but the diff unnecessarily shows it as three sections having been added:
You can see an actual screenshot of some of my code below (there are other changes above it as well). In this code, it would be very helpful to see at a glance that there was one big change, but instead it looks like two changes. This is unnecessary.