
JGit’s similarity threshold for rename detection (60%) is different from git’s (50%) #110

Open
2is10 opened this issue Nov 12, 2024 · 2 comments


2is10 commented Nov 12, 2024

Version

7.0.0

Operating System

Linux/Unix, MacOS

Bug description

Git does not record renames. Instead, file renames in a commit are inferred by computing the similarity (roughly, the share of a file's content that is unchanged) of pairs of added and removed files in the commit. If the similarity exceeds some threshold, the deletion and addition are considered a file rename.
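The scoring idea above can be sketched in Java. This is a simplified, hypothetical model: the RenameSimilarity class and its score and isRename methods are illustrative names, not git or JGit API. It treats each whole line as a unit, weights shared lines by their size in bytes, and divides by the size of the larger file. Git and JGit hash content in chunks rather than comparing whole lines, so this sketch can disagree with them on other inputs.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: a simplified, line-based rename similarity score. */
final class RenameSimilarity {
    private RenameSimilarity() {}

    /** Splits content into lines (keeping each '\n') and sums the bytes per distinct line. */
    private static Map<String, Integer> bytesPerLine(String content) {
        Map<String, Integer> bytes = new HashMap<>();
        int start = 0;
        while (start < content.length()) {
            int newline = content.indexOf('\n', start);
            int end = (newline < 0) ? content.length() : newline + 1;
            String line = content.substring(start, end);
            bytes.merge(line, line.getBytes(StandardCharsets.UTF_8).length, Integer::sum);
            start = end;
        }
        return bytes;
    }

    /** Returns 0..100: bytes of shared lines divided by the larger file's size, rounded down. */
    static int score(String oldContent, String newContent) {
        int oldSize = oldContent.getBytes(StandardCharsets.UTF_8).length;
        int newSize = newContent.getBytes(StandardCharsets.UTF_8).length;
        int maxSize = Math.max(oldSize, newSize);
        if (maxSize == 0) {
            return 100; // two empty inputs are treated as identical in this sketch
        }
        Map<String, Integer> oldBytes = bytesPerLine(oldContent);
        Map<String, Integer> newBytes = bytesPerLine(newContent);
        long common = 0;
        for (Map.Entry<String, Integer> e : oldBytes.entrySet()) {
            // A line present k times in one file and m times in the other counts min(k, m) times.
            common += Math.min(e.getValue(), newBytes.getOrDefault(e.getKey(), 0));
        }
        return (int) (common * 100 / maxSize);
    }

    /** This sketch counts a score equal to the threshold as a rename (boundary handling is an assumption). */
    static boolean isRename(String oldContent, String newContent, int threshold) {
        return score(oldContent, newContent) >= threshold;
    }
}
```

With the files from the steps to reproduce, a.txt is 16 bytes, b.txt is 19 bytes, and the five unchanged lines (1, 2, 3, 4, 8) account for 10 bytes, so the sketch scores the pair at 10 × 100 / 19 = 52 after integer division. That happens to match the "similarity index 52%" git reports for this example, and it clears a 50% threshold but not a 60% one.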

Git’s similarity threshold for rename detection is configurable (for example with git diff -M<n> or --find-renames=<n>) and defaults to 50%. This is clearly documented and easy to confirm with a quick web search.

JGit’s similarity threshold for rename detection is not configurable and is 60%. That constant was introduced in 2010 (978535b).

The threshold discrepancy often shows up in Gerrit code reviews (see Gerrit bug 40015217 for examples), but it can also appear in the output of simple commands like:

  • jgit status
  • jgit diff
  • jgit log
  • jgit show

Steps to reproduce

In any new or existing git repository, run the following commands (the $'...' quoting in steps 1 and 5 requires bash or zsh):

  1. echo $'1\n2\n3\n4\n5\n6\n7\n8' > a.txt
  2. git add a.txt
  3. git commit -m a.txt
  4. git rm a.txt
  5. echo $'1\n2\n3\n4\n5.\n6.\n7.\n8' > b.txt
  6. git add b.txt
  7. git status
  8. jgit status
  9. git commit -m b.txt
  10. git show
  11. jgit show

Actual behavior

The jgit status output is:

Changes to be committed:

	deleted:    a.txt
	new file:   b.txt

The jgit show output is:

commit c33f4dce1d4a0f252b6592c97e519c802a9d64d4
Author: Jared Jacobs
Date:   Tue Nov 12 12:48:58 2024 -0800

    b.txt

diff --git a/a.txt b/a.txt
deleted file mode 100644
index 535d2b0..0000000
--- a/a.txt
+++ /dev/null
@@ -1,8 +0,0 @@
-1
-2
-3
-4
-5
-6
-7
-8
diff --git a/b.txt b/b.txt
new file mode 100644
index 0000000..472a59c
--- /dev/null
+++ b/b.txt
@@ -0,0 +1,8 @@
+1
+2
+3
+4
+5.
+6.
+7.
+8

Expected behavior

I expected the jgit status output to roughly match the git status output, identifying the staged changes as a file rename:

Changes to be committed:

	renamed:    a.txt -> b.txt

I also expected the jgit show output to roughly match the git show output, identifying the commit’s changes as a file rename:

commit c33f4dce1d4a0f252b6592c97e519c802a9d64d4
Author: Jared Jacobs
Date:   Tue Nov 12 12:48:58 2024 -0800

    b.txt

diff --git a/a.txt b/b.txt
similarity index 52%
rename from a.txt
rename to b.txt
index 535d2b01d33..472a59c47cf 100644
--- a/a.txt
+++ b/b.txt
@@ -2,7 +2,7 @@
 2
 3
 4
-5
-6
-7
+5.
+6.
+7.
 8

Relevant log output

No response

Other information

No response

msohn (Member) commented Nov 14, 2024

Please review https://eclipse.gerrithub.io/c/eclipse-jgit/jgit/+/1203908 fixing the default similarity score.

void RenameDetector.setRenameScore(int score) provides the API to configure the score.
RenameDetector DiffFormatter.getRenameDetector() returns the rename detector that a DiffFormatter uses, which can then be configured with setRenameScore(int score).
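To illustrate what a configurable threshold changes, here is a hypothetical, simplified rename pairing in Java. SimpleRenameDetector, its Rename record, and the pluggable scorer are illustrative names only: the setter merely mirrors the name of JGit's RenameDetector.setRenameScore(int), and the 0 to 100 range check and the 50 default are this sketch's own choices, not JGit behavior. It greedily pairs deleted and added paths, highest score first, and keeps only pairs that reach the threshold. The corresponding JGit configuration appears in a comment and is not compiled here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntBiFunction;

/**
 * Hypothetical sketch, not JGit code: pairs deleted and added paths whose
 * similarity reaches a configurable threshold.
 *
 * With JGit itself (not compiled here), a DiffFormatter's threshold can be
 * set roughly like this:
 *   formatter.setRepository(repo);
 *   formatter.setDetectRenames(true);
 *   formatter.getRenameDetector().setRenameScore(50);
 */
final class SimpleRenameDetector {
    /** A detected rename and the similarity score (0..100) that justified it. */
    record Rename(String oldPath, String newPath, int score) {}

    private int renameScore = 50; // sketch default, chosen to match git's documented 50%

    /** Sets the minimum score for a rename; the range check is this sketch's choice. */
    void setRenameScore(int score) {
        if (score < 0 || score > 100) {
            throw new IllegalArgumentException("score must be between 0 and 100: " + score);
        }
        renameScore = score;
    }

    /** Greedily pairs paths, highest score first; each path is used at most once. */
    List<Rename> detect(List<String> deleted, List<String> added,
                        ToIntBiFunction<String, String> scorer) {
        List<Rename> candidates = new ArrayList<>();
        for (String oldPath : deleted) {
            for (String newPath : added) {
                int score = scorer.applyAsInt(oldPath, newPath);
                if (score >= renameScore) { // pairs below the threshold stay a delete plus an add
                    candidates.add(new Rename(oldPath, newPath, score));
                }
            }
        }
        candidates.sort((x, y) -> Integer.compare(y.score(), x.score()));

        Set<String> usedOld = new HashSet<>();
        Set<String> usedNew = new HashSet<>();
        List<Rename> renames = new ArrayList<>();
        for (Rename candidate : candidates) {
            if (usedOld.contains(candidate.oldPath()) || usedNew.contains(candidate.newPath())) {
                continue; // one side was already claimed by a better-scoring pair
            }
            usedOld.add(candidate.oldPath());
            usedNew.add(candidate.newPath());
            renames.add(candidate);
        }
        return renames;
    }
}
```

Given the 52% similarity that git reports for a.txt and b.txt, the sketch's default of 50 pairs them as a rename, while setRenameScore(60) leaves them as a separate deletion and addition, mirroring the discrepancy described in this issue.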

2is10 (Author) commented Nov 15, 2024

Thanks for the quick fix! The info about how to customize the threshold is also helpful. I’ll pass it on to the Gerrit team.
