Improvement of Duplicate Detection #6707

Closed
mattys101 opened this issue Jul 25, 2020 · 4 comments · Fixed by #6897
Labels: good first issue, type: feature

Comments

@mattys101

Please see the original write-up for this on the features request page: https://discourse.jabref.org/t/more-control-on-the-duplicate-finder/120/8

Adding the issue here based on the recommendation in Christoph's response to the above.

@Siedlerchr added the "good first issue" and "type: feature" labels on Jul 25, 2020
@KunAndrew
Contributor

Hi. Can I start working on this?

@Siedlerchr
Member

@KunAndrew Sure, go ahead! To get started, check out the contribution guide.
As I already outlined in the forum answer, the code for the duplicate-detection algorithm is located in this class:
https://github.com/JabRef/jabref/blob/master/src/main/java/org/jabref/logic/bibtex/DuplicateCheck.java

An easy first step would be to compare the identifier objects: use the classes from the identifier package, such as DOI, ArXiv, or ISBN, and call their parse method. It might be useful to add the parse method to the interface.

public static Optional<DOI> parse(String doi) {
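
As a rough illustration of the identifier comparison suggested above, here is a minimal, self-contained sketch. `SimpleDoi`, its simplified regular expression, and `sameDoi` are hypothetical stand-ins made up for this example; they are not JabRef's actual `DOI` class or the code in `DuplicateCheck`. The idea is only that parsing both raw field values into identifier objects first lets differently formatted strings compare as equal.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; SimpleDoi is NOT JabRef's DOI class.
final class IdentifierComparison {

    static final class SimpleDoi {
        // Simplified DOI shape (an assumption): "10.", 4-9 digits, "/", then a non-blank suffix.
        private static final Pattern DOI_PATTERN = Pattern.compile("10\\.\\d{4,9}/\\S+");

        private final String value;

        private SimpleDoi(String value) {
            this.value = value;
        }

        // Mirrors the Optional-returning parse signature quoted in the comment above.
        static Optional<SimpleDoi> parse(String raw) {
            if (raw == null) {
                return Optional.empty();
            }
            Matcher matcher = DOI_PATTERN.matcher(raw.trim());
            if (!matcher.find()) {
                return Optional.empty();
            }
            // DOI names are case-insensitive, so lower-case before comparing.
            return Optional.of(new SimpleDoi(matcher.group().toLowerCase(Locale.ROOT)));
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof SimpleDoi && value.equals(((SimpleDoi) other).value);
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }
    }

    // True only if both strings parse and yield the same normalized DOI.
    static boolean sameDoi(String first, String second) {
        Optional<SimpleDoi> left = SimpleDoi.parse(first);
        return left.isPresent() && left.equals(SimpleDoi.parse(second));
    }

    public static void main(String[] args) {
        System.out.println(sameDoi("https://doi.org/10.1000/ABC123", "doi:10.1000/abc123"));
        System.out.println(sameDoi("no identifier here", "10.1000/abc123"));
    }
}
```

In this sketch, a URL-style DOI and a `doi:`-prefixed one are treated as the same identifier, while a value that does not parse never matches anything. Comparing the raw strings directly would miss the first case.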

A second improvement would be to reduce the weighting of the entry type and to exclude, or reduce the weighting of, some fields such as note or comment.
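
To make the weighting idea above concrete, here is a small sketch of a weighted field comparison. Everything in it is an assumption for illustration: the class name, the weight values, the ignored-field set, and the exact-match similarity are invented here and do not reflect how `DuplicateCheck` actually scores entries. It only shows how a low weight on the entry type and an exclusion list for fields like note and comment could change the outcome.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the scoring used by JabRef's DuplicateCheck.
final class WeightedDuplicateScore {

    // Made-up weights: the entry type counts far less than title or author.
    static final Map<String, Double> WEIGHTS = Map.of(
            "title", 3.0,
            "author", 2.5,
            "year", 1.0,
            "entrytype", 0.5);

    // Fields that should not influence the duplicate decision at all.
    static final Set<String> IGNORED_FIELDS = Set.of("note", "comment");

    // Case-insensitive comparison with collapsed whitespace.
    static String normalize(String value) {
        return value.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Returns the weighted share of matching fields among the fields both entries have.
    static double score(Map<String, String> first, Map<String, String> second) {
        double matched = 0.0;
        double total = 0.0;
        for (Map.Entry<String, String> field : first.entrySet()) {
            String other = second.get(field.getKey());
            if (IGNORED_FIELDS.contains(field.getKey()) || other == null) {
                continue; // skip excluded fields and fields missing from the second entry
            }
            double weight = WEIGHTS.getOrDefault(field.getKey(), 1.0);
            total += weight;
            if (normalize(field.getValue()).equals(normalize(other))) {
                matched += weight;
            }
        }
        return total == 0.0 ? 0.0 : matched / total;
    }

    public static void main(String[] args) {
        Map<String, String> article = Map.of(
                "entrytype", "article", "title", "Duplicate Detection",
                "author", "Doe, J.", "year", "2020", "note", "first copy");
        Map<String, String> proceedings = Map.of(
                "entrytype", "inproceedings", "title", "duplicate   detection",
                "author", "Doe, J.", "year", "2020", "note", "second copy");
        System.out.println(score(article, proceedings));
    }
}
```

With these assumed weights, two entries that differ only in entry type and note still score above 0.9, so a threshold-based check would flag them as likely duplicates. A real implementation would presumably use a fuzzier string similarity than exact equality.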

@KunAndrew
Contributor

KunAndrew commented Aug 11, 2020

@Siedlerchr @mattys101 Should I implement the "option to mark the entries as not duplicates" within this pull request?
The pull request's unit tests failed. I cannot tell whether this is related to my code, because the tests also fail when I run them without my changes.
Second problem: when I run scripts/generate-authors.sh to update AUTHORS, it adds my name to the list but shrinks the list from 401 to 48 people.
#6756

@Siedlerchr
Member

Siedlerchr commented Sep 29, 2020

@mayrmt Thanks to the groundwork by @KunAndrew and our finishing touches, we were able to merge it.
Thank you for reporting this issue. We think that it is already fixed in our development version, and consequently the change will be included in the next release.

We would like to ask you to use a development build from https://builds.jabref.org/master and report back if it works for you. Please remember to make a backup of your library before trying out this version.
