Check duplicate DOI #6333
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
package org.jabref.logic.integrity; | ||
|
||
import java.util.List; | ||
|
||
import org.jabref.model.entry.BibEntry; | ||
|
||
@FunctionalInterface | ||
public interface Checker { | ||
List<IntegrityMessage> check(BibEntry entry); | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
package org.jabref.logic.integrity; | ||
|
||
import java.util.ArrayList; | ||
import java.util.Collections; | ||
import java.util.HashMap; | ||
import java.util.List; | ||
import java.util.Map; | ||
import java.util.Objects; | ||
|
||
import javafx.collections.ObservableList; | ||
|
||
import org.jabref.logic.l10n.Localization; | ||
import org.jabref.model.database.BibDatabase; | ||
import org.jabref.model.entry.BibEntry; | ||
import org.jabref.model.entry.field.StandardField; | ||
import org.jabref.model.entry.identifier.DOI; | ||
|
||
import com.google.common.collect.BiMap; | ||
import com.google.common.collect.HashBiMap; | ||
|
||
public class DoiDuplicationChecker implements Checker { | ||
private final BibDatabase database; | ||
private Map<BibEntry, List<IntegrityMessage>> errors; | ||
|
||
public DoiDuplicationChecker(BibDatabase database) { | ||
this.database = Objects.requireNonNull(database); | ||
} | ||
|
||
@Override | ||
public List<IntegrityMessage> check(BibEntry entry) { | ||
if (errors == null) { | ||
errors = new HashMap<>(); | ||
Why not directly initialize that HashMap in the field declaration?
Maybe you did not see that the later code uses the BibtexDatabase from the constructor. We decided to put the code into check, and not into the constructor, to keep the constructor lean: the CPU cost is paid the first time an entry is checked, not when the checker is initialized.
This seems like premature optimization to me.
Why are you caching the errors anyway? Also, in case you are worrying about performance, why did you choose to implement a solution that is O(n^2) (with n = entries)? Why not implement an O(n) solution (e.g. https://stackoverflow.com/a/31341963/873661) which you call in
See in line 30 that the method is called once per entry. It is by design of the At the first call, the result map is filled. All subsequent calls do not fill the map; they reuse it. We are O(n+m), where n is the size of the database and m is the number of entries with DOIs.
To make the code more readable, I moved the call to the error check to the constructor. See fd7621d |
||
|
||
ObservableList<BibEntry> bibEntries = database.getEntries(); | ||
BiMap<DOI, List<BibEntry>> duplicateMap = HashBiMap.create(bibEntries.size()); | ||
for (BibEntry bibEntry : bibEntries) { | ||
bibEntry.getDOI().ifPresent(doi -> | ||
duplicateMap.computeIfAbsent(doi, x -> new ArrayList<>()).add(bibEntry)); | ||
It is
@koppor Please have a close look. The second argument of duplicateMap.computeIfAbsent refers to the value, and this clearly is of type List. So then maybe name it
@Siedlerchr Please explain the JavaDoc to me. The Map is FROM doi TO list. https://www.geeksforgeeks.org/hashmap-computeifabsent-method-in-java-with-examples/ explains the To me, it reads as if the key is passed to
To make the code more readable, I called the variable |
||
} | ||
|
||
duplicateMap.inverse().keySet().stream() | ||
.filter(list -> list.size() > 1) | ||
.flatMap(list -> list.stream()) | ||
.forEach(item -> { | ||
IntegrityMessage errorMessage = new IntegrityMessage(Localization.lang("Unique DOI used in multiple entries"), item, StandardField.DOI); | ||
errors.put(item, List.of(errorMessage)); | ||
}); | ||
} | ||
return errors.getOrDefault(entry, Collections.emptyList()); | ||
} | ||
} |
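The review thread above asks for a linear-time duplicate check and debates what the mapping function of computeIfAbsent receives. A minimal sketch of the grouping idea follows, under loud assumptions: `SketchEntry` and `findEntriesWithDuplicateDoi` are hypothetical names, DOIs are plain strings, and a plain `HashMap` replaces the `BiMap` used in the diff. It is not the code that was merged. Per the `java.util.Map` Javadoc, the function passed to `computeIfAbsent` is called with the key (here the DOI) and returns the initial value (here the list).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class DoiDuplicateSketch {

    // Hypothetical stand-in for BibEntry: a citation key and an optional DOI string.
    static final class SketchEntry {
        final String key;
        private final String doi; // null when the entry has no DOI

        SketchEntry(String key, String doi) {
            this.key = key;
            this.doi = doi;
        }

        Optional<String> getDoi() {
            return Optional.ofNullable(doi);
        }
    }

    // Returns every entry whose DOI also appears on at least one other entry.
    static Set<SketchEntry> findEntriesWithDuplicateDoi(List<SketchEntry> entries) {
        // Pass 1: group entries by DOI. The lambda parameter is the map key
        // (the DOI); the lambda result is the value stored for that key.
        Map<String, List<SketchEntry>> entriesByDoi = new HashMap<>();
        for (SketchEntry entry : entries) {
            entry.getDoi().ifPresent(doi ->
                    entriesByDoi.computeIfAbsent(doi, key -> new ArrayList<>()).add(entry));
        }

        // Pass 2: keep only the groups that contain more than one entry.
        Set<SketchEntry> duplicates = new HashSet<>();
        for (List<SketchEntry> group : entriesByDoi.values()) {
            if (group.size() > 1) {
                duplicates.addAll(group);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<SketchEntry> entries = List.of(
                new SketchEntry("a", "10.1000/1"),
                new SketchEntry("b", "10.1000/1"),
                new SketchEntry("c", "10.1000/2"),
                new SketchEntry("d", null));
        System.out.println(findEntriesWithDuplicateDoi(entries).size()); // prints 2
    }
}
```

Both passes touch each entry at most once, so the work grows linearly with the number of entries. Computing the set once and then answering per-entry lookups from it would match the caching behaviour discussed in the thread.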
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,19 @@ public class IntegrityCheck { | |
private final JournalAbbreviationRepository journalAbbreviationRepository; | ||
private final boolean enforceLegalKey; | ||
private final boolean allowIntegerEdition; | ||
private ASCIICharacterChecker asciiCharacterChecker; | ||
private NoBibtexFieldChecker noBibtexFieldChecker; | ||
private BibTeXEntryTypeChecker bibTeXEntryTypeChecker; | ||
private BibtexKeyChecker bibtexKeyChecker; | ||
private TypeChecker typeChecker; | ||
private BibStringChecker bibStringChecker; | ||
private HTMLCharacterChecker htmlCharacterChecker; | ||
private EntryLinkChecker entryLinkChecker; | ||
private BibtexkeyDeviationChecker bibtexkeyDeviationChecker; | ||
private BibtexKeyDuplicationChecker bibtexKeyDuplicationChecker; | ||
private JournalInAbbreviationListChecker journalInAbbreviationListChecker; | ||
private FieldCheckers fieldCheckers; | ||
private DoiDuplicationChecker doiDuplicationChecker; | ||
|
||
public IntegrityCheck(BibDatabaseContext bibDatabaseContext, | ||
FilePreferences filePreferences, | ||
|
@@ -32,9 +45,36 @@ public IntegrityCheck(BibDatabaseContext bibDatabaseContext, | |
this.journalAbbreviationRepository = Objects.requireNonNull(journalAbbreviationRepository); | ||
this.enforceLegalKey = enforceLegalKey; | ||
this.allowIntegerEdition = allowIntegerEdition; | ||
initCheckers(bibDatabaseContext, bibtexKeyPatternPreferences, journalAbbreviationRepository); | ||
} | ||
|
||
public List<IntegrityMessage> checkDatabase() { | ||
private void initCheckers(BibDatabaseContext bibDatabaseContext, BibtexKeyPatternPreferences bibtexKeyPatternPreferences, JournalAbbreviationRepository journalAbbreviationRepository) { | ||
To be honest, I don't see any advantage in this refactoring. It just makes the code more complex and harder to maintain, in my opinion. If you really feel that the original code needs refactoring, then create a list of all checkers that need to be run a) always, b) for BibTeX, c) for biblatex (though I would still create these lists only in the checkDatabase method). This would get rid of the repeated
We initialize the checkers once per database check, not once per entry check. In the case of a 20k database, that is approx. 100k fewer Java objects in memory per quality check. We ensured that each checker is stateless; thus, it can be reused across entries. I agree with your proposal though.
With your proposal, the code is much more readable. 🎉 03dd2a0 |
||
asciiCharacterChecker = new ASCIICharacterChecker(); | ||
noBibtexFieldChecker = new NoBibtexFieldChecker(); | ||
bibTeXEntryTypeChecker = new BibTeXEntryTypeChecker(); | ||
bibtexKeyChecker = new BibtexKeyChecker(); | ||
typeChecker = new TypeChecker(); | ||
bibStringChecker = new BibStringChecker(); | ||
htmlCharacterChecker = new HTMLCharacterChecker(); | ||
entryLinkChecker = new EntryLinkChecker(bibDatabaseContext.getDatabase()); | ||
bibtexkeyDeviationChecker = new BibtexkeyDeviationChecker(bibDatabaseContext, bibtexKeyPatternPreferences); | ||
bibtexKeyDuplicationChecker = new BibtexKeyDuplicationChecker(bibDatabaseContext.getDatabase()); | ||
doiDuplicationChecker = new DoiDuplicationChecker(bibDatabaseContext.getDatabase()); | ||
|
||
if (bibDatabaseContext.isBiblatexMode()) { | ||
journalInAbbreviationListChecker = new JournalInAbbreviationListChecker(StandardField.JOURNALTITLE, journalAbbreviationRepository); | ||
} else { | ||
journalInAbbreviationListChecker = new JournalInAbbreviationListChecker(StandardField.JOURNAL, journalAbbreviationRepository); | ||
} | ||
|
||
fieldCheckers = new FieldCheckers(bibDatabaseContext, | ||
filePreferences, | ||
journalAbbreviationRepository, | ||
enforceLegalKey, | ||
allowIntegerEdition); | ||
} | ||
|
||
List<IntegrityMessage> checkDatabase() { | ||
List<IntegrityMessage> result = new ArrayList<>(); | ||
|
||
for (BibEntry entry : bibDatabaseContext.getDatabase().getEntries()) { | ||
|
@@ -51,38 +91,27 @@ public List<IntegrityMessage> checkEntry(BibEntry entry) { | |
return result; | ||
} | ||
|
||
FieldCheckers fieldCheckers = new FieldCheckers(bibDatabaseContext, | ||
filePreferences, | ||
journalAbbreviationRepository, | ||
enforceLegalKey, | ||
allowIntegerEdition); | ||
for (FieldChecker checker : fieldCheckers.getAll()) { | ||
result.addAll(checker.check(entry)); | ||
} | ||
|
||
if (!bibDatabaseContext.isBiblatexMode()) { | ||
// BibTeX only checkers | ||
result.addAll(new ASCIICharacterChecker().check(entry)); | ||
result.addAll(new NoBibtexFieldChecker().check(entry)); | ||
result.addAll(new BibTeXEntryTypeChecker().check(entry)); | ||
result.addAll(new JournalInAbbreviationListChecker(StandardField.JOURNAL, journalAbbreviationRepository).check(entry)); | ||
} else { | ||
result.addAll(new JournalInAbbreviationListChecker(StandardField.JOURNALTITLE, journalAbbreviationRepository).check(entry)); | ||
result.addAll(asciiCharacterChecker.check(entry)); | ||
result.addAll(noBibtexFieldChecker.check(entry)); | ||
result.addAll(bibTeXEntryTypeChecker.check(entry)); | ||
} | ||
|
||
result.addAll(new BibtexKeyChecker().check(entry)); | ||
result.addAll(new TypeChecker().check(entry)); | ||
result.addAll(new BibStringChecker().check(entry)); | ||
result.addAll(new HTMLCharacterChecker().check(entry)); | ||
result.addAll(new EntryLinkChecker(bibDatabaseContext.getDatabase()).check(entry)); | ||
result.addAll(new BibtexkeyDeviationChecker(bibDatabaseContext, bibtexKeyPatternPreferences).check(entry)); | ||
result.addAll(new BibtexKeyDuplicationChecker(bibDatabaseContext.getDatabase()).check(entry)); | ||
result.addAll(journalInAbbreviationListChecker.check(entry)); | ||
result.addAll(bibtexKeyChecker.check(entry)); | ||
result.addAll(typeChecker.check(entry)); | ||
result.addAll(bibStringChecker.check(entry)); | ||
result.addAll(htmlCharacterChecker.check(entry)); | ||
result.addAll(entryLinkChecker.check(entry)); | ||
result.addAll(bibtexkeyDeviationChecker.check(entry)); | ||
result.addAll(bibtexKeyDuplicationChecker.check(entry)); | ||
result.addAll(doiDuplicationChecker.check(entry)); | ||
|
||
return result; | ||
} | ||
|
||
@FunctionalInterface | ||
public interface Checker { | ||
List<IntegrityMessage> check(BibEntry entry); | ||
} | ||
} |
Rename to EntryChecker?
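The reviewer's proposal in the thread above (one list of checkers, filled depending on the mode, then a single loop per entry) can be sketched roughly as follows. Everything here is an assumption made for illustration: `CheckerListSketch`, `checkNotEmpty` and `checkAscii` are hypothetical names, and `String` stands in for both `BibEntry` and `IntegrityMessage`. The interface uses the `EntryChecker` name suggested in the last comment. It shows the shape of the idea, not the code in 03dd2a0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CheckerListSketch {

    // Same shape as the Checker interface in the diff, with String standing in
    // for BibEntry (input) and IntegrityMessage (output).
    @FunctionalInterface
    interface EntryChecker {
        List<String> check(String entry);
    }

    private final List<EntryChecker> checkers = new ArrayList<>();

    CheckerListSketch(boolean biblatexMode) {
        // Checkers that always run.
        checkers.add(CheckerListSketch::checkNotEmpty);
        // Checkers that only make sense in BibTeX mode.
        if (!biblatexMode) {
            checkers.add(CheckerListSketch::checkAscii);
        }
    }

    // Hypothetical checker: reports entries without any content.
    static List<String> checkNotEmpty(String entry) {
        if (entry.isEmpty()) {
            return Collections.singletonList("empty entry");
        }
        return Collections.emptyList();
    }

    // Hypothetical checker: reports characters outside the ASCII range.
    static List<String> checkAscii(String entry) {
        if (entry.chars().anyMatch(c -> c > 127)) {
            return Collections.singletonList("non-ASCII character");
        }
        return Collections.emptyList();
    }

    // One loop replaces the repeated result.addAll(...) calls.
    List<String> checkEntry(String entry) {
        List<String> result = new ArrayList<>();
        for (EntryChecker checker : checkers) {
            result.addAll(checker.check(entry));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new CheckerListSketch(false).checkEntry("Gödel"));
        System.out.println(new CheckerListSketch(true).checkEntry("Gödel"));
    }
}
```

Because the checkers in this sketch hold no per-entry state, the list is built once and reused for every entry, which is the property the author says was verified for the real checkers.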