Add support biblatex extended name format #11975

Draft

u7302654 wants to merge 6 commits into JabRef:main from u7302654:add-support-biblatex-extended-name-format

u7302654 commented Oct 16, 2024 •

edited by koppor

Description:
This pull request adds support for parsing the BibLaTeX extended name format, which describes a name through the fields family, given, prefix, suffix, and useprefix. This ensures correct parsing and formatting of author names with complex structures, including prefixes such as "van" and cases where the prefix should be ignored (useprefix=false).

These changes solve the problem described in Issue #4558, where entries using the extended BibLaTeX format were not being parsed and sorted correctly, leading to incorrect name sorting behaviour.
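
To illustrate the useprefix handling described above, here is a minimal sketch and not the code from this PR. It assumes the name parts have already been split into a map; the class `UsePrefixSketch`, the method `sortKey`, and the choice to treat a missing useprefix as false are all assumptions made for illustration.

```java
import java.util.Map;

final class UsePrefixSketch {

    // Hypothetical helper: builds the string a name would be sorted by.
    // Assumption: a missing "useprefix" field is treated as false.
    static String sortKey(Map<String, String> nameParts) {
        String family = nameParts.getOrDefault("family", "");
        String prefix = nameParts.getOrDefault("prefix", "");
        boolean usePrefix = Boolean.parseBoolean(nameParts.getOrDefault("useprefix", "false"));
        if (usePrefix && !prefix.isEmpty()) {
            // The prefix takes part in sorting, e.g. "van der Vliet".
            return prefix + " " + family;
        }
        // The prefix is ignored for sorting, e.g. "Vliet".
        return family;
    }

    public static void main(String[] args) {
        Map<String, String> name = Map.of(
                "family", "Vliet",
                "given", "Hans",
                "prefix", "van der",
                "useprefix", "false");
        System.out.println(sortKey(name));
    }
}
```

With useprefix=false the entry would sort under "Vliet"; with useprefix=true it would sort under "van der Vliet".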

Changes:

Added a new method parseExtendedNameFormat in AuthorListParser to handle the extended name format.
Created tests to ensure correct behaviour for author names with different combinations of family, given, prefix, and useprefix values.
Modified the author list parsing logic to include these new parsing rules.

Closes #4558

Mandatory checks

I own the copyright of the code submitted and I licence it under the MIT license
[/] Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
[/] Screenshots added in PR description (for UI changes)
Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

u7302654 added 4 commits

October 15, 2024 21:12


          Add support for BibLaTeX extended name format


          Add new test cases for BibLaTeX extended name format support

7094a6e


          Add comments to new methods in AuthorListParser and AuthorListParserTest


          Merge remote-tracking branch 'origin/main' into add-support-biblatex-…

fb0fb6c

…extended-name-format

github-actions bot reviewed

Contributor

github-actions bot left a comment

Your code currently does not meet JabRef's code guidelines.
We use Checkstyle to identify issues.
The tool reviewdog already placed comments on GitHub to indicate the places. See the tab "Files" in your PR.
Please carefully follow the setup guide for the codestyle.
Afterwards, please run checkstyle locally and fix the issues.

You can check review dog's comments at the tab "Files changed" of your pull request.

Member

koppor commented Oct 16, 2024 •

edited

Update: I removed the HTML comment markers to enable issue linking.

Member

koppor commented Oct 16, 2024

I did not see any change implementing a new JavaFX component for the entry editor: #4558 (comment). It is OK to focus on the logic only, but then we need to create a follow-up issue.

@u7302654 Please fix checkstyle issues. You seem to have missed Step 3: Set up JabRef's code style.

Moreover, please add a screenshot to show that the sorting issue of the original poster is solved (#4558 (comment)). If it is not solved, it is also OK, because it improves the data model, but then we cannot close the issue at all.

koppor requested changes

src/main/java/org/jabref/logic/importer/AuthorListParser.java

@@ -147,59 +140,37 @@ public AuthorList parse(@NonNull String listOfNames) {
                       listOfNames = simpleNormalForm.authors;
                       boolean andOthersPresent = simpleNormalForm.andOthersPresent;
-                      // Handle case names in order lastname, firstname and separated by ","
-                      // E.g., Ali Babar, M., Dingsøyr, T., Lago, P., van der Vliet, H.
-                      final boolean authorsContainAND = listOfNames.toUpperCase(Locale.ENGLISH).contains(" AND ");

Member

koppor Oct 16, 2024

This code was removed - therefore org.jabref.logic.formatter.bibtexfields.NormalizeNamesFormatterTest is failing. Please fix. 😅

HoussemNasri reviewed

src/main/java/org/jabref/logic/importer/AuthorListParser.java Outdated

+                   */
+                  private Optional<Author> parseExtendedNameFormat(String authorString) {
+                      Map<String, String> nameParts = new HashMap<>();
+                      Matcher matcher = Pattern.compile("(\\w+)\\s*=\\s*([^,]+)(?:,\\s*|$)").matcher(authorString);

Member

HoussemNasri Oct 16, 2024 •

edited

Please compile the regex only once. Compiling regular expressions is an expensive operation. Imagine a database of 10,000 entries, all of which use the extended name format, and the regex compilation takes 1 ms. We would have a 10 second delay for nothing.
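
A minimal sketch of this suggestion, assuming the pattern from the diff is kept unchanged. The class name `ExtendedNameSketch`, the constant `KEY_VALUE`, and the method `parseNameParts` are hypothetical and not taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtendedNameSketch {

    // Compiled once when the class is loaded and reused for every call.
    // Pattern instances are immutable and safe to share.
    private static final Pattern KEY_VALUE =
            Pattern.compile("(\\w+)\\s*=\\s*([^,]+)(?:,\\s*|$)");

    // Hypothetical helper: splits "key=value, key=value" into a map.
    static Map<String, String> parseNameParts(String authorString) {
        Map<String, String> nameParts = new LinkedHashMap<>();
        // Only the cheap Matcher is created per call.
        Matcher matcher = KEY_VALUE.matcher(authorString);
        while (matcher.find()) {
            nameParts.put(matcher.group(1).trim(), matcher.group(2).trim());
        }
        return nameParts;
    }

    public static void main(String[] args) {
        System.out.println(parseNameParts("family=Vliet, given=Hans, prefix=van der, useprefix=false"));
    }
}
```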

HoussemNasri reviewed

src/main/java/org/jabref/logic/importer/AuthorListParser.java

+                      Map<String, String> nameParts = new HashMap<>();
+                      Matcher matcher = Pattern.compile("(\\w+)\\s*=\\s*([^,]+)(?:,\\s*|$)").matcher(authorString);
+                      while (matcher.find()) {
+                          nameParts.put(matcher.group(1).trim(), matcher.group(2).trim());

Member

HoussemNasri Oct 16, 2024

nitpick: Named groups would make this code clearer
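
For illustration, the same pattern could be written with named groups roughly as follows. This is a sketch only: the group names `key` and `value`, the class `NamedGroupSketch`, and the method `parseNameParts` are assumptions, not names from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupSketch {

    // Same pattern as in the diff, with (?<key>...) and (?<value>...)
    // replacing the numbered groups 1 and 2.
    private static final Pattern KEY_VALUE =
            Pattern.compile("(?<key>\\w+)\\s*=\\s*(?<value>[^,]+)(?:,\\s*|$)");

    // Hypothetical helper: splits "key=value, key=value" into a map.
    static Map<String, String> parseNameParts(String authorString) {
        Map<String, String> nameParts = new LinkedHashMap<>();
        Matcher matcher = KEY_VALUE.matcher(authorString);
        while (matcher.find()) {
            // The group names document what each part of the match means.
            nameParts.put(matcher.group("key").trim(), matcher.group("value").trim());
        }
        return nameParts;
    }

    public static void main(String[] args) {
        System.out.println(parseNameParts("family=Vliet, given=Hans, prefix=van der"));
    }
}
```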

HoussemNasri reviewed

src/main/java/org/jabref/logic/importer/AuthorListParser.java Outdated

Comment on lines 266 to 268

+                          } else if (i <= authorList.length() - 5 && authorList.substring(i, i + 5).equals(" and ") && bracesLevel == 0) {
+                              authors.add(authorList.substring(start, i));
+                              i += 4;

Member

HoussemNasri Oct 16, 2024

A lot of magic values here. What do the 5 and 4 stand for? If possible, please declare a constant with a good name that explains the purpose of these values.
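
One possible shape for this, sketched under assumptions: the 5 is the length of the separator " and ", and the 4 is that length minus the increment done by the enclosing loop. The class `AuthorSplitSketch`, the constant `AND_SEPARATOR`, and the method `splitAuthors` are hypothetical, and the sketch uses a `while` loop instead of the loop in the PR so that no "length minus one" adjustment is needed.

```java
import java.util.ArrayList;
import java.util.List;

final class AuthorSplitSketch {

    // Names the separator once; its length replaces the literals 5 and 4.
    private static final String AND_SEPARATOR = " and ";

    // Hypothetical helper: splits on " and " outside of braces.
    static List<String> splitAuthors(String authorList) {
        List<String> authors = new ArrayList<>();
        int bracesLevel = 0;
        int start = 0;
        int i = 0;
        while (i < authorList.length()) {
            char c = authorList.charAt(i);
            if (c == '{') {
                bracesLevel++;
            } else if (c == '}') {
                bracesLevel--;
            } else if (bracesLevel == 0 && authorList.startsWith(AND_SEPARATOR, i)) {
                authors.add(authorList.substring(start, i));
                // Skip the whole separator and start the next author after it.
                i += AND_SEPARATOR.length();
                start = i;
                continue;
            }
            i++;
        }
        authors.add(authorList.substring(start));
        return authors;
    }

    public static void main(String[] args) {
        System.out.println(splitAuthors("family=Vliet, given=Hans and {Barnes and Noble} and Knuth, Donald"));
    }
}
```

`String.startsWith(prefix, offset)` also removes the need for the explicit bounds check before `substring`.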

HoussemNasri reviewed

src/main/java/org/jabref/logic/importer/AuthorListParser.java

+                      String givenNameAbbreviated = abbreviateGivenName(givenName);
+                      // create Author object
+                      return Optional.of(new Author(givenName, givenNameAbbreviated, namePrefix, familyName, nameSuffix));

Member

HoussemNasri Oct 16, 2024

This comment doesn't say anything that the code doesn't say already, it is useless, please remove it.

HoussemNasri reviewed

src/main/java/org/jabref/logic/importer/AuthorListParser.java

Comment on lines +308 to +310

+                      while (tokenStart < original.length() && Character.isWhitespace(original.charAt(tokenStart))) {
+                          tokenStart++;
+                      }

Member

HoussemNasri Oct 16, 2024

A comment above this block however would be helpful.

u7302654 added 2 commits

October 24, 2024 22:24


          Merge branch 'JabRef:main' into main

15631ec


          Fix checkstyle, compiling regular expressions and magic values

48dec44

koppor marked this pull request as draft

October 24, 2024 11:51

Labels

None yet