Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support biblatex extended name format #11975

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

u7302654
Copy link

@u7302654 u7302654 commented Oct 16, 2024

Description:
This pull request adds support for parsing BibLaTeX extended name format, which includes fields including family, given, prefix, suffix, and useprefix. This feature ensures correct parsing and formatting of author names with complex structures, including options to handle prefixes such as "van" and to manage cases where the prefix should be ignored (useprefix=false).

These changes solve the problem described in Issue #4558, where entries using the extended BibLaTeX format were not being parsed and sorted correctly, leading to incorrect name sorting behaviour.

Changes:

  • Added a new method parseExtendedNameFormat in AuthorListParser to handle the extended name format.
  • Created tests to ensure correct behaviour for author names with different combinations of family, given, prefix, and useprefix values.
  • Modified the author list parsing logic to include these new parsing rules.

Closes #4558

Mandatory checks

  • I own the copyright of the code submitted and I licence it under the MIT license
  • [/] Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • [/] Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code currently does not meet JabRef's code guidelines.
We use Checkstyle to identify issues.
The tool reviewdog already placed comments on GitHub to indicate the places. See the tab "Files" in you PR.
Please carefully follow the setup guide for the codestyle.
Afterwards, please run checkstyle locally and fix the issues.

You can check review dog's comments at the tab "Files changed" of your pull request.

@koppor
Copy link
Member

koppor commented Oct 16, 2024

Update: I removed the HTML comment markers <!-- ... --> to enable issue linking

@koppor
Copy link
Member

koppor commented Oct 16, 2024

I did not see any JavaFX component implementing a new JavaFX component for the entry editor: #4558 (comment). It is OK to focus on the logic only. It is OK, but then we need to create a follow-up issue.

@u7302654 Please fix checkstyle issues. You seem to have missed Step 3: Set up JabRef's code style.

Moreover, please add a screenshot to show that the sorting issue of the original poster is solved (#4558 (comment)). If it is not solved, it is also OK, because it improves the data model, but then we cannot close the issue at all.

@@ -147,59 +140,37 @@ public AuthorList parse(@NonNull String listOfNames) {
listOfNames = simpleNormalForm.authors;
boolean andOthersPresent = simpleNormalForm.andOthersPresent;

// Handle case names in order lastname, firstname and separated by ","
// E.g., Ali Babar, M., Dingsøyr, T., Lago, P., van der Vliet, H.
final boolean authorsContainAND = listOfNames.toUpperCase(Locale.ENGLISH).contains(" AND ");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was removed - therefore org.jabref.logic.formatter.bibtexfields.NormalizeNamesFormatterTest is failing. Please fix. 😅

*/
private Optional<Author> parseExtendedNameFormat(String authorString) {
Map<String, String> nameParts = new HashMap<>();
Matcher matcher = Pattern.compile("(\\w+)\\s*=\\s*([^,]+)(?:,\\s*|$)").matcher(authorString);
Copy link
Member

@HoussemNasri HoussemNasri Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please compile the regex only once. Compiling regular expressions is an expensive operation. Imagine a database of 10,000 entries, all of which use the extended name format, and the regex compilation takes 1 ms. We would have a 10 second delay for nothing.

Map<String, String> nameParts = new HashMap<>();
Matcher matcher = Pattern.compile("(\\w+)\\s*=\\s*([^,]+)(?:,\\s*|$)").matcher(authorString);
while (matcher.find()) {
nameParts.put(matcher.group(1).trim(), matcher.group(2).trim());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Named groups would make this code clearer

Comment on lines 266 to 268
} else if (i <= authorList.length() - 5 && authorList.substring(i, i + 5).equals(" and ") && bracesLevel == 0) {
authors.add(authorList.substring(start, i));
i += 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of magic values here, what do the 5 and 4 stand for here? If possible please delcare a constant that has a good name that explains the purpose of these values.

String givenNameAbbreviated = abbreviateGivenName(givenName);

// create Author object
return Optional.of(new Author(givenName, givenNameAbbreviated, namePrefix, familyName, nameSuffix));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't say anything that the code doesn't say already, it is useless, please remove it.

Comment on lines +308 to +310
while (tokenStart < original.length() && Character.isWhitespace(original.charAt(tokenStart))) {
tokenStart++;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment above this block however would be helpful.

@koppor koppor marked this pull request as draft October 24, 2024 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support of BibLaTeX extended name format
3 participants