Updates to institution citation keys #7210
The regexp is still broad, but unless there are further complaints perhaps it is enough.
Except for logging when LatexToUnicodeAdapter fails (otherwise names such as "{Link{"{o}}ping University}}" will generate only There will be some updates to the documentation, change log etc. |
Wait... I can't check for failed parsing? |
When generating a key from a university name, it should contain at least two parts: "university" and the university's name. If it does not, it is likely that the name contained LaTeX that could not be resolved correctly.
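A minimal sketch of the sanity check described above, assuming the generated key is available as a list of parts. The class and method names are hypothetical and not part of JabRef's API:

```java
import java.util.List;

final class UniversityKeyCheck {
    /**
     * Hypothetical helper: a key derived from a university name is expected to
     * have at least two non-blank parts (the "university" marker and the name).
     */
    static boolean looksComplete(List<String> keyParts) {
        long nonBlank = keyParts.stream().filter(part -> !part.isBlank()).count();
        return nonBlank >= 2;
    }

    public static void main(String[] args) {
        // Illustrative inputs only; the part values are made up for the sketch.
        System.out.println(looksComplete(List.of("Uni", "Link")));
        System.out.println(looksComplete(List.of("Uni")));
    }
}
```

A caller could log a warning when `looksComplete` returns false, since that hints at LaTeX that was not resolved.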
I guess you can use |
@tobiasdiez except for JavaDoc and changelog, yes, it is ready for review. Which |
@tobiasdiez sorry, I forgot to push one local commit. |
O.o |
I have to admit I had not yet looked at your PR. It was just in response to your question. What do you want to do? |
@tobiasdiez good then :) In my opinion it is better if the generation warns when there is incorrect LaTeX. But I am not sure how to do it in practice. I don't have experience with Scala <-> Java but if I am not misreading the |
I don't know the drawbacks/advantages of that approach, but if this is the only part of the JabRef code that needs this, perhaps it is better to leave the code as-is? |
We already have fastparse as a dependency, so that's not a problem. I agree it would be nice to check for parsing errors, so I support you here (sadly only morally, since I don't have any Scala experience either). |
src/main/java/org/jabref/model/strings/LatexToUnicodeAdapter.java
Co-authored-by: Christoph <siedlerkiller@gmail.com>
I’ll be looking at it shortly (dinner)
…On December 22, 2020 at 17:29:07, Christoph ***@***.******@***.***)) wrote:
@Siedlerchr commented on this pull request.
In src/main/java/org/jabref/model/strings/LatexToUnicodeAdapter.java(#7210 (comment)):
> + * @return a String with LaTeX resolved into Unicode
> + * @throws IllegalArgumentException if the LaTeX could not be parsed
> + */
> + public static Optional<String> parse(String inField) throws IllegalArgumentException {
> +     Objects.requireNonNull(inField);
> +     String toFormat = UNDERSCORE_MATCHER.matcher(inField).replaceAll(REPLACEMENT_CHAR);
> +     try {
> +         var parsingResult = LaTeX2Unicode.parse(toFormat);
> +         if (parsingResult instanceof Parsed.Success) {
> +             String text = parsingResult.get().value();
> +             toFormat = Normalizer.normalize(text, Normalizer.Form.NFC);
> +             return Optional.of(UNDERSCORE_PLACEHOLDER_MATCHER.matcher(toFormat).replaceAll("_"));
> +         } else {
> +             return Optional.empty()
> +         }
> +     } catch (Throwable throwable) {
the try and catch can now be removed
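To illustrate why an `Optional`-returning `parse` needs no try/catch, here is a small sketch. The parser below is a stand-in that only checks brace balance; it is an assumption made for illustration and does not reproduce `LaTeX2Unicode` or the actual `LatexToUnicodeAdapter` code:

```java
import java.text.Normalizer;
import java.util.Objects;
import java.util.Optional;

final class ParseSketch {
    /** Stand-in parser (assumption): reports failure for unbalanced braces. */
    static Optional<String> parse(String inField) {
        Objects.requireNonNull(inField);
        int depth = 0;
        for (char c : inField.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth < 0) {
                return Optional.empty(); // closing brace without a matching opening brace
            }
        }
        if (depth != 0) {
            return Optional.empty(); // opening brace that was never closed
        }
        String text = inField.replace("{", "").replace("}", "");
        return Optional.of(Normalizer.normalize(text, Normalizer.Form.NFC));
    }

    /** Caller side: failure arrives as an empty Optional, so no exception handling is needed. */
    static String parseOrOriginal(String inField) {
        return parse(inField).orElse(inField);
    }

    public static void main(String[] args) {
        System.out.println(parseOrOriginal("{JabRef}"));
        System.out.println(parseOrOriginal("{Link{\"{o}}ping University}}"));
    }
}
```

With this shape the caller decides what a failed parse means, for example falling back to the raw field or logging a warning.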
Is this ready for review? (I think it looks good) |
Yup, it is ready. |
The changes look good to me. Thanks a lot!
Could you please also add the examples from the issues as tests to verify that the issues don't reoccur? Thanks!
@tobiasdiez good catch, apparently I did not examine my assumptions enough. My concerns about including it are:
1. Based on #6706 (comment) and others, my thoughts were that, to the largest extent possible, BibTeX citation keys should be ASCII/ANSI only, in which case #7199 should not be added as a test case. If
2. I am not convinced that the currently generated citation key is correct, but you seem to know more about this particular issue than I do, so feel free to correct me 😛
In my opinion, the "correct" citation key should be either 杨 or 秀群 (i.e., based on the documentation, the citation key should be the last name). With However, all of that being said (and my apologies for the rather lengthy read), if this is how JabRef is used, perhaps the test case should be added anyway? |
As we have quite a lot of Chinese users, it would be good to support this as far as possible. Thus, if the user enters Chinese authors, I would say we should generate a key with Chinese characters. In the best case, one would actually replace the Chinese characters with their romanization, but I'm not sure if a convenient library for this exists. For now, I would say we simply add |
Sure!
pinyin4j seems to be the most popular Java alternative. However, given that high performance isn't needed (clean-up operation?), there are more popular JavaScript libraries. I don't know if there is any interest in this? |
Thanks, also for the research concerning pinyin libraries. Let's see if users report this as a feature request. |
@Siedlerchr @tobiasdiez thank you both for your time, reviews and suggestions, they are appreciated! |
Thank you ALL! |
* upstream/master: (201 commits)
  Only disable move to file dir when path equals (#7269)
  Improved detection of long DOI's within text (#7260)
  Add missing author and fix name
  Fix style of highlighted checkboxes while searching in preferences (#7258)
  Updates to institution citation keys (#7210)
  Bump xmlunit-core from 2.8.1 to 2.8.2 (#7251)
  Bump classgraph from 4.8.97 to 4.8.98 (#7250)
  Bump bcprov-jdk15on from 1.67 to 1.68 (#7249)
  Bump xmlunit-matchers from 2.8.1 to 2.8.2 (#7252)
  Bump unirest-java from 3.11.06 to 3.11.09 (#7254)
  Bump org.beryx.jlink from 2.23.0 to 2.23.1 (#7253)
  Bump pascalgn/automerge-action from v0.12.0 to v0.13.0 (#7255)
  Added a check to integrate with the flatpak package (#7248)
  New translations JabRef_en.properties (Chinese Traditional) (#7247)
  Update code-howtos.md
  GitBook: [master] 5 pages and 25 assets modified
  New Crowdin updates (#7246)
  add language mapping for chinese
  remove chinese content
  fix hamcrest link
  ...
# Conflicts:
#   src/main/java/org/jabref/gui/JabRefFrame.java
#   src/main/java/org/jabref/gui/dialogs/AutosaveUiManager.java
#   src/main/java/org/jabref/gui/exporter/SaveAction.java
#   src/main/java/org/jabref/gui/exporter/SaveDatabaseAction.java
#   src/main/java/org/jabref/logic/autosaveandbackup/BackupManager.java
#   src/test/java/org/jabref/gui/exporter/SaveDatabaseActionTest.java
* updateGradle7: (39 commits)
  try upgrading gradle
  Only disable move to file dir when path equals (#7269)
  Improved detection of long DOI's within text (#7260)
  Add missing author and fix name
  Fix style of highlighted checkboxes while searching in preferences (#7258)
  Updates to institution citation keys (#7210)
  Bump xmlunit-core from 2.8.1 to 2.8.2 (#7251)
  Bump classgraph from 4.8.97 to 4.8.98 (#7250)
  Bump bcprov-jdk15on from 1.67 to 1.68 (#7249)
  Bump xmlunit-matchers from 2.8.1 to 2.8.2 (#7252)
  Bump unirest-java from 3.11.06 to 3.11.09 (#7254)
  Bump org.beryx.jlink from 2.23.0 to 2.23.1 (#7253)
  Bump pascalgn/automerge-action from v0.12.0 to v0.13.0 (#7255)
  Added a check to integrate with the flatpak package (#7248)
  New translations JabRef_en.properties (Chinese Traditional) (#7247)
  Update code-howtos.md
  GitBook: [master] 5 pages and 25 assets modified
  New Crowdin updates (#7246)
  add language mapping for chinese
  remove chinese content
  ...
* upstream/master:
  fix checsktyle
  Fix for application dialogs opening in wrong displays (#7273)
  Only disable move to file dir when path equals (#7269)
  Improved detection of long DOI's within text (#7260)
  Add missing author and fix name
  Fix style of highlighted checkboxes while searching in preferences (#7258)
  Updates to institution citation keys (#7210)
  Bump xmlunit-core from 2.8.1 to 2.8.2 (#7251)
  Bump classgraph from 4.8.97 to 4.8.98 (#7250)
  Bump bcprov-jdk15on from 1.67 to 1.68 (#7249)
  Bump xmlunit-matchers from 2.8.1 to 2.8.2 (#7252)
  Bump unirest-java from 3.11.06 to 3.11.09 (#7254)
  Bump org.beryx.jlink from 2.23.0 to 2.23.1 (#7253)
  Bump pascalgn/automerge-action from v0.12.0 to v0.13.0 (#7255)
# Conflicts:
#   src/main/java/org/jabref/gui/openoffice/OpenOfficePanel.java
Fixes #6942. Fixes #7199.
TL;DR
Authors who only have a last name are abbreviated. #6942 is mis-abbreviated because name parts containing "uni" are assumed to be universities, e.g., United Airlines. #7199 triggers the abbreviation method because, from BibTeX's point of view, the author only has a last name. The name is later removed because Java does not classify the first character as an uppercase letter, hence the heuristic assumes it is an insignificant word. E.g., {eBay} gets removed, {JabRef} doesn't.
Background
Authors that are institutions can have very lengthy citation keys unless they are abbreviated in some fashion. The abbreviations are generated in jabref/src/main/java/org/jabref/logic/citationkeypattern/BracketedPattern.java (line 1243 in 6718c08), which is based on a heuristic that tries to determine a suitable abbreviation from the content of the specific author.
Issue #6942 is due to a regexp being overly broad when determining if an author is a university.
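As a rough illustration of the over-broad match, the sketch below compares the old pattern `^uni` with the narrower `^uni(v|b|$).*` from this PR. How JabRef applies the pattern (lowercasing, `find` versus `matches`) is an assumption here, and the class name is hypothetical:

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class UniversityRegexDemo {
    static final Pattern BROAD = Pattern.compile("^uni");
    static final Pattern NARROW = Pattern.compile("^uni(v|b|$).*");

    /** Old behaviour (assumed): any name part starting with "uni" counts as a university. */
    static boolean broadMatch(String part) {
        return BROAD.matcher(part.toLowerCase(Locale.ROOT)).find();
    }

    /** Narrower check: "uni" must be the whole part or be followed by 'v' or 'b'. */
    static boolean narrowMatch(String part) {
        return NARROW.matcher(part.toLowerCase(Locale.ROOT)).matches();
    }

    public static void main(String[] args) {
        for (String part : List.of("University", "Uni", "United")) {
            System.out.println(part + ": broad=" + broadMatch(part) + ", narrow=" + narrowMatch(part));
        }
    }
}
```

Under these assumptions "United" is matched by the broad pattern but rejected by the narrow one, while "University" and "Uni" are accepted by both.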
Issue #7199 was created from #6706, where I changed how a company/institute is identified. In previous versions of JabRef, an author is an institute if it is enclosed in curly brackets (e.g., {JabRef}), and I changed it to also include authors who only have a last name (e.g., JabRef), which means that the author in issue #7199 is treated as an institution.
This would not have been an issue were it not for the fact that the heuristic for creating a citation key identifies words starting with a lowercase letter in the author's name as an invalid part of an institute abbreviation. (Think of the "and" in "National Aeronautics and Space Administration".) Since the author's name only consists of one "word", it is removed. (This is why the workaround @tobiasdiez mentioned does not work, even if it should.)
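The effect of the lowercase-word heuristic can be sketched as follows. This is a simplified stand-in, assuming words are split on whitespace and kept only when their first character is uppercase; it is not the code in `BracketedPattern`, and the names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SignificantWordsDemo {
    /** Keeps only words whose first code point Java classifies as uppercase. */
    static List<String> significantWords(String institution) {
        return Arrays.stream(institution.trim().split("\\s+"))
                     .filter(word -> !word.isEmpty())
                     .filter(word -> Character.isUpperCase(word.codePointAt(0)))
                     .collect(Collectors.toList());
    }

    /** Builds an abbreviation from the first letter of each significant word. */
    static String initials(String institution) {
        return significantWords(institution).stream()
                                            .map(word -> word.substring(0, 1))
                                            .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(initials("National Aeronautics and Space Administration"));
        System.out.println(significantWords("eBay"));
        System.out.println(significantWords("杨"));
    }
}
```

Chinese characters have no case, so `Character.isUpperCase` returns false for them just as it does for the "e" in "eBay", and a single-word name of that kind is dropped entirely.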
Changes for #6942
^uni(v|b|$).* instead of ^uni. There are other suggestions in "Readability for citation key patterns" (#6706) that can be implemented instead.
{JabRef ({JB})} should produce JB)
shortauthor field, because I find it unclear how and when to use the field (e.g., for multiple authors).
#7199
Author must only have a last name and that last name must contain a space. This should only happen if the Author was originally enclosed in brackets. In all other cases I'd argue it is acceptable not to abbreviate the author, since the name is only one word.
Todo
Authors