Fixes Jabref#7660 Unable to download some arXiv links if the "eprint" field is missing #7663

JavuesZhang · 2021-04-23T12:57:56Z

Brief summary

Run EprintCleanup on a copy of the entry that the ArXiv fetcher is processing, before reading the arXiv ID from the eprint field;
Add two test methods. One finds the full text for an entry with a colon in the title and a journal field; the other finds the full text for an entry with a colon in the title and a url field.

Problem

When finding full text, this BibTeX reference works:

@Article{booth_bayes-trex_2020,
  author        = {Serena Booth and Yilun Zhou and Ankit Shah and Julie Shah},
  journal       = {arXiv:2002.10248v4 [cs]},
  title         = {Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example},
  year          = {2020},
  month         = dec,
  archiveprefix = {arXiv},
  eprint        = {2002.10248},
  url           = {http://arxiv.org/abs/2002.10248v4},
}

But when the eprint field is missing, no full text is found:

@Article{booth_bayes-trex_2020,
  author        = {Serena Booth and Yilun Zhou and Ankit Shah and Julie Shah},
  journal       = {arXiv:2002.10248v4 [cs]},
  title         = {Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example},
  year          = {2020},
  month         = dec,
  archiveprefix = {arXiv},
  url           = {http://arxiv.org/abs/2002.10248v4},
}

Solution

Because the title contains a colon, and arXiv uses a colon to separate a key from its value, the title may be misinterpreted. To avoid this, the eprint field is derived from other fields of the entry. Thanks to @tobiasdiez for the advice.
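To illustrate the idea of taking the arXiv ID from fields other than the title, here is a minimal sketch. It is not JabRef's EprintCleanup: the class name, the method findArXivId, the regular expression, and the use of a plain Map in place of BibEntry are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT JabRef's EprintCleanup: it only illustrates
// recovering an arXiv identifier from fields other than the title.
public class EprintFromOtherFields {

    // Matches new-style arXiv identifiers such as 2002.10248 or 2002.10248v4.
    private static final Pattern ARXIV_ID =
            Pattern.compile("(\\d{4}\\.\\d{4,5})(v\\d+)?");

    // Fields checked in order; the title is deliberately never consulted.
    private static final String[] CANDIDATE_FIELDS = {"eprint", "url", "journal"};

    static Optional<String> findArXivId(Map<String, String> fields) {
        for (String name : CANDIDATE_FIELDS) {
            String value = fields.get(name);
            if (value == null) {
                continue;
            }
            Matcher matcher = ARXIV_ID.matcher(value);
            if (matcher.find()) {
                // Return the identifier without the version suffix.
                return Optional.of(matcher.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The second entry from the Problem section: no eprint field.
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("title", "Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example");
        entry.put("journal", "arXiv:2002.10248v4 [cs]");
        entry.put("url", "http://arxiv.org/abs/2002.10248v4");

        System.out.println(findArXivId(entry).orElse("not found")); // prints 2002.10248
    }
}
```

Because the title is never inspected, a colon in it cannot affect which identifier is used.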

Screenshots

After fix:

Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

tobiasdiez

Wow that was quick 🚀 . Code looks good to me! Thanks for your contribution.

Siedlerchr · 2021-04-23T13:17:13Z

src/main/java/org/jabref/logic/importer/fetcher/ArXiv.java

@@ -116,6 +118,9 @@ public TrustLevel getTrustLevel() {
    }

    private List<ArXivEntry> searchForEntries(BibEntry entry) throws FetcherException {
+        entry = (BibEntry) entry.clone();


Why do you clone the entry?

If we ran the cleanup on the original entry, its fields would be changed. I don't think we should change any fields here, because this is just a search.
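The reasoning can be sketched with a small self-contained example. Everything here is hypothetical: a plain Map stands in for BibEntry, and fillEprintFromUrl stands in for a cleanup step that mutates the entry it is given; neither is JabRef code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the situation in the diff; JabRef's BibEntry
// and EprintCleanup are NOT used here.
public class CopyBeforeCleanup {

    private static final String ABS_PREFIX = "http://arxiv.org/abs/";

    // Stand-in for a cleanup step: it mutates the map it is given.
    static void fillEprintFromUrl(Map<String, String> fields) {
        String url = fields.get("url");
        if (!fields.containsKey("eprint") && url != null && url.startsWith(ABS_PREFIX)) {
            fields.put("eprint", url.substring(ABS_PREFIX.length()));
        }
    }

    // Search helper: cleans a copy, so the caller's entry keeps its fields.
    static String eprintForSearch(Map<String, String> original) {
        Map<String, String> copy = new HashMap<>(original);
        fillEprintFromUrl(copy);
        return copy.get("eprint");
    }

    public static void main(String[] args) {
        Map<String, String> entry = new HashMap<>();
        entry.put("url", "http://arxiv.org/abs/2002.10248v4");

        System.out.println(eprintForSearch(entry));        // prints 2002.10248v4
        System.out.println(entry.containsKey("eprint"));   // prints false
    }
}
```

The search gets the identifier it needs, while the user's entry is left exactly as it was.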

JavuesZhang · 2021-04-23T13:18:23Z

Wow that was quick 🚀 . Code looks good to me! Thanks for your contribution.

It is your advice that made me efficient! At first I tried to ignore colons in the title to solve this problem, but that did not seem like a good approach.

Siedlerchr

Thanks for the quick fix!

…om.tngtech.archunit-archunit-junit5-api-0.18.0
* upstream/main:
  Fix exception when searching (#7659)
  Fixes Jabref#7660 (#7663)
  Fix for issue 5850: Journal abbreviations in UTF-8 not recognized (#7639)
  Fix SSLHandshake Exception by using bypass (#7657)
  Fix for issue 7633: Unable to download arXiv pdfs if Title contains curly brackets (#7652)
  Fix#7195 partly Opacity of disabled icon-buttons

tobiasdiez approved these changes Apr 23, 2021

tobiasdiez added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Apr 23, 2021

Siedlerchr reviewed Apr 23, 2021

Siedlerchr approved these changes Apr 23, 2021

Siedlerchr merged commit f815050 into JabRef:main Apr 23, 2021
