Add logic for parsing references from last page of PDF #11156

Merged
merged 19 commits into from
Apr 8, 2024

Member

@koppor koppor commented Apr 7, 2024

A scientific paper has a "References" section. Especially when reviewing papers, it would be nice if all references listed there appeared as parsed entries within JabRef. This PR implements #10200 via offline parsing (no online services used!) and is a follow-up to #10437.

How to use:

Pre Condition

Steps

  1. Create an entry in JabRef
  2. Attach the PDF to JabRef
  3. Open the context menu
  4. Select "Extract references"
    image
  5. A dialog for importing is shown.
  6. Select "Select all entries" and then "Import entries"
    image

Status

  • Functionality implemented. The UI should make the distinction between "online" and "offline" more transparent; I am currently working on this.
  • Works for IEEE papers. This functionality will be used for 1,000+ papers in this field, so it is "OK for now". If other reviewers (e.g., of Springer papers) speak up, we can refine the parser.
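
As a rough illustration of what offline handling of an IEEE-style reference list involves, the sketch below splits the text of a references page into individual entries at the bracketed numbers ("[1]", "[2]", ...). It is not the code from this PR: the class name `ReferenceSplitter` and the regular expression are assumptions made for this example, and a bracketed number appearing inside a reference would confuse it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JabRef
final class ReferenceSplitter {
    // Assumption: every reference starts with a marker such as "[12]",
    // located at the start of the text or right after whitespace
    private static final Pattern MARKER = Pattern.compile("(?:^|\\s)\\[\\d+\\]\\s*");

    static List<String> split(String referencesText) {
        List<String> result = new ArrayList<>();
        Matcher matcher = MARKER.matcher(referencesText);
        int start = -1; // start of the current reference body, -1 before the first marker
        while (matcher.find()) {
            if (start >= 0) {
                result.add(normalize(referencesText.substring(start, matcher.start())));
            }
            start = matcher.end();
        }
        if (start >= 0) {
            result.add(normalize(referencesText.substring(start)));
        }
        return result;
    }

    // Collapse line breaks and repeated spaces left over from PDF text extraction
    private static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String page = "[1] A. Author, \"Title one,\" Journal, 2020.\n"
                + "[2] B. Author,\n\"Title two,\" in Proc. X, 2021.";
        split(page).forEach(System.out::println);
    }
}
```

Each returned string could then be handed to a per-reference parser; text without any marker yields an empty list.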

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

…FromPdfImporter)

- Support more date formats
- Increase log level for issues for date parsing
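
One common way to support several date formats is to try a list of formatters in order. The sketch below shows that idea only; the class name `LenientDateParser` and the three patterns are assumptions for illustration, not the formats this commit actually added.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of JabRef
final class LenientDateParser {
    // Assumed formats: "April 2019", "Apr 2019", "Apr. 2019"
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("MMMM uuuu", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("MMM uuuu", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("MMM. uuuu", Locale.ENGLISH));

    static Optional<YearMonth> parse(String text) {
        String trimmed = text.trim();
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(YearMonth.parse(trimmed, format));
            } catch (DateTimeParseException e) {
                // This format did not match; try the next one
            }
        }
        // No known format matched; a caller could log this case
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("May 2019"));
        System.out.println(parse("Apr. 2019"));
        System.out.println(parse("sometime"));
    }
}
```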
Member

@Siedlerchr Siedlerchr left a comment

see comments

for (BibEntry importedEntry : result.getDatabase().getEntries()) {
    count++;
    Optional<String> citationKey = importedEntry.getCitationKey();
    if (citationKey.isPresent()) {
Member

citationKey.map(cites::add).orElseGet(() ->

Member Author

Not sure if the new code is more readable: the result of "orElseGet" needs to be added to the list, too. It also uses the outer variable "count", which is not final, so I needed to wrap it in an anonymous object.

Member

Then it is better to keep the original code.
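
The trade-off discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the code from the PR: entries are reduced to their optional citation keys, and the fallback key `"ref" + count` is hypothetical because the original else branch is not shown in the review. Copying the counter into a final local is one alternative to wrapping it in an anonymous object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class, not part of JabRef
final class CitationKeyCollector {

    // Variant 1: plain if/else, in the style of the code under review
    static List<String> collectWithIf(List<Optional<String>> keys) {
        List<String> cites = new ArrayList<>();
        int count = 0;
        for (Optional<String> citationKey : keys) {
            count++;
            if (citationKey.isPresent()) {
                cites.add(citationKey.get());
            } else {
                // Hypothetical fallback: derive a key from the running number
                cites.add("ref" + count);
            }
        }
        return cites;
    }

    // Variant 2: Optional chain, in the style of the reviewer's suggestion
    static List<String> collectWithOptional(List<Optional<String>> keys) {
        List<String> cites = new ArrayList<>();
        int count = 0;
        for (Optional<String> citationKey : keys) {
            count++;
            // A lambda may only capture effectively final locals,
            // so the mutable counter is copied first
            final int number = count;
            cites.add(citationKey.orElseGet(() -> "ref" + number));
        }
        return cites;
    }

    public static void main(String[] args) {
        List<Optional<String>> keys =
                List.of(Optional.of("a"), Optional.<String>empty(), Optional.of("c"));
        System.out.println(collectWithIf(keys));
        System.out.println(collectWithOptional(keys));
    }
}
```

Both variants return the same list; the second needs the extra local copy, which is the readability cost raised above.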

// Y. Shimosaki et al., “Lattice design for 5 MeV – 125 mA CW RFQ operation in LIPAc”, in Proc. IPAC’19, Mel- bourne, Australia, May 2019, pp. 977-979. doi:10.18429/ JACoW-IPAC2019-MOPTS051
int pos = reference.indexOf("doi:");
if (pos >= 0) {
    String doi = reference.substring(pos + 4).trim();
Member

Are you sure that this is always 4 characters?

Member Author

I am pretty sure that the constant string "doi:" always has 4 characters. But in a parallel universe this might change. Thus, I will change it to "doi:".length() later.
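
A minimal sketch of that change, assuming the DOI is the last element of the reference string (as in the example comment of the snippet under review). The class name `DoiExtractor` and the whitespace removal are assumptions of this example, not code from the PR.

```java
import java.util.Optional;

// Hypothetical helper, not part of JabRef
final class DoiExtractor {
    private static final String DOI_PREFIX = "doi:";

    static Optional<String> extractDoi(String reference) {
        int pos = reference.indexOf(DOI_PREFIX);
        if (pos < 0) {
            return Optional.empty();
        }
        // Skip the prefix by its actual length instead of the magic number 4
        String doi = reference.substring(pos + DOI_PREFIX.length()).trim();
        // Assumption: whitespace inside the DOI stems from PDF line breaks, so drop it
        doi = doi.replaceAll("\\s+", "");
        return doi.isEmpty() ? Optional.empty() : Optional.of(doi);
    }

    public static void main(String[] args) {
        String reference = "Y. Shimosaki et al., in Proc. IPAC'19, pp. 977-979. "
                + "doi:10.18429/ JACoW-IPAC2019-MOPTS051";
        System.out.println(extractDoi(reference).orElse("(no DOI found)"));
    }
}
```

Keeping the prefix in one constant means `indexOf` and the offset can never drift apart.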

@Siedlerchr
Member

You should resolve the conflicts in CHANGELOG.md so that the tests can run.

@koppor koppor changed the title [WIP] Add logic for parsing references from last page of PDF Add logic for parsing references from last page of PDF Apr 7, 2024
@koppor koppor marked this pull request as ready for review April 7, 2024 11:31
@koppor koppor enabled auto-merge April 7, 2024 11:31
@koppor koppor mentioned this pull request Apr 7, 2024
This reverts commit 7adb334.
@koppor koppor added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Apr 7, 2024
Contributor

github-actions bot commented Apr 8, 2024

The build for this PR is no longer available. Please visit https://builds.jabref.org/main/ for the latest build.

@koppor koppor added this pull request to the merge queue Apr 8, 2024
Merged via the queue into main with commit a0080ba Apr 8, 2024
21 checks passed
@koppor koppor deleted the parse-from-pdf branch April 8, 2024 09:05
@koppor koppor mentioned this pull request Jul 21, 2024

2 participants