-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fulltext fetcher for IACR eprints #9651
Conversation
The failing fetcher tests seem unrelated to my changes. Will take a closer look later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codewise lgtm! Thanks for the addition.
if (urlField.isPresent()) { | ||
String descriptiveHtml = getHtml(urlField.get()); | ||
String startOfFulltextLink = "<a class=\"btn btn-sm btn-outline-dark\""; | ||
String fulltextLinkAsInHtml = getRequiredValueBetween(startOfFulltextLink, ".pdf", descriptiveHtml); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, one could use JSOUP here https://jsoup.org/ (see e.g. SemanticScholar) to operate directly on the HTML dom elements, but as this getValueBeteween is already used here it's fine
@SECtim you can ignore the unrelated fetcher tests. Sometimes they fail just on GitHub if they are executed to often or Github IPs are blocked |
Just did a small simplification of the tests, it's easier to directly compare the Optionals |
…rg.mariadb.jdbc-mariadb-java-client-3.1.0 * upstream/main: (357 commits) Fix syntax Add experimental Fetcher for Bibliotheksverbund Bayern with MarcXML parser (#9641) Update guidelines-for-setting-up-a-local-workspace.md Update guidelines-for-setting-up-a-local-workspace.md Bump org.tinylog:slf4j-tinylog from 2.6.0 to 2.6.1 (#9665) Bump apple-actions/import-codesign-certs from 1 to 2 (#9662) Bump com.puppycrawl.tools:checkstyle from 10.8.0 to 10.8.1 (#9661) Bump gittools/actions from 0.9.15 to 0.10.2 (#9663) Bump hmarr/auto-approve-action from 3.1.0 to 3.2.0 (#9664) Bump io.github.classgraph:classgraph from 4.8.156 to 4.8.157 (#9666) Bump org.tinylog:tinylog-api from 2.6.0 to 2.6.1 (#9667) Add option to open arks in the browser from an ark identifier (#9601) remove "jdk 19 does not work" (#9658) Fulltext fetcher for IACR eprints (#9651) Observable Preferences S (#9619) Issue 9646: Right-click context menu "Attach file from URL" (#9648) Improve the INSPIREFetcher in "Update with bibliographic information from the web" (#9645) Bump appleboy/ssh-action from 0.1.7 to 0.1.8 (#9653) Bump com.fasterxml.jackson.datatype:jackson-datatype-jsr310 (#9656) Bump com.puppycrawl.tools:checkstyle from 10.7.0 to 10.8.0 (#9655) ...
Adds fulltext fetching for IACR eprint PDFs.
My Java skills are a bit rusty, and while the code works, I'm happy to make additional changes if there are more elegant ways to do things.
I am unsure about whether to add anything to the developer docs on fetchers, as they seem to only cover a subset of all fetchers anyway (so far, the IACR fetcher isn't mentioned at all).
CHANGELOG.md
described in a way that is understandable for the average user (if applicable)