Create new fetcher infrastructure #1594
Are these classes logic or GUI? What is the complete workflow? ImportInspector? Merging? (Adding empty fields may not be enough.)
They are logic and try to extract the non-GUI part from the current
This is generally fine with me. One remark: there are currently two different workflows for the "SearchBasedFetchers". Is the proposed "shallow" search method used in this case, or is this difference in workflow not covered yet?
I think the workflow is supported by the two methods […]. My concern is that the intermediate results might not be available as BibTeX entries, but only as HTML. I think the […] What about error messages? I assume the aim is to use […] @zellerdev will try to port GVKFetcher and the ISBNToBibTeX fetcher to the new infrastructure. He will open a PR on your repo; is that OK, @tobiasdiez?
For the FulltextFetcher, it would be awesome if full-text finding could be run as part of a cleanup operation, e.g. to find the full text for all entries at once.
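To make the cleanup idea concrete, here is a minimal sketch of such a "find full text for all entries" step. It assumes a hypothetical single-method `FulltextLookup` interface (not JabRef's actual FulltextFetcher API), a plain `Map<String, String>` standing in for BibEntry, a hypothetical `key` field for the citation key, and an assumed policy of skipping entries that already have a `file` field.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical single-method lookup, NOT JabRef's FulltextFetcher API.
// A Map<String, String> stands in for BibEntry.
interface FulltextLookup {
    Optional<URI> findFullText(Map<String, String> entry);
}

// Sketch of a cleanup step that tries to find full text for every entry.
final class FindFulltextCleanup {
    private final List<FulltextLookup> lookups;

    FindFulltextCleanup(List<FulltextLookup> lookups) {
        this.lookups = List.copyOf(lookups); // defensive, immutable copy
    }

    // Returns citation key -> first link found. Entries that already have a
    // "file" field are skipped (an assumed policy, not part of the proposal).
    Map<String, URI> run(List<Map<String, String>> entries) {
        Map<String, URI> found = new LinkedHashMap<>();
        for (Map<String, String> entry : entries) {
            if (entry.containsKey("file")) {
                continue;
            }
            for (FulltextLookup lookup : lookups) {
                Optional<URI> link = lookup.findFullText(entry);
                if (link.isPresent()) {
                    found.put(entry.get("key"), link.get());
                    break; // first successful lookup wins
                }
            }
        }
        return found;
    }
}
```

Trying several lookups in order and stopping at the first hit keeps the number of requests per entry low, which matters when the cleanup runs over a whole database.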
Thanks for the feedback. @koppor The result from the server is of course HTML (or XML, JSON, ...). It is the task of the fetcher to translate the response into something meaningful, i.e. to […] @Siedlerchr Good idea. However, it is a bit out of the scope of this PR. Could you file a new issue for it? Thanks! So the plan for this PR is now to use the new interfaces in parallel to the old code (so that I don't have to rewrite every fetcher).
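As an illustration of "the fetcher translates the server response into something meaningful", here is a minimal sketch that turns an Atom feed (the format arXiv's API answers with) into field maps. The class name `AtomResponseParser`, the use of `Map<String, String>` instead of BibEntry, and the mappings `id` to `url` and `published` to `date` are all assumptions for illustration, not the parser in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical response parser: translates an Atom feed into field maps.
// A Map<String, String> stands in for BibEntry.
final class AtomResponseParser {
    private AtomResponseParser() {
    }

    static List<Map<String, String>> parse(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }

        List<Map<String, String>> entries = new ArrayList<>();
        NodeList entryNodes = doc.getElementsByTagNameNS("*", "entry");
        for (int i = 0; i < entryNodes.getLength(); i++) {
            Element entry = (Element) entryNodes.item(i);
            Map<String, String> fields = new HashMap<>();
            copyText(entry, "title", "title", fields);
            copyText(entry, "id", "url", fields);         // assumed mapping
            copyText(entry, "published", "date", fields); // assumed mapping
            entries.add(fields);
        }
        return entries;
    }

    // Copies the trimmed text of the first descendant element named xmlTag, if any.
    private static void copyText(Element parent, String xmlTag, String field, Map<String, String> fields) {
        NodeList nodes = parent.getElementsByTagNameNS("*", xmlTag);
        if (nodes.getLength() > 0) {
            fields.put(field, nodes.item(0).getTextContent().trim());
        }
    }
}
```

Keeping the parsing in a GUI-free class like this is what allows it to live in logic and be unit-tested on canned responses.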
So the PR is ready for review. If I get the OK to merge, then I (or @zellerdev) will move
Many of the current fetchers only fetch a limited number of entries, typically based on user input. How is that going to be handled here?
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */

package net.sf.jabref.logic.fulltext;
gui?
Although I think the interface redesign is a good thing, I'd like to put a few things up for discussion.
The web fetcher hierarchy seems artificial to me. Why do we ultimately need one-method interfaces?
On another note: I don't think that
@oscargus Right now every parameter has to be passed to the fetcher via the constructor. Maybe it would make sense to include the number of results as a method parameter. Right now I don't have a feeling for how many fetchers actually support this (or whether a common fixed value of, say, 30 would also work). @stefan-kolb The idea of distinguishing between ID-based and text-based fetchers came from #383 and #550. Right now the interfaces have only one method each, but this might change in the future (for example, a method to check the validity of a given ID would make sense for the ID-based fetcher).
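A rough sketch of what the two kinds of fetcher could look like with the result limit passed per call and an ID validity check added. The interface names echo the discussion, but `isValidId`, `performSearchById`, `performSearch` and `maxResults` are illustrative signatures, not JabRef's actual API; `Map<String, String>` stands in for BibEntry, and the ID pattern is a simplified new-style arXiv ID (no versions, no old-style IDs).

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative signatures only, not JabRef's actual API.
// A Map<String, String> stands in for BibEntry.
interface IdBasedFetcher {
    // Cheap local syntax check, so obviously invalid IDs never reach the server.
    boolean isValidId(String id);

    Optional<Map<String, String>> performSearchById(String id);
}

interface SearchBasedFetcher {
    // The result limit is a method argument instead of a constructor argument.
    List<Map<String, String>> performSearch(String query, int maxResults);
}

// Toy provider implementing both interfaces; the backing map stands in for a web service.
final class InMemoryFetcher implements IdBasedFetcher, SearchBasedFetcher {
    // Simplified new-style arXiv ID (YYMM.NNNN or YYMM.NNNNN).
    private static final Pattern ARXIV_ID = Pattern.compile("\\d{4}\\.\\d{4,5}");

    private final Map<String, Map<String, String>> entriesById;

    InMemoryFetcher(Map<String, Map<String, String>> entriesById) {
        this.entriesById = Map.copyOf(entriesById);
    }

    @Override
    public boolean isValidId(String id) {
        return ARXIV_ID.matcher(id).matches();
    }

    @Override
    public Optional<Map<String, String>> performSearchById(String id) {
        if (!isValidId(id)) {
            return Optional.empty();
        }
        return Optional.ofNullable(entriesById.get(id));
    }

    @Override
    public List<Map<String, String>> performSearch(String query, int maxResults) {
        String needle = query.toLowerCase(Locale.ROOT);
        return entriesById.values().stream()
                .filter(entry -> entry.getOrDefault("title", "").toLowerCase(Locale.ROOT).contains(needle))
                .limit(maxResults)
                .collect(Collectors.toList());
    }
}
```

Since one class can implement both interfaces, keeping them separate does not stop a single provider (arXiv, for instance) from offering both ID lookup and free-text search; it only lets callers depend on the capability they actually need.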
I still don't get why these two types of fetcher methods cannot live in one interface.
The help files could be placed under logic, as they do not contain any GUI-related methods or anything similar.
I think a two-stage search is preferable in many situations. Fetching all results is not really an option, and sometimes one would really like to get more than 100 (which is a more reasonable number). Personally, I often use JabRef as a search engine to find interesting papers (or rather, I search for a specific paper and then find related ones, so it makes sense not to have too narrow a query; also, some publishers apply a logical OR to the search terms, so you get more hits the more terms you add). Just try searching Springer or DOAJ for some common term, such as "efficient" or "Smith".
I think this can be solved, though, by adding another call, so the flow becomes: query -> number of hits -> number to fetch -> list of entries.
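The query -> number of hits -> number to fetch -> list of entries flow could be sketched as follows. `TwoStageFetcher`, `countHits`, `TwoStageSearch` and `hardLimit` are hypothetical names; the `chooser` callback stands in for asking the user, and any concrete cap (such as the 100 mentioned above) is a caller's choice, not a provider limit.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical two-stage fetcher: a cheap hit count first, then the actual download.
interface TwoStageFetcher<E> {
    // Stage 1: ask the provider how many results the query has.
    int countHits(String query);

    // Stage 2: download the first numberToFetch results.
    List<E> fetch(String query, int numberToFetch);
}

final class TwoStageSearch {
    private TwoStageSearch() {
    }

    // chooser receives the hit count (e.g. shown to the user) and returns how many
    // entries to fetch; the answer is clamped to the hit count and to hardLimit.
    static <E> List<E> search(TwoStageFetcher<E> fetcher, String query,
                              IntUnaryOperator chooser, int hardLimit) {
        int hits = fetcher.countHits(query);
        if (hits <= 0) {
            return List.of();
        }
        int wanted = chooser.applyAsInt(hits);
        int numberToFetch = Math.max(0, Math.min(wanted, Math.min(hits, hardLimit)));
        return numberToFetch == 0 ? List.of() : fetcher.fetch(query, numberToFetch);
    }
}
```

The extra round trip is cheap compared to downloading hundreds of entries, and it gives the user a chance to refine a query like "Smith" before anything heavy is requested.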
Thanks, Oscar, for bringing up this aspect. I think the main reason for introducing this step was to reduce the number of requests sent to the different providers, as some of them are rather quick to block IPs that crawl too much information in a short time.
@simonharrer and @matthiasgeiger have persuaded me that I'm dumb; please merge as intended. Maybe the naming can be improved a little, but this is NP-hard.
OK, so how should blocking be avoided? I agree that @matthiasgeiger's approach is preferred, but there must be some support for fetching an arbitrary number of items.
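One possible compromise between avoiding blocks and fetching an arbitrary number of items is to download in fixed-size pages with a pause between requests. This is a minimal sketch under assumptions: `PageFetcher` and `ThrottledPager` are hypothetical names, and page size and delay are caller-chosen values, not limits documented by any particular provider.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical page-wise access to a provider: returns up to `size` results starting at `start`.
@FunctionalInterface
interface PageFetcher<E> {
    List<E> fetchPage(int start, int size);
}

final class ThrottledPager {
    private ThrottledPager() {
    }

    // Fetches up to `total` results in pages of `pageSize`, sleeping `delayMillis`
    // between consecutive requests so the provider sees a modest request rate.
    static <E> List<E> fetchAll(PageFetcher<E> fetcher, int total, int pageSize, long delayMillis) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<E> results = new ArrayList<>();
        for (int start = 0; start < total; start += pageSize) {
            if (start > 0) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                    break;                              // and return what has been fetched so far
                }
            }
            int size = Math.min(pageSize, total - start);
            List<E> page = fetcher.fetchPage(start, size);
            results.addAll(page);
            if (page.size() < size) {
                break; // the provider has no more results
            }
        }
        return results;
    }
}
```

Combined with a hit count from a first request, this lets the user ask for, say, 500 entries while each individual request stays small and spaced out.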
[WIP] Create new fetcher infrastructure (JabRef#1594)
* Introduce new Fetcher interfaces
* Refactor arXiv fulltext fetcher
* Add query based arXiv fetcher
* Reformat code
* Add a few tests for the arxiv parser
* Make new arXiv fetcher available
* Fix small problems related to files
* Fix tests
* Rename interface methods
* Add changelog entry
* Mark old EntryFetcher interface as deprecated
* Move fetcher to importer \ fetcher
* Move HelpFile from gui.help to logic.help
* Rename fetchers
* Rename FulltextFinder
* Optimize imports
* Fix failing test
* Ignore failing test
The aim of this PR is to refactor the fetcher interface, in particular to separate logic and GUI code and to implement #383 and #550.
I propose the following class hierarchy for fetchers: