Create new fetcher infrastructure #1594

tobiasdiez · 2016-07-16T15:10:11Z

The aim of this PR is to refactor the fetcher interface, in particular separate logic and GUI code as well as implement #383 and #550.

I propose the following class hierarchy for fetchers:

- WebFetcher: searches web resources for matching BibEntries
  - SearchBasedFetcher: based on a free-text query (may return multiple results)
  - IdBasedFetcher: based on an ID (single item as result)
- IdFetcher: looks for an ID (arXiv, DOI, ...) based on an existing BibEntry
- FulltextFetcher: finds the fulltext of an existing BibEntry (i.e. a rebranding of FullTextFinder)
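
To make the proposed hierarchy concrete, here is a minimal sketch of how the interfaces could look in Java. It is not the code of this PR: the method names (`performSearch`, `performSearchById`, `findIdentifier`, `findFullText`), the `Entry` stand-in for `BibEntry`, and the toy `InMemoryFetcher` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class FetcherHierarchySketch {

    /** Hypothetical, heavily simplified stand-in for JabRef's BibEntry. */
    static final class Entry {
        private final Map<String, String> fields = new HashMap<>();

        Entry with(String field, String value) {
            fields.put(field, value);
            return this;
        }

        Optional<String> get(String field) {
            return Optional.ofNullable(fields.get(field));
        }
    }

    /** Common parent: searches a web resource for matching entries. */
    interface WebFetcher {
        String getName();
    }

    /** Free-text query, may return multiple results. */
    interface SearchBasedFetcher extends WebFetcher {
        List<Entry> performSearch(String query);
    }

    /** Lookup by ID, at most one result. */
    interface IdBasedFetcher extends WebFetcher {
        Optional<Entry> performSearchById(String identifier);
    }

    /** Looks for an ID (arXiv, DOI, ...) based on an existing entry. */
    interface IdFetcher {
        Optional<String> findIdentifier(Entry entry);
    }

    /** Finds the fulltext location of an existing entry. */
    interface FulltextFetcher {
        Optional<String> findFullText(Entry entry);
    }

    /** Toy fetcher backed by a map instead of a web service (illustration only). */
    static final class InMemoryFetcher implements SearchBasedFetcher, IdBasedFetcher {
        private final Map<String, Entry> entriesById = new HashMap<>();

        void add(String identifier, Entry entry) {
            entriesById.put(identifier, entry);
        }

        @Override
        public String getName() {
            return "in-memory";
        }

        @Override
        public List<Entry> performSearch(String query) {
            // Case-insensitive substring match on the title field.
            String needle = query.toLowerCase(Locale.ROOT);
            List<Entry> hits = new ArrayList<>();
            for (Entry entry : entriesById.values()) {
                boolean matches = entry.get("title")
                        .map(title -> title.toLowerCase(Locale.ROOT).contains(needle))
                        .orElse(false);
                if (matches) {
                    hits.add(entry);
                }
            }
            return hits;
        }

        @Override
        public Optional<Entry> performSearchById(String identifier) {
            return Optional.ofNullable(entriesById.get(identifier));
        }
    }

    public static void main(String[] args) {
        InMemoryFetcher fetcher = new InMemoryFetcher();
        fetcher.add("id-1", new Entry().with("title", "Efficient widgets"));
        System.out.println(fetcher.performSearch("efficient").size());
        System.out.println(fetcher.performSearchById("id-2").isPresent());
    }
}
```

A single class may implement several of these interfaces when one backend call serves all of them; splitting them keeps callers dependent only on the capability they actually need.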

Change in CHANGELOG.md described
Tests created for changes
Screenshots added (for bigger UI changes)

oscargus · 2016-07-17T18:46:12Z

Are these classes logic or GUI? What is the complete workflow? ImportInspector? Merging? (Adding empty fields may not be enough.)

tobiasdiez · 2016-07-17T19:22:30Z

They are logic and try to extract the non-GUI part from the current EntryFetcher. Thus the workflow is not changed (at least not with this PR).

matthiasgeiger · 2016-07-18T08:51:43Z

Is generally fine for me.

One remark: There are currently two different workflows for the "SearchBasedFetchers".
Normally one hits the "Search" button after inserting some search query and a list of found entries is presented, after selecting the desired entries they are imported.
However, at least for the GoogleScholar fetcher there is another dialog presenting more details to "preselect" the entries which should be shown in the list of found entries (I think the intention is to reduce the amount of queries).

Is the proposed "shallow" search method used in this case? Or is this difference in the workflow currently not really covered, yet?

koppor · 2016-07-18T16:52:08Z

I think the workflow is supported by the two methods performShallowSearch and performDeepSearch. I assume that the dialog is presented if performDeepSearch is present. I think that one needs an additional class having only performShallowSearch and a second one containing both methods.

My concern is that the intermediate results might not be available as BibTeX entries, but as HTML only. I think the ImportInspectionDialog was used for that, wasn't it?

What about error messages? I assume the aim is to use LOGGER only and to use the error console? I used to like the GUI dialogs showing errors, but OK for me to remove them to improve the code. (Until now, this was done by the OutputPrinter status, wasn't it?)

@zellerdev will try to port GVKFetcher and the ISBNToBibTeX fetcher to the new infrastructure. He will do a PR on your repo, is that OK, @tobiasdiez?

Siedlerchr · 2016-07-18T18:33:23Z

For the FulltextFetcher it would be awesome if the FullText finding could be run as part of a Cleanup operation, e.g. globally finding fulltext for all entries.
This should be kept in mind

tobiasdiez · 2016-07-18T20:33:34Z

Thanks for the feedback.
@matthiasgeiger, @koppor yes I thought about covering the two-stage search by the deep and shallow methods. Maybe this is not the best idea and one needs a different approach for the Google search. Thus for now I would concentrate on the simple fetchers which get all the metadata in one query.

@koppor The result from the server is of course HTML (or XML, JSON,...). It is the task of the fetcher to translate the response to something meaningful, i.e. to BibEntries. I expanded the arXiv fetcher to outline how this works (in fact the arXiv fetcher now understands free text queries and not just ids). I also included a way to notify the caller about errors...by, well, throwing exceptions (with meaningful error messages). Maybe the fetchers also need to post some events to let the GUI know about the progress.
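
As a rough illustration of "translate the response and report errors via exceptions", the sketch below parses a small XML response into field maps and wraps any parsing failure in a checked exception with a readable message. It is not the arXiv fetcher from this PR: the `FetcherException` class, the `parseEntries` method, the field names, and the simplified `<feed>/<entry>` layout are assumptions chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class ResponseParsingSketch {

    /** Hypothetical checked exception carrying a message the caller can show or log. */
    static final class FetcherException extends Exception {
        FetcherException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Turns an XML response into one field map per entry element (assumed layout). */
    static List<Map<String, String>> parseEntries(String xml) throws FetcherException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since the response comes from the network.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            NodeList nodes = document.getElementsByTagName("entry");
            List<Map<String, String>> entries = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                Map<String, String> fields = new LinkedHashMap<>();
                putIfPresent(element, "id", "eprint", fields);
                putIfPresent(element, "title", "title", fields);
                entries.add(fields);
            }
            return entries;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // The caller decides how to present this: dialog, status bar, or log.
            throw new FetcherException("The server response could not be parsed", e);
        }
    }

    /** Copies the text of the first child element named tag into the given field. */
    private static void putIfPresent(Element parent, String tag, String field,
                                     Map<String, String> target) {
        NodeList children = parent.getElementsByTagName(tag);
        if (children.getLength() > 0) {
            target.put(field, children.item(0).getTextContent().trim());
        }
    }

    public static void main(String[] args) throws FetcherException {
        String response = "<feed><entry><id>1</id><title>A title</title></entry></feed>";
        System.out.println(parseEntries(response));
    }
}
```

Because the exception is checked, the GUI code that calls the fetcher is forced to decide how errors are shown, which keeps that decision out of the logic package.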

@Siedlerchr Good idea. However a bit out of the scope of this PR. Could you file a new issue for it? Thanks!

So the plan for this PR is now to use the new interfaces parallel to the old code (so that I don't have to rewrite every fetcher).

tobiasdiez · 2016-07-19T19:35:53Z

So the PR is ready for review.
The new fetchers can now be included via a wrapper. In this way the fetchers can be migrated to the new interface step by step, without the need to refactor everything at once.
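
The wrapper idea can be sketched as a plain adapter. Everything below is an assumption for illustration: `LegacyEntryFetcher` only imitates the shape of an old GUI-coupled interface, entries are represented as strings, and the class names are hypothetical rather than taken from the PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class LegacyWrapperSketch {

    /** Hypothetical checked exception thrown by new-style fetchers. */
    static final class FetcherException extends Exception {
        FetcherException(String message) {
            super(message);
        }
    }

    /** New-style fetcher: pure logic, reports problems by throwing. */
    interface SearchBasedFetcher {
        List<String> performSearch(String query) throws FetcherException;
    }

    /** Stand-in for the old interface: pushes results and status messages to callbacks. */
    interface LegacyEntryFetcher {
        boolean processQuery(String query, Consumer<String> inspector, Consumer<String> status);
    }

    /** Adapter that lets existing callers use a new-style fetcher unchanged. */
    static final class SearchBasedFetcherWrapper implements LegacyEntryFetcher {
        private final SearchBasedFetcher delegate;

        SearchBasedFetcherWrapper(SearchBasedFetcher delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean processQuery(String query, Consumer<String> inspector,
                                    Consumer<String> status) {
            try {
                // Forward every fetched entry to the old-style result callback.
                delegate.performSearch(query).forEach(inspector);
                return true;
            } catch (FetcherException e) {
                // Translate the exception back into the old status-message channel.
                status.accept("Error while fetching: " + e.getMessage());
                return false;
            }
        }
    }

    public static void main(String[] args) {
        LegacyEntryFetcher fetcher =
                new SearchBasedFetcherWrapper(query -> Arrays.asList(query + "-1", query + "-2"));
        boolean ok = fetcher.processQuery("smith", System.out::println, System.err::println);
        System.out.println(ok);
    }
}
```

With an adapter like this, each old fetcher can be rewritten against the new interface independently while the existing callers keep compiling.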

If I get the ok for merge, then I (or @zellerdev) will move HelpFiles to logic and thereby fix the failing architecture test.

oscargus · 2016-07-20T15:00:44Z

Many of the current fetchers will only fetch a limited number of entries, typically based on user input. How is that going to be handled here?

oscargus · 2016-07-20T15:05:01Z

src/main/java/net/sf/jabref/logic/fulltext/SearchBasedEntryFetcher.java

+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+package net.sf.jabref.logic.fulltext;


stefan-kolb · 2016-07-20T20:52:19Z

While I think the interface redesign is a good thing, I'd like to throw in a few things for discussion.

The interface hierarchy is too complex in my opinion. Why not just use:
- WebFetcher: Has two methods for SearchBasedFetcher & IdBasedFetcher
- IdFetcher
- FulltextFetcher

The web fetcher hierarchy seems artificial to me. Why do we ultimately need one-method interfaces?

I really don't like the effect that we merge all fetchers into one class, e.g. your ArxiV.java class.
Some fetchers are complex enough on their own, so why create new 1000-line classes capable of searching, ID fetching, and fulltext retrieval? Let us just use three classes for the three types of fetching.

stefan-kolb · 2016-07-20T20:58:20Z

On another note: I don't think that HelpFile.java belongs in the logic package.

tobiasdiez · 2016-07-20T21:49:47Z

@oscargus Right now every parameter has to be passed to the fetcher via the constructor. Maybe it would make sense to include the number of results as a method parameter. Right now I don't have a feeling for how many fetchers actually support this (or whether a common fixed value of, say, 30 would also work).

@stefan-kolb The idea to distinguish between ID-based and text-based fetchers came from #383 and #550. Right now the interfaces only have one method, but this might change in the future (for example, a method to check the validity of a given ID would make sense for the ID-based fetcher).
If a fetcher is complex enough, then one can of course still decompose the functionality into different classes. For the arXiv fetcher it felt like all the functionality is essentially given by one method (callApi) and the other methods are just different wrappers around it.
Do you have a good idea how to handle the help files in fetchers (logic) if HelpFiles stays in gui?

stefan-kolb · 2016-07-20T22:04:44Z

I still don't get why these two types of fetcher methods cannot live in one interface.
Maybe this is more a problem of our implementations than of the architecture abstraction?
I haven't checked the usage of the HelpFiles yet, though I think this has to be managed by the GUI somehow.

Siedlerchr · 2016-07-21T08:10:25Z

The help files could live under logic, as they do not contain any GUI-related methods or anything else tied to the GUI.
They just serve as a kind of container class: they simply store the name of the help page Markdown file in an enum.
The assembling of the URL and the opening happen in the GUI, in HelpAction.
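
A small sketch of that split, under stated assumptions: the enum constants, the page names, the base URL `https://help.example.org/`, and the `toUri` helper are all invented for illustration and are not the actual HelpFile or HelpAction code.

```java
import java.net.URI;

final class HelpFileSketch {

    /** Logic side: a pure container that only stores the help page name. */
    enum HelpFile {
        FETCHER_ARXIV("arXiv"),          // hypothetical page name
        FETCHER_ISBN("ISBNtoBibTeX");    // hypothetical page name

        private final String pageName;

        HelpFile(String pageName) {
            this.pageName = pageName;
        }

        String getPageName() {
            return pageName;
        }
    }

    /** GUI side: assembles the URL; the base address is a placeholder. */
    static URI toUri(HelpFile file) {
        return URI.create("https://help.example.org/" + file.getPageName());
    }

    public static void main(String[] args) {
        System.out.println(toUri(HelpFile.FETCHER_ARXIV));
    }
}
```

The enum carries no GUI dependency, so a fetcher in the logic package can reference it, while URL assembly and opening a browser stay on the GUI side.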

oscargus · 2016-07-21T12:26:28Z

I think a two-stage search is preferred in many situations. Fetching all results is not really an option, and sometimes one would really like to go for more than 100 (which is a more reasonable number). Personally, I often use JabRef as a search engine to find interesting papers (or rather, I search for a specific paper and then find related ones, so it makes sense not to have too narrow a query; also, some publishers apply a logical OR to the search terms, so you get more hits the more terms you add). Just try searching Springer or DOAJ for some common term, "efficient" or "Smith" or something. I think this can be solved, though, by adding another call, so that the flow becomes: query -> number of hits -> number to fetch -> list of entries.

matthiasgeiger · 2016-07-21T12:44:27Z

Thanks Oscar for bringing up this aspect.

I think the main reason for introducing this step was to reduce the number of requests fired at the different providers, as some of them are rather quick to block IPs that crawl too much information in a short time.
Thus the number of matches to be shown depends not only on the intention of the user (find a single paper vs. search for all possible papers matching the search) but might also have technical reasons...
However, personally I don't like the workflow "Enter term" > "Start search" > "Determine number of hits and ask user how many hits should be displayed" > "List chosen number of hits".
I think "Enter term" > "Start search" > "List first X hits" + "Label: Showing first X of Y hits" + Button "get next X hits" would be better... but this would require some more methods in the interfaces to start fetching from/to a given offset...
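
An offset-based interface of the kind suggested here could look roughly like the sketch below. The names `PagedFetcher`, `Page`, and `fetchPage` are hypothetical, entries are plain strings, and the list-backed implementation ignores the query; it only demonstrates the paging arithmetic, not a real provider.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagedSearchSketch {

    /** One page of results plus what the GUI needs for "Showing first X of Y hits". */
    static final class Page {
        final List<String> entries;
        final int offset;
        final int totalHits;

        Page(List<String> entries, int offset, int totalHits) {
            this.entries = entries;
            this.offset = offset;
            this.totalHits = totalHits;
        }

        /** True if a "get next X hits" button should be enabled. */
        boolean hasNext() {
            return offset + entries.size() < totalHits;
        }

        String label() {
            return "Showing first " + (offset + entries.size()) + " of " + totalHits + " hits";
        }
    }

    /** Hypothetical interface: one request per page keeps traffic to the provider low. */
    interface PagedFetcher {
        Page fetchPage(String query, int offset, int pageSize);
    }

    /** Toy implementation over a fixed list; the query is ignored here. */
    static final class ListBackedFetcher implements PagedFetcher {
        private final List<String> all;

        ListBackedFetcher(List<String> all) {
            this.all = all;
        }

        @Override
        public Page fetchPage(String query, int offset, int pageSize) {
            // Clamp the requested window to the available results.
            int from = Math.min(Math.max(offset, 0), all.size());
            int count = Math.min(Math.max(pageSize, 0), all.size() - from);
            return new Page(new ArrayList<>(all.subList(from, from + count)), from, all.size());
        }
    }

    public static void main(String[] args) {
        PagedFetcher fetcher = new ListBackedFetcher(Arrays.asList("a", "b", "c", "d", "e"));
        Page page = fetcher.fetchPage("smith", 0, 2);
        while (true) {
            System.out.println(page.label() + " " + page.entries);
            if (!page.hasNext()) {
                break;
            }
            page = fetcher.fetchPage("smith", page.offset + page.entries.size(), 2);
        }
    }
}
```

The same `Page` object also covers the other proposed flow (query -> number of hits -> number to fetch), since a first call with a page size of zero returns only the total hit count.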

stefan-kolb · 2016-07-22T08:17:42Z

@simonharrer and @matthiasgeiger have persuaded me that I'm dumb, please merge as intended. Maybe the naming can be improved a little bit but this is NP-hard.

oscargus · 2016-07-22T16:53:46Z

OK, so how should blocking be avoided? I agree that @matthiasgeiger's approach is preferred, but there must be some support for fetching an arbitrary number of items.


tobiasdiez added the status: ready-for-review and type: code-quality labels Jul 16, 2016

tobiasdiez force-pushed the fetcherRefactoring branch from c73592c to 8724591 Compare July 18, 2016 20:21

oscargus reviewed Jul 20, 2016

tobiasdiez added 8 commits July 22, 2016 16:34

Introduce new Fetcher interfaces

05a2c28

Refactor arXiv fulltext fetcher

b23d70e

Add query based arXiv fetcher

e39adb3

Reformat code

47bee56

Add a few tests for the arxiv parser

f0233b6

Make new arXiv fetcher available

69e395c

Fix small problems related to files

6c16694

Fix tests

915b9e1

tobiasdiez added 3 commits July 22, 2016 16:35

Rename interface methods

c8f3d22

Add changelog entry

e189517

Mark old EntryFetcher interface as deprecated

8bf2c9b

tobiasdiez force-pushed the fetcherRefactoring branch from d414b06 to 8bf2c9b Compare July 22, 2016 14:36

tobiasdiez added 7 commits July 22, 2016 16:51

Move fetcher to importer \ fetcher

00e64f5

Move HelpFile from gui.help to logic.help

0bc2d2e

Rename fetchers

b065322

Rename FulltextFinder

2ea18a5

Optimize imports

2000c7c

Fix failing test

b4fc648

Ignore failing test

5879aa9

tobiasdiez merged commit c6aa7da into JabRef:master Jul 22, 2016

tobiasdiez deleted the fetcherRefactoring branch July 22, 2016 16:15

zesaro mentioned this pull request Aug 1, 2016

ISBN Fetcher using the new fetcher infrastructure #1654

Merged

3 tasks

zesaro mentioned this pull request Aug 1, 2016

GvkFetcher #1656

Merged

3 tasks

koppor changed the title ~~[WIP] Create new fetcher infrastructure~~ Create new fetcher infrastructure Aug 19, 2016

koppor removed the status: ready-for-review label Aug 19, 2016

zesaro mentioned this pull request Aug 29, 2016

DOI Fetcher using the new fetcher infrastructure #1885

Merged

2 tasks

zesaro mentioned this pull request Sep 5, 2016

ADS Fetcher using the new fetcher infrastructure #1923

Merged

2 tasks

koppor mentioned this pull request Apr 15, 2024

Add EndNote XML Exporter + Rehaul Importer #11157

Merged

6 tasks
