Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create new fetcher infrastructure #1594

Merged
merged 18 commits into from
Jul 22, 2016

Conversation

tobiasdiez
Copy link
Member

The aim of this PR is to refactor the fetcher interface, in particular separate logic and GUI code as well as implement #383 and #550.

I propose the following class hierarchy for fetchers:

  • WebFetcher: search web resources for matching BibEntries
    • SearchBasedFetcher: based on a free-text query (may return multiple results)
    • IdBasedFetcher: based on an ID (single item as result)
  • IdFetcher: looks for an ID (arXiv, DOI,...) based on an existing BibEntry
  • FulltextFetcher: find fulltext of an existing BibEntry (i.e. rebranding of FullTextFinder)

  • Change in CHANGELOG.md described
  • Tests created for changes
  • Screenshots added (for bigger UI changes)

@tobiasdiez tobiasdiez added status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers type: code-quality Issues related to code or architecture decisions labels Jul 16, 2016
@oscargus
Copy link
Contributor

Are these classes logic or gui? What is the complete work flow? ImportInspector? Merging? (Adding empty fields may not be enough.)

@tobiasdiez
Copy link
Member Author

They are logic and try to extract the non-gui part from the current EntryFetcher. Thus the workflow is not changed (at least not with this PR).

@matthiasgeiger
Copy link
Member

Is generally fine for me.

One remark: There are currently two different workflows for the "SearchBasedFetchers".
Normally one hits the "Search" button after inserting some search query and a list of found entries is presented, after selecting the desired entries they are imported.
However, at least for the GoogleScholar fetcher there is another dialog presenting more details to "preselect" the entries which should be shown in the list of found entries (I think the intention is to reduce the amount of queries).

Is the proposed "shallow" search method used in this case? Or is this difference in the workflow currently not really covered, yet?

@koppor
Copy link
Member

koppor commented Jul 18, 2016

I think the workflow is supported by the two methods performShallowSearch and performDeepSearch. I assume that the dialog is presented if performDeepSearch is present. I think that one needs an additional class having only performShallowSearch and a second one containing both methods.

My concern is that the intermediate results might not be available as bibtex entries, but as HTML only. I think, the ImportInspectionDialog was used for that, wasn't it.

What about error messages? I assume the aim is to use LOGGER only and to use the error console? I used to like the GUI dialogs showing errors, but OK for me to remove them to improve the code. (Until now, this was done by the OutputPrinter status, wasn't

@zellerdev will try to port GVKFetcher and the ISBNToBibTeX fetcher to the new infrastructure. He will do a PR on your repo, is that OK, @tobiasdiez?

@Siedlerchr
Copy link
Member

For the FulltextFetcher it would be awesome if the FullText finding could be run as part of a Cleanup operation, e.g. globally finding fulltext for all entries.
This should be kept in mind

@tobiasdiez
Copy link
Member Author

tobiasdiez commented Jul 18, 2016

Thanks for the feedback.
@matthiasgeiger, @koppor yes I thought about covering the two-stage search by the deep and shallow methods. Maybe this is not the best idea and one needs a different approach for the Google search. Thus for now I would concentrate on the simple fetchers which get all the metadata in one query.

@koppor The result from the server is of course HTML (or XML, JSON,...). It is the task of the fetcher to translate the response to something meaningful, i.e. to BibEntries. I expanded the arXiv fetcher to outline how this works (in fact the arXiv fetcher now understands free text queries and not just ids). I also included a way to notify the caller about errors...by, well, throwing exceptions (with meaningful error messages). Maybe the fetchers also need to post some events to let the GUI know about the progress.

@Siedlerchr Good idea. However a bit out of the scope of this PR. Could you file a new issue for it? Thanks!

So the plan for this PR is now to use the new interfaces parallel to the old code (so that I don't have to rewrite every fetcher).

@tobiasdiez
Copy link
Member Author

So PR is ready for review.
The new fetcher can now be included via a wrapper. In this way the fetcher can be transformed to the new interface step-by-step without the need to refactor everything at once.

If I get the ok for merge, then I (or @zellerdev) will move HelpFiles to logic and thereby fix the failing architecture test.

@oscargus
Copy link
Contributor

Many of the current fetchers will only fetch a limited number of entries. Typically based on user input. How is that going to be handled here?

* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/

package net.sf.jabref.logic.fulltext;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gui?

@stefan-kolb
Copy link
Member

stefan-kolb commented Jul 20, 2016

Whereas I think the interface redesign is a good thing, I'd like to throw a few things for discussion.

  1. The interface hierarchy is too complex in my opinion why not just use:
    • WebFetcher: Has two methods for SearchBasedFetcher & IdBasedFetcher
    • IdFetcher
    • FulltextFetcher

The web fetcher hierarchy seems artificial to me. Why do we ultimately need one method interfaces?

  1. I really don't like the effect that we merge all fetchers into one class, e.g. your ArxiV.java class.
    Some fetcher are complex enough on their own so why create new 1000 lines classes being capable of searching, id fetching and full text retrieval. Let us just use 3 classes for the three types of fetching.

@stefan-kolb
Copy link
Member

On another note: I don't think that HelpFile.java belongs into the logic package.

@tobiasdiez
Copy link
Member Author

@oscargus Right now every parameter has to be passed to the fetcher via the constructor. Maybe it would make sense to include the number of results as a method parameter. Right now I don't have a feeling on how many fetcher actually support this (or if a common fixed value of say 30 would also work).

@stefan-kolb The idea to distinct between ID-based and text-based fetchers came from #383 and #550. Right now the interfaces only have one method but this might change in the future (for example, a method to check validity of a given ID would make sense for the ID-based fetcher).
If a fetcher is complex enough, then one of course still decompose the functionality in different classes. For the arXiv fetcher it felt like all the functionality is essentially given by one method (callApi) and the other methods are just different wrappers around it.
Do you have a good idea how to handle for the help files in fetchers (logic) if HelpFiles stays in gui?

@stefan-kolb
Copy link
Member

stefan-kolb commented Jul 20, 2016

I still don't get why these two types of fetcher methods cannot live in one interface.
Maybe this is more of a problem of our implementations than of the architecture abstraction?
I haven't checked the usage of the HelpFiles yet, tho I think this has to be managed by the GUI somehow.

@Siedlerchr
Copy link
Member

The helpfiles could be under the logic,, as they do not contain any gui related methods or any other related thing.
They just serve as a kind of Container class, they simply store the name of the HelpPage-Markdown file in an Enum.
The assembling of the url and opening is in the gui in HelpAction.

@oscargus
Copy link
Contributor

oscargus commented Jul 21, 2016 via email

@matthiasgeiger
Copy link
Member

Thanks Oscar for bringing up this aspect.

I think the main reason for introducing this step was to reduce the number of requests fired to the different providers as some of them are rather quick with blocking IPs crawling to much information in a short time.
Thus the number of matches to be shown is not only dependent on the intention of the user (find a single paper vs. search for all possible papers matching the search) but also might have technical reasons...
However, personally I don't like the workflow "Enter term" > "Start search" > "Determine number of hits and ask user how much hits should be displayed" > "List chosen number of hits"
I think "Enter term" > "Start search" >"List first X hits" + "Label: Showing first X of Y hits" + Button "get next X hits" would be better... but this would require some more methods in the Interfaces to start fetching from/to a given offset...

@stefan-kolb
Copy link
Member

@simonharrer and @matthiasgeiger have persuaded me that I'm dumb, please merge as intended. Maybe the naming can be improved a little bit but this is NP-hard.

@tobiasdiez tobiasdiez merged commit c6aa7da into JabRef:master Jul 22, 2016
@tobiasdiez tobiasdiez deleted the fetcherRefactoring branch July 22, 2016 16:15
@oscargus
Copy link
Contributor

OK, so how should blocking be avoided? I agree that @matthiasgeiger's approach is preferred, but there must be some support for fetching an arbitrary number of items.

Siedlerchr added a commit to Siedlerchr/jabref that referenced this pull request Jul 22, 2016
* master:
  Added LayoutFormatterPreferences (and related files) (JabRef#1608)
  [WIP] Create new fetcher infrastructure (JabRef#1594)
  Set user agent to fix 403 status error
  More fields added to FieldName (JabRef#1614)
Siedlerchr added a commit to Siedlerchr/jabref that referenced this pull request Jul 24, 2016
* master: (268 commits)
  Added DateFormatter to LayoutEntry so that it actually works... (JabRef#1619)
  Implemented JabRef#1345 - cleanup ISSN (JabRef#1590)
  Always use https for help files (JabRef#1615)
  Resolves JabRef#1613 Use Jabref default icon for uninstaller for now (JabRef#1616)
  Added more fields and fixed some issues (JabRef#1617)
  Added LayoutFormatterPreferences (and related files) (JabRef#1608)
  [WIP] Create new fetcher infrastructure (JabRef#1594)
  Set user agent to fix 403 status error
  More fields added to FieldName (JabRef#1614)
  Added model.entry.FieldName that contains field name constants (JabRef#1602)
  Fixes imports
  Test CustomImporter (JabRef#1501)
  The field list gets the focus as soon as it is focused (JabRef#1541)
  When clicking on a tab, the first field now has the focus (JabRef#988)
  Add test in BibEntryWriterTest for type change
  Rewrite MedlineImporter with JAXB and add nbib fields (JabRef#1479)
  Some Globals.prefs injection in logic and model (JabRef#1593)
  Added filter to not show selected integrity checks (JabRef#1588)
  Replace getField with getFieldOptional in all of the tests and in som… (JabRef#1591)
  Move preferences (JabRef#1604)
  ...

# Conflicts:
#	src/main/java/net/sf/jabref/importer/ImportMenuItem.java
#	src/main/java/net/sf/jabref/importer/OpenDatabaseAction.java
ayanai1 pushed a commit to ayanai1/jabref that referenced this pull request Aug 1, 2016
Move event (JabRef#1601)

* Move event package to model

Update dependencies: postgres 9.4.1208 -> 9.4.1209 and wiremock from 2.1.6 to 2.1.7

Added ISBN integrity checker (JabRef#1586)

 Added ISBN integrity checker

* Extracted ISBN class

Reenable errorprone (see http://errorprone.info/)

Extend the OpenConsoleFeature (JabRef#1582)

* Extend the OpenConsoleFeature by selection of custom terminal emulator.

- Add radio selection to the AdvancedTab
- Add new JabRefPreferences
- Add file check and execution commands
- Add localization keys

* Fix localization key.

* Move console selection to ExternalTab.java

* Change localization entry.

* Add command executor.

* Fix placeholder replacement.

* Fix codacy.

* Update localization key.

* Remove "Specify terminal emulator" option. Add GUI outputs.

* Set default command for Windows. Fix localization entries.

* Remove empty lines in language files.

* Use lambda expressions insead of ActionListeners

* Refactoring.

* Update CHANGELOG.md.

* Small refactorings.

Move preferences (JabRef#1604)

* Move preferences-related classes into separate package

* Rename JabRefPreferencesFilterDialog -> PreferencesFilterDialog and move it to gui

* Fix checkstyle warning

Set user agent to fix 403 status error

Replace getField with getFieldOptional in all of the tests and in som… (JabRef#1591)

* Replace getField with getFieldOptional in all of the tests and in some more code

* Some more conversions

Added filter to not show selected integrity checks (JabRef#1588)

* Added filter to not show selected integrity checks

* Removed unused variable

Some Globals.prefs injection in logic and model (JabRef#1593)

* Some Globals.prefs injection in logic and model

* Some more conversions and some fixes

* More injections

* Even more injections

* Yes, even more injections

* Indeed, even more injections

* Probably the last injections for now

* Removed unrequired dependency and fixed issue

* Dropped support for selecting sub/super to equations

* Added preference classes for LatexFieldFormatter and FieldContentParser

* Removed some left over code

* Added JournalAbbreviationPreferences

* Encapsulated LatexFieldFormatterPreferences in SavePreferences

* Changed getShortDescription to accept boolean argument

* Removed Globals.prefs from tests and removed unused imports

* Unused import

* Unused import

Rewrite MedlineImporter with JAXB and add nbib fields (JabRef#1479)

Add test in BibEntryWriterTest for type change

When clicking on a tab, the first field now has the focus (JabRef#988)

* the first Field does now have focus when clicking on a tab in the entry editor
* Make first field focused when selecting a tab in entry editor

The field list gets the focus as soon as it is focused (JabRef#1541)

Test CustomImporter (JabRef#1501)

* Test CustomImporter

Fixes imports

Added model.entry.FieldName that contains field name constants (JabRef#1602)

* Added model.entry.FieldName that contains field name constants

* More constants

* Renamed and added more constants

* Some more fields and cleanups

* Removed MedlineHandler left from merge conflicts

More fields added to FieldName (JabRef#1614)

* More fields added to FieldName

* Some Medline fixes

[WIP] Create new fetcher infrastructure (JabRef#1594)

* Introduce new Fetcher interfaces

* Refactor arXiv fulltext fetcher

* Add query based arXiv fetcher

* Reformat code

* Add a few tests for the arxiv parser

* Make new arXiv fetcher available

* Fix small problems related to files

* Fix tests

* Rename interface methods

* Add changelog entry

* Mark old EntryFetcher interface as deprecated

* Move fetcher to importer \ fetcher

* Move HelpFile from gui.help to logic.help

* Rename fetchers

* Rename FulltextFinder

* Optimize imports

* Fix failing test

* Ignore failing test

Added LayoutFormatterPreferences (and related files) (JabRef#1608)

* Added LayoutFormatterPreferences (and related files)

* Rebased

* Included JournalAbbreviationLoader in LayoutPreferences

Added more fields and fixed some issues (JabRef#1617)

Resolves JabRef#1613 Use Jabref default icon for uninstaller for now (JabRef#1616)

Always use https for help files (JabRef#1615)

Implemented JabRef#1345 - cleanup ISSN (JabRef#1590)

* Implemented JabRef#1345 - cleanup ISSN

* Fixed comments

* Extracted ISSN class

* Added tests for ISSN and ISBN

Added DateFormatter to LayoutEntry so that it actually works... (JabRef#1619)

Converted a few getField to getFieldOptional (JabRef#1625)

* Converted a few getField to getFieldOptional

Fixed JabRef#636 by using DOICheck and DOIStrip in export filters (JabRef#1624)

Improved LaTeX to Unicode/HTML formatters to output more sensible values for unknown commands (JabRef#1622)

Updated preview entries (JabRef#1606)

* Updated preview entries, which return new entry

Moved, removed, and used String constants (JabRef#1618)

* Moved, removed, and used String constants

* Some more fixes

* Moved NEWLINE, made FILE_SEPARATOR public and used it

* Moved NEWLINE and FILE_SEPARATOR to OS

* Moved ENCODING_PREFIX and SIGNATURE

* Corrected Globals in a few comments...

* Apparently the localization tests find commented out text...

More field names and a method (JabRef#1627)

* Introduced FieldName in ArXiV

* Some more field names

* More field names

Cleanup FindFile and asssociated tests (JabRef#1596)

* Cleanup FindFile and rework it using Streams and nio methods-
* Unignore test for trying on CI
* Use explicit List and Set in findFiles and caller methods
* Use Lazy Stream to find files

changes should be tested manually

Some enhancements and cleanups related to dates (JabRef#1575)

* Some enhancements and cleanups related to dates

* Fixed some time zone issues

* Replaced SimpleDateFormat in ZipFileChooser and replaced arrays with Lists

* Changed EasyDateFormat constructors

* Fixed stupid mistake

* Added CHANGELOG entry

* Maybe tests are passing now?

* Some server side print debugging...

* As it should be

* Tryng LocalDateTime

* No time zone

* Added test for Cookie

* Fixed imports...

* Added a third possible date format as it turns out that the server changed while developing this PR

Builds are now stored via build-upload.jabref.org

Consistent file name casing (and other localization improvements) (JabRef#1629)

* AUX files

* ZIP files

* BIB files

* JAR files

* didn't

* Couldn't what's

* Consistent casing

* AUX apparently is commonly used in French words...

* Fixed the flawed quick-and-dirty find-and-replace failures

define xjc input/ouput dir (subsequent builds will be faster) (JabRef#1628)

Execute task only when input/output dir changed.

Fixed a minor issue and refactored MergeEntries (JabRef#1634)

* Fixed a minor issue and refactored MergeEntries

* Fixed import

* Added CHANGELOG entry

Added LabelPatternPreferences (JabRef#1607)

* Added LabelPatternPreferences

* Removed static initializer

More tests (JabRef#1635)

* Added more tests for Cookie

* Enabled some layout tests and added test for StringUtil.intValueOfWithNull

* Updated a test

* Split tests

Updated Errorprone to 2.0.11 (JabRef#1636)

* Updated Errorprone to 2.0.11

* Corrected test

Keep @comment text in a bib file (JabRef#1638)

* Kep @comment text in a bib file

* Add test for @comment that contains regular entries

Replaced some getField and fixed some bugs (JabRef#1631)

* Replaced some getField and fixed some bugs

* Fixed a few things

* Added CHANGELOG entries

* Improved equals implementation

* Text book equals and hashCode

Fixed JabRef#1639 (JabRef#1641)

* Fixed JabRef#1639

* Removed old code

Export OO/LO citations to new database (JabRef#1630)

* Export OO/LO citations to new database

* Fixed problem with duplicates

* Added some comments

* Fixed spelling in comment

* Removed general Exception

Unified some equals (JabRef#1640)

* Unified some equals

* Imported correct Objects...

Fixed one more NPE which should have been fixed in JabRef#1631 (JabRef#1649)

Finished method to hide visible fields and show hidden fields

- Hide method done
- Show method done
- ToDo repaint hidden field
- ToDo test class

finished field repaint

remove sysouts
@zesaro zesaro mentioned this pull request Aug 1, 2016
3 tasks
@koppor koppor changed the title [WIP] Create new fetcher infrastructure Create new fetcher infrastructure Aug 19, 2016
@koppor koppor removed the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Aug 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: code-quality Issues related to code or architecture decisions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants