Enable automated cross library search using a cross library query lan… #7124


DominikVoigt (Contributor)

This PR adds capabilities for performing certain parts of literature studies.
Fixes koppor#369

It adds the capability to:
- Create studies and share the results with others using git.
- Define certain aspects of the study, such as the search terms and the E-Libraries to use.
- Automatically crawl the specified E-Libraries, import their results into JabRef, and merge the results of all E-Libraries into a single result BibDatabase.
- Create diffs that inform the user about publications newly found since the last crawl.
- Systematically filter the crawl results.
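To make the merge and diff steps concrete, here is a minimal Java sketch using only the standard library. It is not JabRef's implementation: the `Entry` record, the class name `CrawlMergeSketch`, and the choice of a case-insensitive DOI as the identity of a publication are assumptions for illustration (JabRef itself works with `BibEntry` and `BibDatabase` objects). It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public final class CrawlMergeSketch {

    // Hypothetical, minimal stand-in for a bibliographic entry; assumes every entry has a non-null DOI.
    public record Entry(String doi, String title, String library) { }

    private static String key(Entry entry) {
        return entry.doi().toLowerCase(Locale.ROOT);
    }

    // Merges the per-library result lists into one list, keeping the first entry seen for each DOI.
    public static List<Entry> merge(List<List<Entry>> resultsPerLibrary) {
        Map<String, Entry> byDoi = new LinkedHashMap<>(); // preserves first-seen order
        for (List<Entry> results : resultsPerLibrary) {
            for (Entry entry : results) {
                byDoi.putIfAbsent(key(entry), entry);
            }
        }
        return new ArrayList<>(byDoi.values());
    }

    // Returns the entries of the current crawl whose DOI did not occur in the previous crawl.
    public static List<Entry> newSinceLastCrawl(List<Entry> previous, List<Entry> current) {
        Set<String> known = previous.stream()
                .map(CrawlMergeSketch::key)
                .collect(Collectors.toSet());
        return current.stream()
                .filter(entry -> !known.contains(key(entry)))
                .collect(Collectors.toList());
    }
}
```

Keying on the DOI is a simplification: entries without a DOI would need a different key (for example, a normalized title), which this sketch does not handle.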

  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not, created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

…guage.

Signed-off-by: Dominik Voigt <dominik.ingo.voigt@gmail.com>
@Siedlerchr (Member) left a comment

I cannot say anything about the functionality, but I have some suggestions/improvements for the code.

Review comments on src/main/java/org/jabref/logic/crawler/Crawler.java (outdated, resolved)
Review comments on src/test/java/org/jabref/model/study/StudyTest.java (outdated, resolved)
@Siedlerchr added the "status: changes required" label (Pull requests that are not yet complete) on Nov 24, 2020
@koppor (Member) left a comment

Apart from the comments by @Siedlerchr, LGTM.

…test classes
@Siedlerchr merged commit b19c3e4 into JabRef:master on Nov 25, 2020
@tobiasdiez (Member)

Two general questions:

  • Why did you decide to implement the study format as a bib file? That is highly counter-intuitive (from a user as well as a developer standpoint). YAML or JSON would have been a more logical choice.
  • Why do you require the study file to be in a git repository? Doesn't this reduce the possible use cases? Moreover, what happens if the user wants to see the new papers without pushing the repo? I would leave git management to the user.

Moreover, we should think about how to improve the experience for users trying out the feature for the first time. Right now, they are confronted with a file selection dialog without any explanation of what they should actually select.

@DominikVoigt (Contributor, PR author)

Thanks for the feedback :)!

Regarding the first bullet point:
One reason was that JabRef already knew how to parse such a file, making the implementation fairly easy. Additionally, the entries can easily be modified directly in JabRef. Therefore, it was a straightforward choice. This is certainly not the most user-friendly option, but it can and will be addressed in the future by offering a GUI for creating this file.

Regarding the second bullet point:
The reason for this is that this allows easy versioning of search sessions allowing to track new entries without reevaluating old entries. I do not see how this could reduce the number of use cases? Furthermore, if the user does not want to create a remote repository to push to, a local one will suffice. If a remote repository is configured, I do not see any use case in which a search should not be documented.

I agree that the UX is currently poor; however, with the GUI implementation and the existing documentation for the feature, it should suffice :).

@tobiasdiez (Member)

One reason was that JabRef already knew how to parse such a file, making the implementation fairly easy.

I agree, and think it's a good short term solution. However, in the long term, I would strongly favor a solution based on a different file format. There are also parsers that can be reused easily.

The reason for this is that this allows easy versioning of search sessions allowing to track new entries without reevaluating old entries.

That also works if the user manages the git repo on its own, right? Then one can also manually remove false-positives etc before commit.

I do not see how this could reduce the number of use cases?

Many users don't even know what a git repository is.

@koppor (Member) commented Nov 26, 2020

One reason was that JabRef already knew how to parse such a file, making the implementation fairly easy.

I agree, and think it's a good short term solution. However, in the long term, I would strongly favor a solution based on a different file format. There are also parsers that can be reused easily.

Sure. The idea was as follows:

  • The project had a time-constraint of 6 month with fixed costs. Thus, we had to adjust the content. (See "Stakeholdererwartungen", i.e., stakeholder expectations.)
  • We wanted a GUI at the end of the thesis.
  • JabRef offers a GUI for editing .bib files.
  • Thus, the currently delivered feature can be used with JabRef as UI.

The reason for this is that this allows easy versioning of search sessions allowing to track new entries without reevaluating old entries.

That also works if the user manages the git repo on its own, right? Then one can also manually remove false-positives etc before commit.

The user creates a separate branch (e.g., jabref-slr-search). The SLR functionality prepends new entries. Thus, there won't be any git merge conflicts when merging the survey results into the user's own branch. This merge has to be done manually.

In other words: the SLR feature maintains the branch jabref-slr-search, and the user maintains the other branches. When users want to remove search results, they do so in their own branch, with a commit message indicating why the result was removed. Thus, another person can trace what happened in the search:

  • What did the crawler find?
  • What did the user remove?

Future work: the SLR feature switches to the branch slr-search, searches, commits, and switches back to the previous branch. Thus, the user can choose an arbitrary name (such as main) for their own branch.
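The future-work workflow (switch to the search branch, search, commit, switch back) could look roughly like the following Java sketch, which drives the git command line through `ProcessBuilder`. This is a hypothetical illustration, not JabRef's implementation: the class and method names are made up, the search branch is assumed to exist already, the user's current branch is assumed to be known (for example from `git rev-parse --abbrev-ref HEAD`), and the working tree is assumed to be clean before switching. Staging only the study directory keeps unrelated files out of the search commit.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

public final class SlrBranchSketch {

    // Command that switches to the (already existing) branch reserved for search results.
    public static List<List<String>> enterSearchBranch(String searchBranch) {
        return List.of(List.of("git", "checkout", searchBranch));
    }

    // Commands run after the crawl has written its results into studyDir.
    public static List<List<String>> commitAndReturn(String userBranch, Path studyDir, String message) {
        return List.of(
                // Stage only the study directory (assumes a clean working tree before the switch).
                List.of("git", "add", "--", studyDir.toString()),
                // --allow-empty records the search run even if no new entries were found.
                List.of("git", "commit", "--allow-empty", "-m", message),
                // Return to the branch the user was on before the search.
                List.of("git", "checkout", userBranch));
    }

    // Runs the commands in order inside the repository and stops at the first non-zero exit code.
    public static void run(Path repo, List<List<String>> commands) throws IOException, InterruptedException {
        for (List<String> command : commands) {
            Process process = new ProcessBuilder(command)
                    .directory(repo.toFile())
                    .inheritIO()
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("Command failed (" + exitCode + "): " + String.join(" ", command));
            }
        }
    }
}
```

A caller would run the commands from `enterSearchBranch`, start the crawler, and then run the commands from `commitAndReturn`. Building the command lists separately from `run` makes the sequence easy to inspect and test without touching a real repository.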

I do not see how this could reduce the number of use cases?

Many users don't even know what a git repository is.

With the upcoming GUI support, JabRef will create the git repository magically. No external dependencies are required. The connection to an upstream repository is debatable. Maybe the intended target group is not as large as we thought. A discussion with @xJREB could help here. Maybe we really should not do any git magic (push/pull), but leave that to the user.

Nevertheless, I would keep the "branch switching magic". The SLR feature only works on the branch jabref-slr-search.

@tobiasdiez (Member)

The project had a time-constraint of 6 month with fixed costs. Thus, we had to adjust the content.

That I fully understand. And I'm impressed by what Dominik accomplished in such a short time period.
I just wanted to point out that, for the integration into JabRef, one needs to rethink a few of these decisions. For example, once we release a version that uses bib files as the study definition, it is much harder to later change it to, say, YAML, because you have to think about migration etc.

For the git support, it would be good if you could outline what the goal and the workflow look like. What are the use cases?
For example:

With the upcoming GUI support, JabRef will create the git repository magically.
Why would I want to have this as a user?

From skimming the code, I couldn't see how JabRef maintains only the branch jabref-slr-search. It looked like it operates on the currently selected branch and mostly commits everything (staged), so even unrelated files in other directories. That might be undesired/unexpected from a user perspective.

Siedlerchr added a commit that referenced this pull request Dec 5, 2020
* upstream/master: (36 commits)
  Fix remembering password for sql db (#7154)
  Update to libre office 7.0.3 (#7150)
  Add IdBasedSearchFetcher to jstor (#7145)
  Squashed 'src/main/resources/csl-styles/' changes from 55200d0..a20406d
  Bump antlr4-runtime from 4.8-1 to 4.9 (#7136)
  Bump antlr4 from 4.8-1 to 4.9 (#7138)
  Bump mariadb-java-client from 2.7.0 to 2.7.1 (#7134)
  Bump classgraph from 4.8.90 to 4.8.92 (#7139)
  Bump mockito-core from 3.6.0 to 3.6.28 (#7135)
  Bump gittools/actions from v0.9.6 to v0.9.7 (#7144)
  Bump checkstyle from 8.37 to 8.38 (#7142)
  Add missing author
  Fix document viewer not showing first page (#7132)
  Add githandler mock to crawler test to fix NPE (#7133)
  Searchbar glyph icon colors in Dark Theme [FIXED] (#7131)
  Fix binding issue for the regex and case sensitive search buttons (#7125)
  Enable automated cross library search using a cross library query lan… (#7124)
  Add tracking
  Update Java Version
  Welcome Dominik ✌
  ...
@DominikVoigt deleted the feature/create-crawler-for-literature-studies branch on January 1, 2021 16:05
@DominikVoigt removed the "status: changes required" label (Pull requests that are not yet complete) on Mar 24, 2021
Development

Successfully merging this pull request may close these issues.

Enhance literature survey
4 participants