Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lucene search backend #8963

Closed
wants to merge 111 commits into from
Closed

Lucene search backend #8963

wants to merge 111 commits into from

Conversation

btut
Copy link
Contributor

@btut btut commented Jul 9, 2022

First draft of the switch to the Lucene Search Backend

Whats working

  • Added all bib-fields to the index
  • BibEntry hash is used as an identifier in the index
  • Both fulltext and bib-fields are indexed together, in the same index, in the background. BibEntry-fields changes are prioritized as they are very quick.
  • Can see in the code that searches in bib-fields are working as expected
  • Switch current search to use the lucene backend
  • Sort search-results by lucene score. This means search will not filter the table anymore, but merely sort according to results
  • Global search (in contrast to main table, the global search window filters for matches with score > 0)
  • Change the file button-icon depending on whether there was a fulltext-search match for this entry
  • Search-Groups
  • Floating search
  • Switch to force sort by score or not

Todo

  • Remove old search
  • Display x.y at score only (two decimal places)
    • The cell itself should be colored. From green to red (all in light shade): 10=green, 0= red
  • Decide whether to display score column. Currently, it shows when a search is active. Users probably cannot do anything with the values, though. Maybe use it to shade entries with high relevance?
    • --> We keep the score column
  • Use gray background for score==0. See Issues with search and groups and associated feature request (floating mode) #4237 (comment).
    • Scroll to top until hits are shown (scroll down 1 shows a gray line)

Follow-ups

  • Improve fulltext-search-results display
    • A little improvement was already done. The file icon in the table is now different (shows a magnifying glass) when there were search results in a linked file.
    • Could be improved even more by inserting a fulltext-results row below the actual row.
  • Improve query-entry (like Prototype Implement better search user interface #341 #8356?)

Problems:

  • The score-column is always used to sort the table as highest-priority. JabRef will only sort by two columns max. This means with the search-score being one of them, users can only sort by one column.
    • Solution: add sort-columns with shift-click
    • Solution: Sorting by score column is not forced but optional
  • Everything is indexed with the JabRef field names, which are in English. This means, either we translate the fields when creating an index, or people need to use the english terms for search queries. Translating the fields in english would mean a re-index needs to be triggered whenever the language changes and JabRef would need to notice if the language changed while it was closed.
    • Solution: leave as-is, search-syntax is English and was English before the migration to lucene
  • Search group migration.

Fixes #8857
Peek 2022-08-17 20-01

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@koppor koppor changed the base branch from main to version6 July 9, 2022 18:27
@koppor koppor added this to the v6.0 milestone Jul 9, 2022
@ThiloteE ThiloteE added the search label Jul 9, 2022
Comment on lines 92 to 97
HashMap<String, Float> boosts = new HashMap<String, Float>();
SearchFieldConstants.searchableBibFields.stream().forEach(field -> boosts.put(field, Float.valueOf(4)));

if (query.getSearchFlags().contains(SearchRules.SearchFlags.FULLTEXT)) {
Arrays.stream(SearchFieldConstants.PDF_FIELDS).forEach(field -> boosts.put(field, Float.valueOf(1)));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easy, but ugly. Maybe define a boost property for the standard Fields and get that by Field-name?

@tobiasdiez
Copy link
Member

Very nice to see this implemented! A few comments from my side:

  • I wouldn't bother with the migration of search groups. There is going to be friction anyway if users use advanced search functionality. Maybe show a info message if the Bib file contains search groups saying that those groups might not work as expected and need to be migrated manually?
  • What about showing the search score as a graphic similar to the battery or signal icon on the phone?
  • I have not yet tested this version, but I am a bit sceptical about implementing the search via sorting instead of filtering. I suspect that the search score will differ slightly between similar entries and thus the secondary sorting by the user will have no impact. A particular use case: Find all works by a certain author and sort them according to the read status column.
  • I'm definitely a big fan of adding the search score as a column, but would leave it to the user to sort the results according to the score.

@btut
Copy link
Contributor Author

btut commented Jul 22, 2022

Very nice to see this implemented! A few comments from my side:

  • I wouldn't bother with the migration of search groups. There is going to be friction anyway if users use advanced search functionality. Maybe show a info message if the Bib file contains search groups saying that those groups might not work as expected and need to be migrated manually?

Would definitely reduce coding effort and migration code.

  • What about showing the search score as a graphic similar to the battery or signal icon on the phone?

Thought about that too, maybe with the option for power-users to show floats instead. I am very bad at UI design though, did not get good feedback whenever I implemented something, so would need some help with that.

  • I have not yet tested this version, but I am a bit sceptical about implementing the search via sorting instead of filtering. I suspect that the search score will differ slightly between similar entries and thus the secondary sorting by the user will have no impact. A particular use case: Find all works by a certain author and sort them according to the read status column.

Thats true, but filtering kind of defeats the purpose of a fulltext/fuzzy search IMO. Where do you draw the line for the filtering predicate? Score==0? In large libraries that will result in a table with lots of "bad" search results with scores slightly above 0 and people could need to scroll to find much better matches.

  • I'm definitely a big fan of adding the search score as a column, but would leave it to the user to sort the results according to the score.

In any case, we would need to have a very nice way to display the score if we do not sort on it, so users can still tell better fits apart.

@calixtus
Copy link
Member

gradlew clean?

@ThiloteE
Copy link
Member

Gradlew clean fixed the compilation, but now I get the following errors whenever I try to open a library:

java.lang.NullPointerException: Cannot invoke "org.jabref.gui.maintable.columns.MainTableColumn.getModel()" because "column" is null
	at org.jabref@100.0.0/org.jabref.gui.maintable.MainTable.lambda$new$7(MainTable.java:198)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:178)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.AbstractList$RandomAccessSpliterator.tryAdvance(AbstractList.java:708)
	at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:129)
	at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:527)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:150)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:647)
	at org.jabref@100.0.0/org.jabref.gui.maintable.MainTable.lambda$new$10(MainTable.java:200)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.jabref@100.0.0/org.jabref.gui.maintable.MainTable.<init>(MainTable.java:195)
	at org.jabref@100.0.0/org.jabref.gui.LibraryTab.createMainTable(LibraryTab.java:516)
	at org.jabref@100.0.0/org.jabref.gui.LibraryTab.setupMainPanel(LibraryTab.java:535)
	at org.jabref@100.0.0/org.jabref.gui.LibraryTab.<init>(LibraryTab.java:155)
	at org.jabref@100.0.0/org.jabref.gui.LibraryTab.createLibraryTab(LibraryTab.java:849)
	at org.jabref@100.0.0/org.jabref.gui.importer.actions.OpenDatabaseAction.openTheFile(OpenDatabaseAction.java:195)
	at org.jabref@100.0.0/org.jabref.gui.importer.actions.OpenDatabaseAction.lambda$openFiles$0(OpenDatabaseAction.java:172)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.jabref@100.0.0/org.jabref.gui.importer.actions.OpenDatabaseAction.openFiles(OpenDatabaseAction.java:170)
	at org.jabref@100.0.0/org.jabref.gui.JabRefGUI.openLastEditedDatabases(JabRefGUI.java:332)
	at org.jabref@100.0.0/org.jabref.gui.JabRefGUI.openDatabases(JabRefGUI.java:185)
	at javafx.graphics@20/com.sun.javafx.application.PlatformImpl.lambda$runLater$10(PlatformImpl.java:456)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:400)
	at javafx.graphics@20/com.sun.javafx.application.PlatformImpl.lambda$runLater$11(PlatformImpl.java:455)
	at javafx.graphics@20/com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
	at javafx.graphics@20/com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
	at javafx.graphics@20/com.sun.glass.ui.win.WinApplication.lambda$runLoop$3(WinApplication.java:185)
	at java.base/java.lang.Thread.run(Thread.java:1623)

and


java.lang.NullPointerException: Cannot invoke "javafx.scene.control.TableColumn.setTableView(javafx.scene.control.TableView)" because "<local7>" is null
	at javafx.controls@20/javafx.scene.control.TableView$4.onChanged(TableView.java:792)
	at javafx.base@20/javafx.collections.WeakListChangeListener.onChanged(WeakListChangeListener.java:88)
	at javafx.base@20/com.sun.javafx.collections.ListListenerHelper$SingleChange.fireValueChangedEvent(ListListenerHelper.java:162)
	at javafx.base@20/com.sun.javafx.collections.ListListenerHelper.fireValueChangedEvent(ListListenerHelper.java:71)
	at javafx.base@20/javafx.collections.ObservableListBase.fireChange(ObservableListBase.java:238)
	at javafx.base@20/javafx.collections.ListChangeBuilder.commit(ListChangeBuilder.java:482)
	at javafx.base@20/javafx.collections.ListChangeBuilder.endChange(ListChangeBuilder.java:541)
	at javafx.base@20/javafx.collections.ObservableListBase.endChange(ObservableListBase.java:210)
	at javafx.base@20/javafx.collections.ModifiableObservableListBase.addAll(ModifiableObservableListBase.java:109)
	at org.jabref@100.0.0/org.jabref.gui.maintable.MainTable.<init>(MainTable.java:124)
	at org.jabref@100.0.0/org.jabref.gui.LibraryTab.createMainTable(LibraryTab.java:516)
	at org.jabref@100.0.0/org.jabref.gui.LibraryTab.setupMainPanel(LibraryTab.java:535)
	at org.jabref@100.0.0/org.jabref.gui.LibraryTab.<init>(LibraryTab.java:155)
	at org.jabref@100.0.0/org.jabref.gui.LibraryTab.createLibraryTab(LibraryTab.java:849)
	at org.jabref@100.0.0/org.jabref.gui.importer.actions.OpenDatabaseAction.openTheFile(OpenDatabaseAction.java:195)
	at org.jabref@100.0.0/org.jabref.gui.importer.actions.OpenDatabaseAction.lambda$openFiles$0(OpenDatabaseAction.java:172)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.jabref@100.0.0/org.jabref.gui.importer.actions.OpenDatabaseAction.openFiles(OpenDatabaseAction.java:170)
	at org.jabref@100.0.0/org.jabref.gui.JabRefGUI.openLastEditedDatabases(JabRefGUI.java:332)
	at org.jabref@100.0.0/org.jabref.gui.JabRefGUI.openDatabases(JabRefGUI.java:185)
	at javafx.graphics@20/com.sun.javafx.application.PlatformImpl.lambda$runLater$10(PlatformImpl.java:456)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:400)
	at javafx.graphics@20/com.sun.javafx.application.PlatformImpl.lambda$runLater$11(PlatformImpl.java:455)
	at javafx.graphics@20/com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
	at javafx.graphics@20/com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
	at javafx.graphics@20/com.sun.glass.ui.win.WinApplication.lambda$runLoop$3(WinApplication.java:185)
	at java.base/java.lang.Thread.run(Thread.java:1623)

@calixtus
Copy link
Member

hm, interesting, can reproduce the bug on my windows machine, but not on my arch linux laptop

# Conflicts:
#	CHANGELOG.md
#	src/main/java/org/jabref/model/search/rules/ContainsBasedSearchRule.java
#	src/main/java/org/jabref/model/search/rules/GrammarBasedSearchRule.java
#	src/main/java/org/jabref/model/search/rules/RegexBasedSearchRule.java
# Conflicts:
#	CHANGELOG.md
#	src/main/java/org/jabref/model/search/rules/ContainsBasedSearchRule.java
#	src/main/java/org/jabref/model/search/rules/GrammarBasedSearchRule.java
#	src/main/java/org/jabref/model/search/rules/RegexBasedSearchRule.java
@github-actions
Copy link
Contributor

The build of this PR is available at https://builds.jabref.org/pull/8963/merge.

@koppor
Copy link
Member

koppor commented Oct 17, 2023

@AEgit
Copy link

AEgit commented Oct 17, 2023

I will just mention this my old comment here as well, so that it is not forgotten: #4237 (comment)

I have noted that for the Lucene search approach (#8963 (comment)), the following solution has been suggested to the 'floating mode' problem:

Use gray background for score==0. See https://github.com/JabRef/jabref/issues/4237#issuecomment-993325334.

I am not familiar with the technicalities of Lucene, but I reckon a score of 0 represents an item, that does not match the search criteria at all. That is not (!) the same as what the old floating mode in JabRef 3.8.2 offered - the items in gray did match the search criteria to a certain degree. The only difference between the items with white background and those with gray background was, that the white ones belonged to the group, that the user had selected at the moment. The gray ones did not (!) belong to the selected group.

Hopefully, this old behaviour of JabRef can be restored with the new Lucene search.

@koppor
Copy link
Member

koppor commented Nov 15, 2023

Some notes on the search syntax. Users might find it confusing if partial-word-search is not the default search. See laurent22/joplin#6349 and laurent22/joplin#6455 for discussions in a related project.

Full search documentation of Joplin: https://joplinapp.org/help/apps/search

LoayGhreeb added a commit to LoayGhreeb/jabref that referenced this pull request Jul 1, 2024
@koppor koppor unassigned btut Jul 3, 2024
@koppor koppor mentioned this pull request Jul 3, 2024
6 tasks
github-merge-queue bot pushed a commit that referenced this pull request Jul 24, 2024
* Fix selecting group triggers in all open libraries

* Change color of the search modifier buttons for dark theme

#8963 (comment)

* Add shortcut to open global search window "Ctrl+shift+F"

* fix keep on top subscription

* Fix context menu of the search bar

* Store preview divider position in global search window

Remove constructor

* prevent NPE

* Fix update search query triggers in all open libraries

* Adapt GUI test

* Clear search highlight in SourceTab after resetting search query

* Revert "Fix update search query triggers in all open libraries"

This reverts commit d6d455c.

* Revert "Fix selecting group triggers in all open libraries"

This reverts commit b0017fb.

* Remove regexValidator

* listen for changes for search flags

---------

Co-authored-by: Siedlerchr <siedlerkiller@gmail.com>
@koppor
Copy link
Member

koppor commented Aug 28, 2024

Follow-up PR: #11542

@koppor koppor closed this Aug 28, 2024
@koppor koppor deleted the luceneSearchBackend branch August 28, 2024 18:15
@LoayGhreeb LoayGhreeb mentioned this pull request Sep 2, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate JabRef search to Lucene
7 participants