Support term presence in queries #332
Prohibited, required and optional presence modifiers returning expected results in search tests.
Thank you for this nice feature! IMHO, searches that only contain prohibited terms should have configurable behaviour, since the results for the "empty field" case vary between applications. Adding another suggestion:. Ex:
Why? I'm pretty sure such a search mode exists in full-text search engines. I'd naively call it "ignore AND-ed terms that don't provide any result".
Thanks for the feedback, some great suggestions...
It depends on what you mean by 'configurable'... If Lunr threw an I'm pretty sure the current behaviour of returning nothing is not right, I just sidestepped the issue when implementing this. I'd like to see what properly implementing this case looks like. I think it should perform the search, i.e. if all documents do not have the term then return all documents. More interesting is when some documents do have the term, that could easily be a valid use case. I'm guessing though that you had something more concrete in mind when you mentioned configuring the behaviour. I'm going to pass on this for now, but I'm not ruling it out in the future.
I looked (though not very hard) for this behaviour in Lucene and couldn't find anything. Do you have any examples (for any search engine/library)? It should be possible to implement this in application code though, right? If a query returns no results, drop required terms and try again. Possibly not the most efficient or simplest solution, I guess. Rather than implement this directly in Lunr, I think Lunr could provide the relevant information (matches per term) so that applications can handle these cases however they like. It probably isn't going to make it into 2.2.x though. It would almost definitely require a major version bump as
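The application-level fallback suggested above could look roughly like the sketch below. It reads "drop required terms" as "drop the `+` prefix so the terms become optional", which is only one possible interpretation. `searchWithFallback` and its `search` parameter are hypothetical names, not part of Lunr; `search` is any function that takes a query string and returns an array of results.

```javascript
// Hypothetical helper, not part of Lunr: retry a search with required
// terms relaxed to optional when the strict query returns nothing.
function searchWithFallback(search, queryString) {
  const results = search(queryString);
  if (results.length > 0) return results;

  const tokens = queryString.split(/\s+/).filter(Boolean);

  // Nothing to relax: the query had no required ("+") terms.
  if (!tokens.some((token) => token.startsWith("+"))) return results;

  // Strip the "+" prefix so each required term becomes optional, then retry.
  const relaxed = tokens
    .map((token) => (token.startsWith("+") ? token.slice(1) : token))
    .join(" ");

  return search(relaxed);
}
```

With a built index this might be called as `searchWithFallback((q) => idx.search(q), "+foo +bar")`, where `idx` is assumed to be an existing `lunr.Index`. As noted above, it costs a second search whenever the strict query comes back empty.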
I don't have examples, and I certainly don't want a discussion about such an exotic feature to block AND-search :)
Yes (but with a performance impact because it could not reuse AND search capabilities).
I've hacked together an implementation whereby searches containing only prohibited terms work as expected. Specifically, if a document does not contain that term it gets returned, even if that means all documents. The last sticking point is how these results are scored, and therefore sorted. The way I've got it implemented now would mean all results have a score of Fixing the score to be
Previously calculating the similarity with an empty vector would give a score of NaN. This might technically be correct but probably isn't very useful. Instead we change this to zero. Scoring empty vectors is a side effect of introducing totally negated queries.
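A minimal sketch of the NaN-to-zero change described in this commit message. It uses plain objects as sparse vectors and a textbook cosine similarity; it is not Lunr's vector implementation, and `dot`, `magnitude` and `similarity` are illustrative names.

```javascript
// Sparse vectors are modelled as plain objects: { termIndex: weight }.
function dot(a, b) {
  let sum = 0;
  for (const key of Object.keys(a)) {
    if (key in b) sum += a[key] * b[key];
  }
  return sum;
}

function magnitude(v) {
  return Math.sqrt(Object.values(v).reduce((acc, x) => acc + x * x, 0));
}

function similarity(a, b) {
  // With an empty vector this is 0 / 0, which is NaN in JavaScript.
  // NaN is falsy, so `|| 0` turns it into a score of zero.
  return dot(a, b) / (magnitude(a) * magnitude(b)) || 0;
}
```

The `|| 0` guard only changes the result when the division yields NaN (or zero), so scores for non-empty vectors are unaffected.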
In totally negated queries there will be no matches and therefore an empty MatchData object is required for returning in the search results.
A negated query is one in which _all_ terms have a presence of prohibited. When executing a negated query some special handling is required and this method is used to trigger that handling.
A negated query is one in which all terms have a prohibited presence. These queries need special handling: empty match data should be created, and _all_ field refs extracted to construct the results.
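To illustrate the negated-query handling described in these commit messages, here is a small sketch over a toy index. It is not Lunr's code: `isNegated` here is a standalone function, `searchNegated` is a hypothetical helper, and `docs` is assumed to map a document ref to its list of terms. Treating a query with no clauses as "not negated" is a choice made for this sketch only.

```javascript
const PROHIBITED = "prohibited";

// A query is negated when every clause is prohibited.
// An empty clause list is treated as not negated in this sketch.
function isNegated(clauses) {
  return clauses.length > 0 && clauses.every((c) => c.presence === PROHIBITED);
}

// For a negated query, return every document that contains none of the
// prohibited terms. Nothing matched, so match data is empty and score is 0.
function searchNegated(docs, clauses) {
  return Object.keys(docs)
    .filter((ref) => !clauses.some((c) => docs[ref].includes(c.term)))
    .map((ref) => ({ ref: ref, score: 0, matchData: {} }));
}
```

A caller would check `isNegated(clauses)` first and only then take this path; mixed queries go through the normal matching and scoring.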
In cases where we know a document/field is not going to make it into the results due to a term's presence (either required or prohibited), there is no point in calculating a similarity score. This leads to a minor performance improvement for queries that contain either required or prohibited terms.
Finally got round to working on this again. Totally negated queries are now supported: each document that does not contain the prohibited term will be in the results, and the score should be zero for each of them. An alpha release is available on npm, lunr@2.2.0-alpha.2. I think that covers everything on this feature; if there are no objections I aim to get a release with this and a couple of other small features out in the next few days.
Adds support for modifying the required presence of a given term in matching documents.
Currently, when searching with multiple terms, each term is optional in the search results. That is, a search for "foo bar" will return results with either "foo" or "bar" or both "foo" and "bar"; effectively, terms are combined with a logical OR.
This change adds support for marking terms as either required or prohibited. A required term must appear in a document for that document to be returned in the search results. A prohibited term must not appear in a document for that document to be returned.
A term's presence is indicated by a prefixed "+" (plus) for required terms and a "-" (minus) for prohibited terms when using lunr.Index#search. Constructing queries programmatically using lunr.Index#query also supports presence, using the presence option with a value from the lunr.Query.presence enum.
For example, to perform a search for documents that must contain "foo" and optionally contain "bar" use the following search string: +foo bar.
A simple, editable, example should help make this clear.
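As a rough, self-contained model of the "+" / "-" semantics described above (it does not call Lunr and ignores scoring), the following sketch filters a toy index by presence. `Presence`, `parseClauses`, `matchingRefs` and the `docs` shape are all illustrative assumptions. For a search containing only prohibited terms it returns every document lacking those terms, which is the behaviour settled on elsewhere in this thread.

```javascript
// Simplified model of presence semantics; this is NOT Lunr's implementation.
const Presence = { OPTIONAL: "optional", REQUIRED: "required", PROHIBITED: "prohibited" };

// Parse a search string such as "+foo -bar baz" into clauses.
function parseClauses(searchString) {
  return searchString.split(/\s+/).filter(Boolean).map((token) => {
    if (token.startsWith("+")) return { term: token.slice(1), presence: Presence.REQUIRED };
    if (token.startsWith("-")) return { term: token.slice(1), presence: Presence.PROHIBITED };
    return { term: token, presence: Presence.OPTIONAL };
  });
}

// `docs` maps a document ref to its list of terms (a hypothetical toy index).
function matchingRefs(docs, searchString) {
  const clauses = parseClauses(searchString);
  const required = clauses.filter((c) => c.presence === Presence.REQUIRED);
  const prohibited = clauses.filter((c) => c.presence === Presence.PROHIBITED);
  const optional = clauses.filter((c) => c.presence === Presence.OPTIONAL);

  return Object.keys(docs).filter((ref) => {
    const terms = new Set(docs[ref]);
    // Every required term must be present.
    if (!required.every((c) => terms.has(c.term))) return false;
    // No prohibited term may be present.
    if (prohibited.some((c) => terms.has(c.term))) return false;
    // Without required terms, optional terms keep their logical OR behaviour.
    if (required.length === 0 && optional.length > 0) {
      return optional.some((c) => terms.has(c.term));
    }
    return true;
  });
}
```

For instance, with `{ a: ["foo", "bar"], b: ["foo"], c: ["bar"] }` as the toy index, `matchingRefs(docs, "+foo bar")` keeps `a` and `b`, while `"foo -bar"` keeps only `b`.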
There is one open question: when a search contains only prohibited terms, currently no documents are returned. Technically this should return all documents in the index, but I'm not sure that is very useful. Is the current behaviour correct, i.e. just no search results, or should an InvalidQuery exception be thrown?
An alpha release is available on npm, lunr@2.2.0-alpha.1