More Like This queries return 0 results if field in source document has 0 tokens in analyzed field #30148
Comments
This reproduces on master. Another way of looking at it: an MLT query on a document with an empty field matches other documents with an empty field if the field is a keyword type, but not if it is a text type.
Pinging @elastic/es-search-aggs
I'm not sure whether this would be considered a bug or just a difference in how analyzed and non-analyzed fields are handled, but either way it seems unintuitive.
It's probably also worth noting that the MLT query produces an exception when the documents only have the empty field.
Hello, can I take up this bug? Any pointers are much appreciated.
At this point, `fieldTermVector` is null. I initialized `fieldTermVector` with `EMPTY_TERMS` when it is null.
Fixes an edge case when using `more_like_this` where TermVectorsWriter could throw an NPE when a field produced zero tokens after analysis. This changes the implementation to use an empty list of tokens in this case. Closes #30148
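The null guard described in this thread can be illustrated with a minimal, self-contained sketch. This is not the actual Elasticsearch change: the class `EmptyTermsSketch`, the method `termsFor`, and the use of a plain `Map` and `List<String>` in place of Lucene term vectors are all assumptions made for illustration. Only the idea of substituting an empty set of terms for a null `fieldTermVector` comes from the discussion.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmptyTermsSketch {
    // Hypothetical stand-in for the EMPTY_TERMS constant mentioned in the thread.
    static final List<String> EMPTY_TERMS = Collections.emptyList();

    // Hypothetical lookup: a field that produced zero tokens has no entry,
    // so the map returns null for it.
    static List<String> termsFor(Map<String, List<String>> vectors, String field) {
        List<String> fieldTermVector = vectors.get(field);
        if (fieldTermVector == null) {
            // Fall back to an empty list so callers can iterate without an NPE.
            fieldTermVector = EMPTY_TERMS;
        }
        return fieldTermVector;
    }

    public static void main(String[] args) {
        Map<String, List<String>> vectors = new HashMap<>();
        vectors.put("title", Arrays.asList("hello", "world"));

        System.out.println(termsFor(vectors, "title").size()); // prints 2
        System.out.println(termsFor(vectors, "empty").size()); // prints 0
    }
}
```

With this shape, code that loops over the returned terms simply does nothing for a field with zero tokens, instead of dereferencing null.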
Elasticsearch version: 6.2.4
Plugins installed: []
JVM version: 1.8.0_101
OS version: MacOS (Darwin Kernel Version 15.6.0)
Description of the problem including expected versus actual behavior:
"More Like This" queries do not return any results when a field on the source document produces no tokens at index time. Using a keyword field and manually specifying the analyzer at query time works as expected.
Steps to reproduce:
This query correctly returns 1 result:
This query returns no results when using both fields:
If you update the "empty" field in document 1 to contain non-analyzable characters (like punctuation), the first query still gives 0 results. Changing the "empty" field to be a keyword field works as expected.
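The original request bodies are not preserved in this text, and the following is not a reconstruction of them. As a rough illustration of the kind of `more_like_this` request the steps describe, the sketch below builds a query body that uses an indexed document as the `like` source. The index name `mlt-test`, the type `_doc`, the document id `1`, the field name `title`, and the helper `mltQuery` are all assumptions; only the field name `empty` comes from the report. `min_term_freq` and `min_doc_freq` are set to 1 on the assumption that the test index is tiny.

```java
public class MltQuerySketch {
    // Hypothetical helper: builds a more_like_this query body that points at
    // an already indexed document (6.x style, which includes _type).
    static String mltQuery(String index, String type, String id, String... fields) {
        StringBuilder fieldList = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                fieldList.append(", ");
            }
            fieldList.append('"').append(fields[i]).append('"');
        }
        return "{ \"query\": { \"more_like_this\": { "
            + "\"fields\": [" + fieldList + "], "
            + "\"like\": [ { \"_index\": \"" + index + "\", "
            + "\"_type\": \"" + type + "\", "
            + "\"_id\": \"" + id + "\" } ], "
            + "\"min_term_freq\": 1, \"min_doc_freq\": 1 } } }";
    }

    public static void main(String[] args) {
        // One field only, then both fields including the one with zero tokens.
        System.out.println(mltQuery("mlt-test", "_doc", "1", "title"));
        System.out.println(mltQuery("mlt-test", "_doc", "1", "title", "empty"));
    }
}
```

Each printed body could be sent to the index's `_search` endpoint to compare the one-field and two-field cases.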