More Like This queries return 0 results if field in source document has 0 tokens in analyzed field #30148

Closed
kylelyk opened this issue Apr 25, 2018 · 6 comments
Labels
>bug good first issue low hanging fruit :Search/Search Search-related issues that do not fall into other categories

Comments

@kylelyk

kylelyk commented Apr 25, 2018

Elasticsearch version: 6.2.4

Plugins installed: []

JVM version: 1.8.0_101

OS version: MacOS (Darwin Kernel Version 15.6.0)

Description of the problem including expected versus actual behavior:
"More Like This" queries do not return any results when a field on the source document produces no tokens at index time. Using a keyword field and manually specifying the analyzer at query time works as expected.

Steps to reproduce:

PUT test
{
  "mappings": {
    "type": {
      "properties": {
        "myField": {
          "type": "text"
        },
        "empty": {
          "type": "text"
        }
      }
    }
  }
}
POST /_bulk
{ "index":  { "_index": "test", "_type": "type","_id":1}}
{"myField":"and_foo", "empty":""}
{ "index":  { "_index": "test", "_type": "type","_id":2}}
{"myField":"and_foo", "empty":""}

This query correctly returns 1 result:

GET /_search
{
  "query": {
    "more_like_this": {
      "fields": [
        "myField"
      ],
      "like": [
        {
          "_index": "test",
          "_type": "type",
          "_id": "1"
        }
      ],
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}

This query returns no results when using both fields:

GET /_search
{
  "query": {
    "more_like_this": {
      "fields": [
        "myField", "empty"
      ],
      "like": [
        {
          "_index": "test",
          "_type": "type",
          "_id": "1"
        }
      ],
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}

If you update the "empty" field in document 1 to contain only non-analyzable characters (such as punctuation), the second query still returns 0 results. Changing the "empty" field to a keyword field works as expected.
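
As a rough illustration of how a text field can end up with zero tokens, here is a minimal Java sketch. The `tokenize` method is a hypothetical, simplified stand-in and is not Lucene's standard analyzer; it only assumes that tokens are built from letters, digits, and underscores and that everything else is discarded. Under that assumption, both an empty string and a punctuation-only string yield an empty token list.

```java
import java.util.ArrayList;
import java.util.List;

public class ZeroTokenSketch {

    // Hypothetical simplified tokenizer (assumption, not the real analyzer):
    // keeps runs of letters, digits, and underscores, lowercased; drops the rest.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c) || c == '_') {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("and_foo")); // prints [and_foo]
        System.out.println(tokenize(""));        // prints []
        System.out.println(tokenize("?!..."));   // prints []
    }
}
```

A keyword field skips analysis and indexes the raw value as a single term, which would explain why the keyword variant behaves differently.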

@andyb-elastic
Contributor

This reproduces on master. Another way of looking at it: an MLT query on a document with an empty field matches other documents with an empty field if the field is a keyword type, but not if it is a text type.

@andyb-elastic andyb-elastic added >bug :Search/Search Search-related issues that do not fall into other categories labels Apr 25, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@andyb-elastic
Contributor

I'm not sure whether this would be considered a bug or just a difference in how analyzed and non-analyzed fields are handled, but either way it seems unintuitive.

@andyb-elastic
Contributor

It's probably also worth noting that the MLT query produces an exception when the documents have only the empty field:

PUT test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }, 
  "mappings": {
    "_doc": {
      "properties": {
        "nothing": {
          "type": "text"
        }
      }
    }
  }
}

POST /_bulk
{ "index":  { "_index": "test", "_type": "_doc","_id":1}}
{"nothing": ""}
{ "index":  { "_index": "test", "_type": "_doc","_id":2}}
{"nothing": ""}

GET /_search
{
  "query": {
    "more_like_this": {
      "fields": [
        "nothing"
      ],
      "like": [
        {
          "_index": "test",
          "_type": "_doc",
          "_id": "1"
        }
      ],
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}

The server log then shows:

[2018-04-25T15:57:19,682][DEBUG][o.e.a.t.TransportShardMultiTermsVectorAction] [DFW1LSW] [test][0] failed to execute multi term vectors for [_doc]/[1]
org.elasticsearch.ElasticsearchException: failed to execute term vector request
        at org.elasticsearch.index.termvectors.TermVectorsService.getTermVectors(TermVectorsService.java:150) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.index.termvectors.TermVectorsService.getTermVectors(TermVectorsService.java:77) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.action.termvectors.TransportShardMultiTermsVectorAction.shardOperation(TransportShardMultiTermsVectorAction.java:85) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.action.termvectors.TransportShardMultiTermsVectorAction.shardOperation(TransportShardMultiTermsVectorAction.java:41) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$1.doRun(TransportSingleShardAction.java:112) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: java.lang.NullPointerException
        at org.elasticsearch.action.termvectors.TermVectorsWriter.setFields(TermVectorsWriter.java:82) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.action.termvectors.TermVectorsResponse.setFields(TermVectorsResponse.java:361) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.index.termvectors.TermVectorsService.getTermVectors(TermVectorsService.java:146) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        ... 9 more

@colings86 colings86 added the good first issue low hanging fruit label Apr 26, 2018
@aditya-agrawal
Contributor

Hello, can I take up this bug? Any pointers are much appreciated.

@aditya-agrawal
Contributor

At this point ->
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/termvectors/TermVectorsWriter.java#L69

fieldTermVector is null. I initialized fieldTermVector with EMPTY_TERMS when it is null.
This solved both of the issues described above.
Should I go ahead and open a PR for it?
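
The null guard described above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the actual `TermVectorsWriter` code: the class `EmptyTermsSketch`, the method `countTerms`, and the map-of-lists representation are all hypothetical, and `EMPTY_TERMS` here is a local stand-in for whatever empty-terms value the real code uses. The point is only that substituting an empty collection for a null one lets the loop continue instead of throwing a NullPointerException.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmptyTermsSketch {

    // Hypothetical stand-in for an "empty terms" constant (assumption).
    static final List<String> EMPTY_TERMS = Collections.emptyList();

    // Counts terms per field. A field whose term list is null (no tokens
    // were produced) is treated as having an empty term list.
    public static Map<String, Integer> countTerms(Map<String, List<String>> termVectors) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : termVectors.entrySet()) {
            List<String> terms = entry.getValue();
            if (terms == null) {
                // Without this guard, terms.size() below would throw an NPE.
                terms = EMPTY_TERMS;
            }
            counts.put(entry.getKey(), terms.size());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("myField", Arrays.asList("and_foo"));
        doc.put("empty", null); // field that produced zero tokens
        System.out.println(countTerms(doc)); // prints {myField=1, empty=0}
    }
}
```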

cbuescher pushed a commit that referenced this issue May 8, 2018
Fixes an edge case when using `more_like_this` where TermVectorsWriter
could throw an NPE when a field produced zero tokens after analysis. This
changes the implementation to use an empty list of tokens in this case.

Closes #30148
colings86 pushed a commit that referenced this issue May 8, 2018
Fixes an edge case when using `more_like_this` where TermVectorsWriter
could throw an NPE when a field produced zero tokens after analysis. This
changes the implementation to use an empty list of tokens in this case.

Closes #30148