Search returns nothing without specifying fields, but returns hits with fields #850

Closed
feliperaul opened this issue Feb 16, 2017 · 7 comments

@feliperaul

I can't understand how the same query without fields specified returns no hits, but if I specify a field then it returns hits.

I have a simple model with no search options in the searchkick invocation:

class Property < ActiveRecord::Base
  searchkick callbacks: :async
end

I have an instance with lorem ipsum in the field description.

This should search all fields, but returns no hits:

Property.search "lorem ipsum"

 @options=
  {:page=>1,
   :per_page=>1000,
   :padding=>0,
   :load=>true,
   :includes=>nil,
   :json=>false,
   :match_suffix=>"analyzed",
   :highlighted_fields=>[]},
 @response=
  {"took"=>12,
   "timed_out"=>false,
   "_shards"=>{"total"=>5, "successful"=>5, "failed"=>0},
   "hits"=>{"total"=>0, "max_score"=>nil, "hits"=>[]}}>

But the same query, specifying a field, returns hits:

Property.search "lorem ipsum", fields: [:description_br]

 @options=
  {:page=>1,
   :per_page=>1000,
   :padding=>0,
   :load=>true,
   :includes=>nil,
   :json=>false,
   :match_suffix=>"analyzed",
   :highlighted_fields=>[]},
 @response=
  {"took"=>11,
   "timed_out"=>false,
   "_shards"=>{"total"=>5, "successful"=>5, "failed"=>0},
   "hits"=>
    {"total"=>2,
     "max_score"=>180.95953,
     "hits"=>
      [{"_index"=>"properties_development_20161222100541237",
        "_type"=>"property",
        "_id"=>"28036",
        "_score"=>180.95953},
       {"_index"=>"properties_development_20161222100541237",
        "_type"=>"property",
        "_id"=>"28037",
        "_score"=>180.95953}]}}>
@feliperaul
Author

feliperaul commented Feb 16, 2017

I did some more tests. If the description_br field is only a short string (like test lorem ipsum test), Property.search "lorem ipsum" returns a hit. But if description_br is a really long text (90,000 characters), it returns nothing unless I specify a field.

@feliperaul
Author

feliperaul commented Feb 16, 2017

Using debug: true in the query, I managed to get this info about description_br field:

"description_br": {
  "type": "keyword",
  "fields": {
    "analyzed": {
      "type": "text"
    }
  },
  "include_in_all": true,
  "ignore_above": 256
},

If I understand correctly (I'm not an expert on ES, that's why I use Searchkick), this is a multi-field scenario, where description_br is indexed both as a keyword and as text, the latter allowing full text search.

It should be present in _all (as include_in_all is true), but then there's ignore_above: 256 ...
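To check which attributes of a record run into this limit, here is a minimal sketch in plain Ruby. It assumes the mapping fragment above has been copied into a hash; the helper name fields_over_ignore_above is hypothetical and is not part of Searchkick or Elasticsearch.

```ruby
# Mapping fragment copied from the debug output above, as a Ruby hash.
MAPPING = {
  "description_br" => {
    "type" => "keyword",
    "fields" => { "analyzed" => { "type" => "text" } },
    "include_in_all" => true,
    "ignore_above" => 256
  }
}.freeze

# Hypothetical helper: returns the names of attributes whose string value
# has more characters than the ignore_above limit in the mapping.
def fields_over_ignore_above(mapping, attributes)
  over = attributes.select do |name, value|
    limit = mapping.dig(name.to_s, "ignore_above")
    limit && value.is_a?(String) && value.length > limit
  end
  over.keys
end

short = { description_br: "test lorem ipsum test" }
long  = { description_br: "lorem ipsum " * 1000 } # 12,000 characters

fields_over_ignore_above(MAPPING, short) # => []
fields_over_ignore_above(MAPPING, long)  # => [:description_br]
```

String#length counts characters rather than bytes, which matches how ignore_above is documented to count.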

If that's the default, how should we use Elasticsearch for full text search on long documents (type text or longtext in a MySQL database, each 20+ pages long) if ignore_above is set to 256 and the max is 32766 bytes?

And I still don't understand why the search works if I specify a field. Is ignore_above only in effect for include_in_all? I couldn't work this out from the ES documentation.

@ankane
Owner

ankane commented Feb 17, 2017

Hey @feliperaul, thanks for digging into this. After I dug in a bit more, here's an explanation of what's happening:

When you specify fields, Searchkick uses the analyzed multi field. Everything works as expected in this scenario.

When you don't specify fields, Searchkick uses the _all field. Since ignore_above is set, values longer than 256 characters are not included.

ignore_above is set to protect against Lucene's term byte-length limit, since for where queries, the entire value must be indexed as a single term.

From my experience, you always want to specify fields. Elasticsearch 6 will disable the _all field by default. elastic/elasticsearch#22144
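The behavior described above can be illustrated with a toy model in plain Ruby. This is only a sketch of the explanation in this comment, not Elasticsearch or Searchkick code: the ToyIndex class, its methods, and the whitespace-style tokenizer are all hypothetical.

```ruby
# Toy model: values longer than IGNORE_ABOVE never reach the stand-in for
# _all, while the analyzed sub-field is always indexed.
class ToyIndex
  IGNORE_ABOVE = 256 # same limit as in the mapping shown earlier

  def initialize
    @all = Hash.new { |h, term| h[term] = [] } # term => ids, stands in for _all
    @analyzed = Hash.new do |h, field|         # field => term => ids
      h[field] = Hash.new { |hh, term| hh[term] = [] }
    end
  end

  def index(id, field, value)
    tokens = value.downcase.scan(/\w+/).uniq
    tokens.each { |t| @analyzed[field][t] << id }
    # Per the explanation above, over-long values are left out of _all.
    return if value.length > IGNORE_ABOVE
    tokens.each { |t| @all[t] << id }
  end

  # Returns ids of documents that contain every term of the query.
  def search(query, fields: nil)
    terms = query.downcase.scan(/\w+/)
    postings = terms.map do |t|
      fields ? fields.flat_map { |f| @analyzed[f][t] } : @all[t]
    end
    (postings.reduce(:&) || []).uniq
  end
end

index = ToyIndex.new
index.index(1, :description_br, "test lorem ipsum test")
index.index(2, :description_br, "lorem ipsum " * 1000)

index.search("lorem ipsum")                            # => [1]
index.search("lorem ipsum", fields: [:description_br]) # => [1, 2]
```

Document 2 is only reachable when a field is named, which mirrors the two responses in the original report.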

However, this behavior is pretty counterintuitive, so I think it's worth addressing. A few options on my mind:

  1. Push a fix to bump ignore_above to 30000. 32766 is the limit, but I want to leave some room for Unicode characters. The downside is that this will increase the index size when indexing long fields. This is what I'm leaning towards.
  2. Add a note to the readme about specifying fields and the impact of not doing it. Not exactly sure how to do this without confusing people.

@feliperaul
Author

@ankane Thanks for the quick reply.

Since people think "full text search" when they think of Elasticsearch/Searchkick, I don't think option 1 is the best one, because the limit is still very low for most use cases.

Option 2 is by far my favorite. It should be high up in the readme, especially if ES 6 will disable _all by default.

BTW, it seems that the ignore_above max for UTF-8 text should be 32766 / 3, so 10,922 chars (source: the last note on this page: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html).

@ankane
Owner

ankane commented Mar 9, 2017

Hey @feliperaul, going with option 1. 10,922 is suggested in case all of your characters are 3 bytes. In most circumstances, this won't be the case, but if it is, users will see an error, as they do with earlier versions of Searchkick. We can address the ES 6 issue when it gets closer (my current thoughts are to require either a default_fields option on the model or a fields option for the search method).
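The numbers in this exchange can be checked with a small Ruby sketch. The constant and helper names below are hypothetical; the only figures taken from the thread are the 32766-byte term limit, the proposed ignore_above of 30000, and the 3-bytes-per-character case.

```ruby
LUCENE_MAX_TERM_BYTES = 32_766 # term byte-length limit quoted in the thread

# Largest character count that always fits if every character takes
# bytes_per_char bytes in UTF-8.
def safe_ignore_above(bytes_per_char)
  LUCENE_MAX_TERM_BYTES / bytes_per_char
end

# True when the UTF-8 encoding of value fits in a single term.
def fits_in_term?(value)
  value.bytesize <= LUCENE_MAX_TERM_BYTES
end

safe_ignore_above(3) # => 10922
safe_ignore_above(1) # => 32766

ascii = "a" * 30_000      # 1 byte per character
cjk   = "\u65e5" * 30_000 # 3 bytes per character

fits_in_term?(ascii) # => true  (30,000 bytes)
fits_in_term?(cjk)   # => false (90,000 bytes)
```

With ignore_above at 30000, a value of up to 30,000 characters is still indexed as a single term, so it only stays under the byte limit when most of its characters are single-byte. That is the error case mentioned above.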

@ankane ankane closed this as completed Mar 9, 2017
@ankane ankane mentioned this issue Apr 24, 2017
@Xosmond

Xosmond commented Jan 9, 2018

@ankane Elasticsearch 6 disables the _all field, and its docs suggest using fields: "*" when somebody wants to search on all fields, but Searchkick expects fields to be an array, not a string.

The docs also say that we can define default_field on the index, but Searchkick doesn't accept it.

https://github.com/elastic/elasticsearch/blob/master/docs/reference/query-dsl/simple-query-string-query.asciidoc

@Svashta

Svashta commented Feb 21, 2018

Yes, I have the same "problem": I can't search on _all fields anymore.
I have to specifically define them in the search. Not a major drawback, but when only one or two fields are used anyway, it can be annoying to always set them explicitly.

@lock lock bot locked as resolved and limited conversation to collaborators Dec 29, 2018