Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only highlight ALL matching fields if query doesn't target specific fields #9091

Closed
wants to merge 1 commit into from

Conversation

lukasolson
Copy link
Member

Fixes #2358.

From the Elasticsearch highlighting docs:

require_field_match can be set to false which will cause any field to be highlighted regardless of whether the query matched specifically on them. The default behaviour is true, meaning that only fields that hold a query match will be highlighted.

Prior to this PR, we were setting require_field_match to false on all requests from discover. As a result, even if you queried specific fields, or added filters that matched specific fields, you would see matches highlighted from fields not specifically queried.

This PR changes the behavior such that the require_field_match parameter is only set to false if the search query is a simple query_string query that doesn't target any fields. As a result, if you query specific fields, then the highlighted matches should only apply to the queried fields.

This is, in my opinion, a step in the right direction. A couple of notes about this PR:

The code that detects when the query string applies to a specific field is quite primitive. It could definitely be improved, as it currently supposes that a query such as ":foo" targets a specific field, even though it doesn't. We could be smarter about this but I think it would require some sort of query parser that could be more complicated than it's worth at this point.

There is still a limitation of this implementation: If you have a filter that targets a specific field, then do a search query that doesn't target a specific field, then it's possible that over-highlighting could still occur.

For example, if I have a geo.dest: "CN" filter, then search for something like jpg in the query bar, you could still get highlighting of CN under other matching fields, such as geo.src. However, matches of jpg would still be highlighted, regardless of field.

The alternative is to have additional logic that checks if there are any filters that target specific fields, and if there are, then to not set require_field_match to false. However, this could have the side effect of under-highlighting; in the same scenario above, this would mean that no matches of jpg would be highlighted, since no field was specified. I am personally of the opinion that the former is a better trade-off, but I'm open to discussing this.

@jimmyjones2
Copy link
Contributor

@lukasolson LGTM, seems like a sensible tradeoff. Thanks for picking this up!

@Bargs
Copy link
Contributor

Bargs commented Nov 16, 2016

Thanks for the detailed description! It really helped me understand your thought process.

Seems like ES barfs on : in the search value unless it's double quoted or escaped so that might be a way to fix the :foo problem, only count unquoted colons in the condition. But perhaps that's a slippery slope towards more query parsing, as you said.

The changes to all query have an interesting impact on this issue. If I understand correctly, it searches all of the fields (as opposed to the _all field) if no field is specified, so in that case I would expect it to highlight all occurrences of the value in the doc even without require_field_match set to false. If that's the case, we're going to end up with inconsistent highlighting if the user is searching across indices where some have an _all field and some don't. I tested this, and the results seem agree with that conclusion:

note on index names logstashall-0 contains an _all field, logstashnoall-0 does not. So in other words, logstashnoall is the one using the brand new all query.

screen shot 2016-11-16 at 3 01 17 pm

IMO the highlighting with the new all query is exactly what we want. Terms with an explicit field will only highlight that field, while terms without a field specified will highlight any occurrence in any field. That made me re-consider whether we should leverage the flag for forcing the new all query functionality, but that would override any user provided default field in the index settings which I don't think we want to do :/

I tried the same query in master, and the results look more consistent in this one example at least:

screen shot 2016-11-16 at 3 20 19 pm

@Bargs
Copy link
Contributor

Bargs commented Nov 16, 2016

I wonder if we could make the case that this is leaking implementation details of highlighting with the query string query. Ideally highlighting should work the same regardless of whether you're using the new all query or querying _all under the hood. It would be awesome if implicit queries against _all could disable require_field_match just for that term, instead of the entire query.

What do you think @dakrone?

@tbragin
Copy link
Contributor

tbragin commented Nov 20, 2016

@lukasolson I agree this is a step in the right direction.

Quick question... how come when I combine predicates, when one of them does not have a field associated with it, it seems to get ignored completely? See last two screenshots, everything else works as I'd expect.

screen shot 2016-11-20 at 2 33 44 pm

screen shot 2016-11-20 at 2 33 59 pm

screen shot 2016-11-20 at 2 34 22 pm

screen shot 2016-11-20 at 2 34 38 pm

screen shot 2016-11-20 at 2 35 05 pm

screen shot 2016-11-20 at 2 35 16 pm

screen shot 2016-11-20 at 2 36 27 pm

screen shot 2016-11-20 at 2 36 56 pm

screen shot 2016-11-20 at 2 41 11 pm

@Bargs
Copy link
Contributor

Bargs commented Nov 21, 2016

@tbragin because a : exists in the query text, require_field_match is true. http is querying the _all field which we don't show, so there's no highlighting. This is what Lukas meant when he said the code detecting whether a query is targeting a specific field is a bit primitive, it simply looks for a : anywhere in the query string.

If you try indexing your data without the _all field in order to get the new "all query" mode, I think you'll get the highlighting you'd expect since the query is actually being made against all fields, instead of the _all field. That's why I was hoping to get some thoughts from the ES team on this difference.

@lukasolson
Copy link
Member Author

Perhaps related to elastic/elasticsearch#21621.

@lukasolson
Copy link
Member Author

@Bargs I've tested with _all disabled and removing require_field_match and you're right, it seems to be exactly what we want. The only problem is that for now, _all is enabled by default. And if you remove require_field_match when _all is enabled, then search for a simple term without specifying a field to search against, you'll get no highlighting in your results.

From what I can tell, we don't actually keep track of whether or not _all is enabled in the indexPattern object (in Kibana). What if we did keep track of it, and keep the behavior as implemented in this PR if _all is enabled, and if it's not, then don't add the require_field_match parameter at all? (This would possibly be a temporary workaround until we hear back something from the ES team about if this functions as it should.) Thoughts, @Bargs?

@dakrone Have you had a chance to look at @Bargs' question above?

@epixa epixa added v5.2.0 and removed v5.1.0 labels Nov 30, 2016
@Bargs
Copy link
Contributor

Bargs commented Dec 5, 2016

@lukasolson what if we forced the new all query behavior by default by setting the all_fields flag to true? I realized recently that users who want to opt out could override this setting (and optionally set their own default field) using the query:queryString:options advanced param we already expose.

I think that's the only way we can ensure users will have a consistent experience even if they have some indices with _all enabled and some with _all disabled.

The only downside is that this would be a breaking change because users who currently set the default field in their index settings would need to update a kibana advanced option to maintain that behavior.

@dakrone
Copy link
Member

dakrone commented Dec 5, 2016

Hi Everyone, apologies for the delayed response, I've been on vacation.

It would be awesome if implicit queries against _all could disable require_field_match just for that term, instead of the entire query.
What do you think @dakrone?

Unfortunately the detection of the auto-all mode is pretty distant from anything highlighter related, so I'm not sure this would be easy to do. That said, let me ask around with someone more familiar with highlighting and see if it could be possible.

@tbragin
Copy link
Contributor

tbragin commented Dec 7, 2016

Thanks @dakrone! I noticed @jimczi is working on something called "unified highlighter", is that work at all relevant here? elastic/elasticsearch#21621

@jimczi
Copy link

jimczi commented Dec 7, 2016

It would be awesome if implicit queries against _all could disable require_field_match just for that term, instead of the entire query.

That's seems feasible and this would also fix the discrepancy between _all as a field and the all_fields mode so I opened elastic/elasticsearch#22017 to continue the discussion.

@lukasolson
Copy link
Member Author

@Bargs @tbragin Seems like this is something that is now on the radar of the Elasticsearch team, and it will certainly be fixed by the issue linked above. In the meantime, is this PR a good short-term solution while the other is being worked on? In my opinion, it improves the current behavior, though it's not the exact solution we want, and we can't be sure how long this new issue will take to fix.

@clintongormley
Copy link
Contributor

@Bargs Kibana can use a highlight_query that uses all_fields mode and require_field_match: true (default). That way we get consistent and correct highlighting.

@Bargs
Copy link
Contributor

Bargs commented Dec 14, 2016

@clintongormley this will lead to inconsistent results if the user has defined something other than _all as their default_field in any of the index settings. Forcing all_fields in the highlight_query will overrule that default field setting.

I can also imagine scenarios where queries against _all and all_fields queries might be inconsistent. For instance, if a field is mapped to be excluded from _all, will it also be excluded from all_fields query? What if the analysis settings of a specific field are different from _all? I have to imagine there are some scenarios where _all and all_fields will differ.

I think it's really important that the highlighting be rock solid. If it's not, users will get confused and lose faith that the system is working correctly.

Also, what kind of impact does highlight_query have on performance?

@Bargs
Copy link
Contributor

Bargs commented Dec 14, 2016

@lukasolson I've been trying to think through the pros vs cons of the current solution in this PR.

Pro: Avoids over-highlighting (highlighting any field with a particular value even if the query specified a field)
Con: Can cause under-highlighting (nothing is highlighted for predicates without a term if a predicate with a term exists in the query)
Con: Can cause inconsistent results if user has a mix of indices with _all enabled and disabled.

Am I missing any other pros/cons? Personally, I feel like I'd prefer over-highlighting to under-highlighting. With over-highlighting my eyes can somewhat quickly filter out irrelevant matches, but with under-highlighting I might miss matches completely.

@clintongormley
Copy link
Contributor

@Bargs

this will lead to inconsistent results if the user has defined something other than _all as their default_field in any of the index settings. Forcing all_fields in the highlight_query will overrule that default field setting.

Agreed - I'd solve this by having an option in Kibana to disable all_fields highlighting. If the user has their own fields setup for default, then they should just work.

I can also imagine scenarios where queries against _all and all_fields queries might be inconsistent. For instance, if a field is mapped to be excluded from _all, will it also be excluded from all_fields query?

No, it'll be included in the all_fields query

What if the analysis settings of a specific field are different from _all?

This will work correctly

However, if a user has excluded certain fields from all with include_in_all: false, then that'll be ignored.

I think it's really important that the highlighting be rock solid. If it's not, users will get confused and lose faith that the system is working correctly.

I think we need to take into account the fact that we're on a journey here. all_fields has just been introduced, so there will be a transition period where we can't get everything right. It'll work well for almost all users. The minority will have the default_field set to another field or fields, and they can get correct behaviour with the option I mentioned in Kibana.

So it won't work perfectly for those who set include_in_all. This is a tiny percentage of people, and it hasn't been working properly for them up until now anyway. The _all field is going away, so let's not agonise too much on minor flaws while we move towards the destination.

@lukasolson
Copy link
Member Author

Replaced by #9671.

@lukasolson lukasolson closed this Dec 29, 2016
@lukasolson lukasolson deleted the fix/over-highlighting branch March 27, 2018 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Over-highlighting in Discover
9 participants