-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose splitOnWhitespace in Query String Query
#20965
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I left two comments about changes that will be needed for the backport
@@ -200,6 +204,7 @@ public QueryStringQueryBuilder(StreamInput in) throws IOException { | |||
timeZone = in.readOptionalTimeZone(); | |||
escape = in.readBoolean(); | |||
maxDeterminizedStates = in.readVInt(); | |||
splitOnWhitespace = in.readBoolean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need serialization protection when backported to the 5.x branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually there should protection on 6.x too since we have not given up the idea about multi-version clusters yet
@@ -234,6 +239,7 @@ protected void doWriteTo(StreamOutput out) throws IOException { | |||
out.writeOptionalTimeZone(timeZone); | |||
out.writeBoolean(this.escape); | |||
out.writeVInt(this.maxDeterminizedStates); | |||
out.writeBoolean(this.splitOnWhitespace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here about the serialization protection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. @dakrone might want another look just to be super sure though.
@@ -200,6 +207,9 @@ public QueryStringQueryBuilder(StreamInput in) throws IOException { | |||
timeZone = in.readOptionalTimeZone(); | |||
escape = in.readBoolean(); | |||
maxDeterminizedStates = in.readVInt(); | |||
if (in.getVersion().onOrAfter(V_5_1_0_UNRELEASED)) { | |||
splitOnWhitespace = in.readBoolean(); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably add an else
clause that sets splitOnWhitespace
to the appropriate value just to be super clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This change adds an option called `split_on_whitespace` which prevents the query parser to split free text part on whitespace prior to analysis. Instead the queryparser would parse around only real 'operators'. Default to true. For instance the query `"foo bar"` would let the analyzer of the targeted field decide how the tokens should be splitted. Some options are missing in this change but I'd like to add them in a follow up PR in order to be able to simplify the backport in 5.x. The missing options (changes) are: * A `type` option which similarly to the `multi_match` query defines how the free text should be parsed when multi fields are defined. * Simple range query with additional tokens like ">100 50" are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of this special syntax even when `split_on_whitespace` is set to false. * Since all this options would make the `query_string_query` very similar to a match (multi_match) query we should be able to share the code that produce the final Lucene query.
This change adds an option called `split_on_whitespace` which prevents the query parser to split free text part on whitespace prior to analysis. Instead the queryparser would parse around only real 'operators'. Default to true. For instance the query `"foo bar"` would let the analyzer of the targeted field decide how the tokens should be splitted. Some options are missing in this change but I'd like to add them in a follow up PR in order to be able to simplify the backport in 5.x. The missing options (changes) are: * A `type` option which similarly to the `multi_match` query defines how the free text should be parsed when multi fields are defined. * Simple range query with additional tokens like ">100 50" are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of this special syntax even when `split_on_whitespace` is set to false. * Since all this options would make the `query_string_query` very similar to a match (multi_match) query we should be able to share the code that produce the final Lucene query.
This change adds an option called
split_on_whitespace
which prevents the query parser to split free text part on whitespace prior to analysis. Instead the queryparser would parse around only real 'operators'. Default to true.For instance the query
"foo bar"
would let the analyzer of the targeted field decide how the tokens should be splitted.Some options are missing in this change but I'd like to add them in a follow up PR in order to be able to simplify the backport in 5.x. The missing options (changes) are:
type
option which similarly to themulti_match
query defines how the free text should be parsed when multi fields are defined.split_on_whitespace
is set to false. It should be possible to preserve this syntax and make the parser aware of this special syntax even whensplit_on_whitespace
is set to false.query_string_query
very similar to a match (multi_match) query we should be able to share the code that produce the final Lucene query.Fixes #20841