Fix some query extraction bugs. #29283
While playing with the percolator I found two bugs:
- Sometimes we set a min_should_match that is greater than the number of extractions. While this doesn't cause direct trouble, it does when the query is nested into a boolean query and the boolean query tries to compute the min_should_match for the entire query based on its own min_should_match and those of the sub queries. So I changed the code to throw an exception when min_should_match is greater than the number of extractions.
- Boolean queries claim matches are verified when in fact they shouldn't. This is because boolean queries assume that they are verified if all sub clauses are verified, but things are more complex than that: e.g. conjunctions that are nested in a disjunction, or disjunctions that are nested in a conjunction, can generally not be verified without running the query.
Pinging @elastic/es-search-aggs
LGTM - Thanks for catching and fixing these bugs!
@@ -334,6 +334,9 @@ static Result analyze(Query query, Version indexVersion) {
numOptionalClauses++;
}
}
if (minimumShouldMatch > numOptionalClauses) {
👍 because we know query will never match
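The reasoning behind this guard can be sketched in isolation. The following is a minimal, self-contained illustration, not the actual `QueryAnalyzer` code: the `Occur` enum and the `countOptional` and `canNeverMatch` helpers are hypothetical names introduced here, and only the comparison `minimumShouldMatch > numOptionalClauses` is taken from the diff.

```java
import java.util.List;

public class MinShouldMatchSketch {
    /** Hypothetical stand-in for a clause's occurrence type; not Lucene's own enum. */
    enum Occur { MUST, SHOULD, FILTER, MUST_NOT }

    /** Counts the optional (SHOULD) clauses, like the numOptionalClauses counter in the diff. */
    static int countOptional(List<Occur> clauses) {
        int numOptionalClauses = 0;
        for (Occur occur : clauses) {
            if (occur == Occur.SHOULD) {
                numOptionalClauses++;
            }
        }
        return numOptionalClauses;
    }

    /** A boolean query that requires more optional clauses than it contains can never match. */
    static boolean canNeverMatch(List<Occur> clauses, int minimumShouldMatch) {
        return minimumShouldMatch > countOptional(clauses);
    }

    public static void main(String[] args) {
        List<Occur> clauses = List.of(Occur.MUST, Occur.SHOULD, Occur.SHOULD);
        // Two SHOULD clauses are enough to satisfy a minimum of 2.
        System.out.println(canNeverMatch(clauses, 2));
        // A minimum of 3 cannot be satisfied by two SHOULD clauses.
        System.out.println(canNeverMatch(clauses, 3));
    }
}
```

Detecting this case early means an enclosing boolean query never has to combine its own min_should_match with an unsatisfiable value from a sub query.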
Result subResult = analyze(clause.getQuery(), version);
if (subResult.matchAllDocs == false && subResult.extractions.isEmpty()) {
// doesn't match anything
return subResult;
👍
@@ -313,7 +314,7 @@ private BooleanQuery createRandomBooleanQuery(int depth, List<String> fields, Ma
numShouldClauses++;
}
}
- builder.setMinimumNumberShouldMatch(numShouldClauses);
+ builder.setMinimumNumberShouldMatch(randomIntBetween(0, numShouldClauses));
Did these changes to the randomized test catch the two bugs?
I tried to make them more likely to catch bugs but unfortunately they did not.
* master: Reindex: Fix error in delete-by-query rest spec (elastic#29318) Improve similarity integration. (elastic#29187) Fix some query extraction bugs. (elastic#29283) [Docs] Correct experimental note formatting Move Nullable into core (elastic#29341) [Docs] Update getting-started.asciidoc (elastic#29294) Elasticsearch 6.3.0 is now on Lucene 7.3. [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677) Fix missing comma in ingest-node.asciidoc (elastic#29343) Improve exception handling on TransportMasterNodeAction (elastic#29314) Don't break allocation if resize source index is missing (elastic#29311) Use fixture to test repository-s3 plugin (elastic#29296) Fix NDCG for empty search results (elastic#29267) Pass through script params in scripted metric agg (elastic#29154) Fix Eclipse build. Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298) Painless: Remove extraneous INLINE constant. (elastic#29340)
* master: Build: Fix Java9 MR build (elastic#29312) Reindex: Fix error in delete-by-query rest spec (elastic#29318) Improve similarity integration. (elastic#29187) Fix some query extraction bugs. (elastic#29283) [Docs] Correct experimental note formatting Move Nullable into core (elastic#29341) [Docs] Update getting-started.asciidoc (elastic#29294) Elasticsearch 6.3.0 is now on Lucene 7.3. [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677) Fix missing comma in ingest-node.asciidoc (elastic#29343) Improve exception handling on TransportMasterNodeAction (elastic#29314) Don't break allocation if resize source index is missing (elastic#29311) Use fixture to test repository-s3 plugin (elastic#29296) Fix NDCG for empty search results (elastic#29267) Pass through script params in scripted metric agg (elastic#29154) Fix Eclipse build. Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298) Painless: Remove extraneous INLINE constant. (elastic#29340) Remove HTTP max content length leniency (elastic#29337) Begin moving XContent to a separate lib/artifact (elastic#29300)
* 6.x: Improve similarity integration. (elastic#29187) Fix some query extraction bugs. (elastic#29283) Fixed quote_field_suffix in query_string (elastic#29332) TEST: Update negative byte size setting error msg Fix bwc in GeoDistanceQuery serialization (elastic#29325)
* es/master: (68 commits) Allow using distance measure in the geo context precision (#29273) Disable failing query in QueryBuilderBWCIT. Fixed quote_field_suffix in query_string (#29332) Use fixture to test repository-url module (#29355) Remove undocumented action.master.force_local setting (#29351) Enhance error for out of bounds byte size settings (#29338) Fix QueryAnalyzerTests. Fix HasChildQueryBuilderTests to not use the `classic` similarity. [Docs] Correct javadoc of GetIndexRequest (#29364) Make TransportRankEvalAction members final Add awaits fix for a query analyzer test Check presence of multi-types before validating new mapping (#29316) Add awaits fix for HasChildQueryBuilderTests Remove silent batch mode from install plugin (#29359) Align cat thread pool info to thread pool config (#29195) Track Lucene operations in engine explicitly (#29357) Build: Fix Java9 MR build (#29312) Reindex: Fix error in delete-by-query rest spec (#29318) Improve similarity integration. (#29187) Fix some query extraction bugs. (#29283) ...
* es/6.x: (68 commits) Add note to migration docs on silent batch mode (#29365) Allow using distance measure in the geo context precision (#29273) Disable failing query in QueryBuilderBWCIT. Improve similarity integration. (#29187) Fix some query extraction bugs. (#29283) Fixed quote_field_suffix in query_string (#29332) TEST: Update negative byte size setting error msg Fix bwc in GeoDistanceQuery serialization (#29325) Move testMappingConflictRootCause to different class Enhance error for out of bounds byte size settings (#29338) [Docs] Correct javadoc of GetIndexRequest (#29364) Check presence of multi-types before validating new mapping (#29316) Make TransportRankEvalAction members final Pass through script params in scripted metric agg (#29154) Remove silent batch mode from install plugin (#29359) Track Lucene operations in engine explicitly (#29357) Build: Fix Java9 MR build (#29312) Reindex: Fix error in delete-by-query rest spec (#29318) Move Nullable into core (#29341) [Docs] Correct experimental note formatting ...
* master: (110 commits) Remove undocumented action.master.force_local setting (elastic#29351) Enhance error for out of bounds byte size settings (elastic#29338) Fix QueryAnalyzerTests. Fix HasChildQueryBuilderTests to not use the `classic` similarity. [Docs] Correct javadoc of GetIndexRequest (elastic#29364) Make TransportRankEvalAction members final Add awaits fix for a query analyzer test Check presence of multi-types before validating new mapping (elastic#29316) Add awaits fix for HasChildQueryBuilderTests Remove silent batch mode from install plugin (elastic#29359) Align cat thread pool info to thread pool config (elastic#29195) Track Lucene operations in engine explicitly (elastic#29357) Build: Fix Java9 MR build (elastic#29312) Reindex: Fix error in delete-by-query rest spec (elastic#29318) Improve similarity integration. (elastic#29187) Fix some query extraction bugs. (elastic#29283) [Docs] Correct experimental note formatting Move Nullable into core (elastic#29341) [Docs] Update getting-started.asciidoc (elastic#29294) Elasticsearch 6.3.0 is now on Lucene 7.3. ...
While playing with the percolator I found two bugs:
- Sometimes we set a min_should_match that is greater than the number of
extractions. While this doesn't cause direct trouble, it does when the query
is nested into a boolean query and the boolean query tries to compute the
min_should_match for the entire query based on its own min_should_match and
those of the sub queries. So I changed the code to throw an exception when
min_should_match is greater than the number of extractions.
- Boolean queries claim matches are verified when in fact they shouldn't. This
is due to the fact that boolean queries assume that they are verified if all
sub clauses are verified but things are more complex than that, e.g.
conjunctions that are nested in a disjunction or disjunctions that are nested
in a conjunction can generally not be verified without running the query.
For instance a query like
((a AND b) OR c OR d)
extracts all terms and claims it is verified, which is not true.
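To make the example concrete, here is a small, self-contained sketch of why term extraction alone cannot verify this query. It is an illustration under assumptions, not the percolator's implementation: `queryMatches` and `candidateMatches` are hypothetical helpers, and the candidate check assumes the four extracted terms are combined with a min_should_match of 1.

```java
import java.util.Set;

public class VerifiedSketch {
    /** Evaluates the real query from the example: ((a AND b) OR c OR d). */
    static boolean queryMatches(Set<String> docTerms) {
        return (docTerms.contains("a") && docTerms.contains("b"))
            || docTerms.contains("c")
            || docTerms.contains("d");
    }

    /**
     * Hypothetical candidate check based only on the extracted terms a, b, c, d:
     * the document is a candidate if at least minShouldMatch of them are present.
     */
    static boolean candidateMatches(Set<String> docTerms, int minShouldMatch) {
        int hits = 0;
        for (String term : Set.of("a", "b", "c", "d")) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits >= minShouldMatch;
    }

    public static void main(String[] args) {
        // A document that contains only the term "a".
        Set<String> doc = Set.of("a");
        System.out.println("candidate: " + candidateMatches(doc, 1));
        System.out.println("query:     " + queryMatches(doc));
    }
}
```

A document containing only `a` passes the term-based candidate check but does not satisfy the query, so a candidate hit has to be confirmed by actually running the query rather than being reported as verified.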