Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some query extraction bugs. #29283

Merged
merged 1 commit into from
Apr 3, 2018
Merged

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Mar 28, 2018

While playing with the percolator I found two bugs:

  • Sometimes we set a min_should_match that is greater than the number of
    extractions. While this doesn't cause direct trouble, it does when the query
    is nested into a boolean query and the boolean query tries to compute the
    min_should_match for the entire query based on its own min_should_match and
    those of the sub queries. So I changed the code to throw an exception when
    min_should_match is greater than the number of extractions.
  • Boolean queries claim matches are verified when in fact they shouldn't. This
    is due to the fact that boolean queries assume that they are verified if all
    sub clauses are verified but things are more complex than that, eg.
    conjunctions that are nested in a disjunction or disjunctions that are nested
    in a conjunction can generally not be verified without running the query.
    For instance a query like ((a AND b) OR c OR d) extracts all terms and claims
    it is verified, which is not true.

While playing with the percolator I found two bugs:
 - Sometimes we set a min_should_match that is greater than the number of
   extractions. While this doesn't cause direct trouble, it does when the query
   is nested into a boolean query and the boolean query tries to compute the
   min_should_match for the entire query based on its own min_should_match and
   those of the sub queries. So I changed the code to throw an exception when
   min_should_match is greater than the number of extractions.
 - Boolean queries claim matches are verified when in fact they shouldn't. This
   is due to the fact that boolean queries assume that they are verified if all
   sub clauses are verified but things are more complex than that, eg.
   conjunctions that are nested in a disjunction or disjunctions that are nested
   in a conjunction can generally not be verified without running the query.
@jpountz jpountz added >bug :Search Relevance/Percolator Reverse search: find queries that match a document v7.0.0 v6.3.0 v6.2.4 labels Mar 28, 2018
@jpountz jpountz requested a review from martijnvg March 28, 2018 14:35
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Thanks for catching and fixing these bugs!

@@ -334,6 +334,9 @@ static Result analyze(Query query, Version indexVersion) {
numOptionalClauses++;
}
}
if (minimumShouldMatch > numOptionalClauses) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 because we know query will never match

Result subResult = analyze(clause.getQuery(), version);
if (subResult.matchAllDocs == false && subResult.extractions.isEmpty()) {
// doesn't match anything
return subResult;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -313,7 +314,7 @@ private BooleanQuery createRandomBooleanQuery(int depth, List<String> fields, Ma
numShouldClauses++;
}
}
builder.setMinimumNumberShouldMatch(numShouldClauses);
builder.setMinimumNumberShouldMatch(randomIntBetween(0, numShouldClauses));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did these changes to the randomized test catch the two bugs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make them more likely to catch bugs but unfortunately they did not.

@jpountz jpountz merged commit 8cdd950 into elastic:master Apr 3, 2018
@jpountz jpountz deleted the fix/percolator_bugs branch April 3, 2018 14:44
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 3, 2018
* master:
  Reindex: Fix error in delete-by-query rest spec (elastic#29318)
  Improve similarity integration. (elastic#29187)
  Fix some query extraction bugs. (elastic#29283)
  [Docs] Correct experimental note formatting
  Move Nullable into core (elastic#29341)
  [Docs] Update getting-started.asciidoc (elastic#29294)
  Elasticsearch 6.3.0 is now on Lucene 7.3.
  [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677)
  Fix missing comma in ingest-node.asciidoc (elastic#29343)
  Improve exception handling on TransportMasterNodeAction (elastic#29314)
  Don't break allocation if resize source index is missing (elastic#29311)
  Use fixture to test repository-s3 plugin (elastic#29296)
  Fix NDCG for empty search results (elastic#29267)
  Pass through script params in scripted metric agg (elastic#29154)
  Fix Eclipse build.
  Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298)
  Painless: Remove extraneous INLINE constant. (elastic#29340)
jpountz added a commit to jpountz/elasticsearch that referenced this pull request Apr 3, 2018
While playing with the percolator I found two bugs:
 - Sometimes we set a min_should_match that is greater than the number of
   extractions. While this doesn't cause direct trouble, it does when the query
   is nested into a boolean query and the boolean query tries to compute the
   min_should_match for the entire query based on its own min_should_match and
   those of the sub queries. So I changed the code to throw an exception when
   min_should_match is greater than the number of extractions.
 - Boolean queries claim matches are verified when in fact they shouldn't. This
   is due to the fact that boolean queries assume that they are verified if all
   sub clauses are verified but things are more complex than that, eg.
   conjunctions that are nested in a disjunction or disjunctions that are nested
   in a conjunction can generally not be verified without running the query.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 3, 2018
* master:
  Build: Fix Java9 MR build (elastic#29312)
  Reindex: Fix error in delete-by-query rest spec (elastic#29318)
  Improve similarity integration. (elastic#29187)
  Fix some query extraction bugs. (elastic#29283)
  [Docs] Correct experimental note formatting
  Move Nullable into core (elastic#29341)
  [Docs] Update getting-started.asciidoc (elastic#29294)
  Elasticsearch 6.3.0 is now on Lucene 7.3.
  [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677)
  Fix missing comma in ingest-node.asciidoc (elastic#29343)
  Improve exception handling on TransportMasterNodeAction (elastic#29314)
  Don't break allocation if resize source index is missing (elastic#29311)
  Use fixture to test repository-s3 plugin (elastic#29296)
  Fix NDCG for empty search results (elastic#29267)
  Pass through script params in scripted metric agg (elastic#29154)
  Fix Eclipse build.
  Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298)
  Painless: Remove extraneous INLINE constant. (elastic#29340)
  Remove HTTP max content length leniency (elastic#29337)
  Begin moving XContent to a separate lib/artifact (elastic#29300)
jpountz added a commit that referenced this pull request Apr 4, 2018
While playing with the percolator I found two bugs:
 - Sometimes we set a min_should_match that is greater than the number of
   extractions. While this doesn't cause direct trouble, it does when the query
   is nested into a boolean query and the boolean query tries to compute the
   min_should_match for the entire query based on its own min_should_match and
   those of the sub queries. So I changed the code to throw an exception when
   min_should_match is greater than the number of extractions.
 - Boolean queries claim matches are verified when in fact they shouldn't. This
   is due to the fact that boolean queries assume that they are verified if all
   sub clauses are verified but things are more complex than that, eg.
   conjunctions that are nested in a disjunction or disjunctions that are nested
   in a conjunction can generally not be verified without running the query.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 4, 2018
* 6.x:
  Improve similarity integration. (elastic#29187)
  Fix some query extraction bugs. (elastic#29283)
  Fixed quote_field_suffix in query_string (elastic#29332)
  TEST: Update negative byte size setting error msg
  Fix bwc in GeoDistanceQuery serialization (elastic#29325)
martijnvg added a commit that referenced this pull request Apr 5, 2018
* es/master: (68 commits)
  Allow using distance measure in the geo context precision (#29273)
  Disable failing query in QueryBuilderBWCIT.
  Fixed quote_field_suffix in query_string (#29332)
  Use fixture to test repository-url module (#29355)
  Remove undocumented action.master.force_local setting (#29351)
  Enhance error for out of bounds byte size settings (#29338)
  Fix QueryAnalyzerTests.
  Fix HasChildQueryBuilderTests to not use the `classic` similarity.
  [Docs] Correct javadoc of GetIndexRequest (#29364)
  Make TransportRankEvalAction members final
  Add awaits fix for a query analyzer test
  Check presence of multi-types before validating new mapping (#29316)
  Add awaits fix for HasChildQueryBuilderTests
  Remove silent batch mode from install plugin (#29359)
  Align cat thread pool info to thread pool config (#29195)
  Track Lucene operations in engine explicitly (#29357)
  Build: Fix Java9 MR build (#29312)
  Reindex: Fix error in delete-by-query rest spec (#29318)
  Improve similarity integration. (#29187)
  Fix some query extraction bugs. (#29283)
  ...
martijnvg added a commit that referenced this pull request Apr 5, 2018
* es/6.x: (68 commits)
  Add note to migration docs on silent batch mode (#29365)
  Allow using distance measure in the geo context precision (#29273)
  Disable failing query in QueryBuilderBWCIT.
  Improve similarity integration. (#29187)
  Fix some query extraction bugs. (#29283)
  Fixed quote_field_suffix in query_string (#29332)
  TEST: Update negative byte size setting error msg
  Fix bwc in GeoDistanceQuery serialization (#29325)
  Move testMappingConflictRootCause to different class
  Enhance error for out of bounds byte size settings (#29338)
  [Docs] Correct javadoc of GetIndexRequest (#29364)
  Check presence of multi-types before validating new mapping (#29316)
  Make TransportRankEvalAction members final
  Pass through script params in scripted metric agg (#29154)
  Remove silent batch mode from install plugin (#29359)
  Track Lucene operations in engine explicitly (#29357)
  Build: Fix Java9 MR build (#29312)
  Reindex: Fix error in delete-by-query rest spec (#29318)
  Move Nullable into core (#29341)
  [Docs] Correct experimental note formatting
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 5, 2018
* master: (110 commits)
  Remove undocumented action.master.force_local setting (elastic#29351)
  Enhance error for out of bounds byte size settings (elastic#29338)
  Fix QueryAnalyzerTests.
  Fix HasChildQueryBuilderTests to not use the `classic` similarity.
  [Docs] Correct javadoc of GetIndexRequest (elastic#29364)
  Make TransportRankEvalAction members final
  Add awaits fix for a query analyzer test
  Check presence of multi-types before validating new mapping (elastic#29316)
  Add awaits fix for HasChildQueryBuilderTests
  Remove silent batch mode from install plugin (elastic#29359)
  Align cat thread pool info to thread pool config (elastic#29195)
  Track Lucene operations in engine explicitly (elastic#29357)
  Build: Fix Java9 MR build (elastic#29312)
  Reindex: Fix error in delete-by-query rest spec (elastic#29318)
  Improve similarity integration. (elastic#29187)
  Fix some query extraction bugs. (elastic#29283)
  [Docs] Correct experimental note formatting
  Move Nullable into core (elastic#29341)
  [Docs] Update getting-started.asciidoc (elastic#29294)
  Elasticsearch 6.3.0 is now on Lucene 7.3.
  ...
jpountz added a commit that referenced this pull request Apr 6, 2018
While playing with the percolator I found two bugs:
 - Sometimes we set a min_should_match that is greater than the number of
   extractions. While this doesn't cause direct trouble, it does when the query
   is nested into a boolean query and the boolean query tries to compute the
   min_should_match for the entire query based on its own min_should_match and
   those of the sub queries. So I changed the code to throw an exception when
   min_should_match is greater than the number of extractions.
 - Boolean queries claim matches are verified when in fact they shouldn't. This
   is due to the fact that boolean queries assume that they are verified if all
   sub clauses are verified but things are more complex than that, eg.
   conjunctions that are nested in a disjunction or disjunctions that are nested
   in a conjunction can generally not be verified without running the query.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search Relevance/Percolator Reverse search: find queries that match a document v6.2.4 v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants