Fix: reset the filter built at segment level for date histogram optimization #12267

Merged 6 commits on Feb 9, 2024

bowenlan-amzn (Member) commented Feb 9, 2024

Description

#12073 handles the segment-level match-all scenario and builds fast filters at the segment level. However, a filter built from one segment is saved in the FastFilterContext, which lives for the whole shard search, and this stops subsequent segments from computing their own filters. This change resets the filters built at the segment level and adds integration tests to cover that.

The reason the previously flaky test catches this is pretty interesting. It is flaky because the failure only shows up in this scenario:
the first search hits one shard (s1_1), and the second search hits another shard (s1_2).
The segments on these two shards are different, and the indexRandom method from the test framework introduces random dummy document deletions, which disable the fast filter optimization we added. So imagine s1_1 has multiple segments without deletions: the optimization kicks in, but the segment-level filters never change after the first segment builds them. The s1_2 segments have deletions, so the optimization never kicks in and the search falls back to the old code path. In the end, the results of these two searches differ.
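
The scenario can be reproduced in a toy model. The sketch below is not the OpenSearch code: `Segment`, `buildFilters`, and `search` are hypothetical stand-ins, a "filter" is reduced to a single bucket key, and deletions are modeled only as a flag that forces the per-document fallback path. It assumes filters are derived from the minimum and maximum key of the segment that builds them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentFilterLeak {
    // Hypothetical segment: bucket keys of its live docs, plus whether it has deletions.
    static final class Segment {
        final long[] keys;
        final boolean hasDeletions;

        Segment(boolean hasDeletions, long... keys) {
            this.hasDeletions = hasDeletions;
            this.keys = keys;
        }
    }

    // One "filter" per bucket key between the segment's min and max key.
    static long[] buildFilters(Segment s) {
        if (s.keys.length == 0) return new long[0];
        long min = s.keys[0], max = s.keys[0];
        for (long k : s.keys) { min = Math.min(min, k); max = Math.max(max, k); }
        long[] filters = new long[(int) (max - min + 1)];
        for (int i = 0; i < filters.length; i++) filters[i] = min + i;
        return filters;
    }

    // Count docs per bucket for one shard search.
    static Map<Long, Integer> search(List<Segment> shard, boolean resetPerSegment) {
        Map<Long, Integer> counts = new TreeMap<>();
        long[] filters = null; // lives for the whole shard search
        for (Segment s : shard) {
            if (s.hasDeletions) {
                // Fallback path: visit every doc, no filters involved.
                for (long k : s.keys) counts.merge(k, 1, Integer::sum);
                continue;
            }
            if (filters == null) filters = buildFilters(s);
            for (long f : filters) {
                int n = 0;
                for (long k : s.keys) if (k == f) n++;
                if (n > 0) counts.merge(f, n, Integer::sum);
            }
            if (resetPerSegment) filters = null; // the fix, in miniature
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Segment> s1_1 = Arrays.asList(new Segment(false, 1, 1, 2), new Segment(false, 3, 4));
        List<Segment> s1_2 = Arrays.asList(new Segment(true, 1, 1, 2), new Segment(true, 3, 4));
        System.out.println(search(s1_1, false)); // {1=2, 2=1}
        System.out.println(search(s1_2, false)); // {1=2, 2=1, 3=1, 4=1}
        System.out.println(search(s1_1, true));  // {1=2, 2=1, 3=1, 4=1}
    }
}
```

Without the reset, the second segment of s1_1 is counted with filters built for keys 1 and 2, so its documents with keys 3 and 4 are dropped, while s1_2 returns the full counts through the fallback path.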

Related Issues

Resolves #12268

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
github-actions bot (Contributor) commented Feb 9, 2024

Compatibility status:

Checks if related components are compatible with change 47f53be

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/performance-analyzer.git]

github-actions bot (Contributor) commented Feb 9, 2024

❕ Gradle check result for 096d7fa: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing
      1 org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented Feb 9, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (708d4df) 71.37% compared to head (47f53be) 71.47%.

Files | Patch % | Lines
...h/aggregations/bucket/FastFilterRewriteHelper.java | 83.33% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12267      +/-   ##
============================================
+ Coverage     71.37%   71.47%   +0.09%     
- Complexity    59781    59834      +53     
============================================
  Files          4959     4959              
  Lines        281116   281110       -6     
  Branches      40857    40855       -2     
============================================
+ Hits         200639   200911     +272     
+ Misses        63818    63498     -320     
- Partials      16659    16701      +42     


msfroh (Collaborator) commented Feb 9, 2024

Can you please clarify more why / how the optimization can "leak" from one copy of a shard to another?

The optimization should be computed local to the shard that it's executing on. If we have any state cached beyond the current shard-local IndexReader it's a problem.

In the example you cite above, once the deletions have refreshed on s1_1, the results should converge again. Again, is this related to the fast filter optimization or is it just the general eventual consistency that comes from having multiple copies of the data that refresh on their own schedule?

bowenlan-amzn (Member, Author) commented Feb 9, 2024

> Can you please clarify more why / how the optimization can "leak" from one copy of a shard to another?
>
> The optimization should be computed local to the shard that it's executing on. If we have any state cached beyond the current shard-local IndexReader it's a problem.
>
> In the example you cite above, once the deletions have refreshed on s1_1, the results should converge again. Again, is this related to the fast filter optimization or is it just the general eventual consistency that comes from having multiple copies of the data that refresh on their own schedule?

It seems this sentence caused the confusion:

> the built filter is cached at shard level FastFilterContext which stop the subsequent match all segments from computing their filters

I am not saying the optimization leaks between shards, but between segments.
I think "cached at shard level" is a phrase we normally use for the query cache, which is probably what caused the confusion. What I was trying to say is that, for a shard search, the FastFilterContext stays within the Aggregator. filters is a field in FastFilterContext and is meant to be reused when computing bucket counts for each segment. So once filters is built and not null, every subsequent segment uses it to compute its buckets.
The problem is that filters built from one segment should not be reused by other segments, so this PR resets filters if they were built from a segment.
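
A minimal sketch of that lifecycle rule follows, assuming a context object that outlives every segment of the shard search. The class `Context`, the flag `builtAtShardLevel`, and the methods `filtersFor` and `buildAtShardLevel` are hypothetical names for illustration and are not taken from FastFilterContext; filters are again reduced to plain bucket keys.

```java
import java.util.Arrays;

public class FilterLifecycleSketch {
    // Hypothetical stand-in for a context that lives for one whole shard search.
    static class Context {
        long[] filters;            // null until someone builds them
        boolean builtAtShardLevel; // hypothetical flag: true only for shard-wide filters

        // Build filters once from shard-wide bounds; these are safe to reuse everywhere.
        void buildAtShardLevel(long shardMin, long shardMax) {
            filters = build(shardMin, shardMax);
            builtAtShardLevel = true;
        }

        // Return the filters to use for one segment.
        long[] filtersFor(long segmentMin, long segmentMax) {
            if (filters == null) {
                // Built from this segment's bounds only.
                filters = build(segmentMin, segmentMax);
                builtAtShardLevel = false;
            }
            long[] result = filters;
            if (!builtAtShardLevel) {
                // The reset: segment-scoped filters must not reach the next segment.
                filters = null;
            }
            return result;
        }
    }

    // One bucket key per filter, covering min..max inclusive.
    static long[] build(long min, long max) {
        if (max < min) return new long[0];
        long[] out = new long[(int) (max - min + 1)];
        for (int i = 0; i < out.length; i++) out[i] = min + i;
        return out;
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        System.out.println(Arrays.toString(ctx.filtersFor(1, 2))); // [1, 2]
        System.out.println(Arrays.toString(ctx.filtersFor(5, 6))); // [5, 6]

        Context shardWide = new Context();
        shardWide.buildAtShardLevel(1, 6);
        System.out.println(shardWide.filtersFor(5, 6).length);     // 6
    }
}
```

Filters built from shard-wide bounds survive across segments, while filters built from a single segment are dropped as soon as that segment has used them.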

github-actions bot (Contributor) commented Feb 9, 2024

❌ Gradle check result for 7682fe7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot (Contributor) commented Feb 9, 2024

❌ Gradle check result for 47f53be: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot (Contributor) commented Feb 9, 2024

❕ Gradle check result for 47f53be: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

msfroh merged commit 197231f into opensearch-project:main on Feb 9, 2024 (30 checks passed).
msfroh added the labels backport 2.x (Backport to 2.x branch) and backport 2.12 (Backport to 2.12 branch) on Feb 9, 2024.
opensearch-trigger-bot (Contributor) commented

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-12267-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 197231fe06e35e0b0e9adeaa33b7f0dbde7c0031
# Push it to GitHub
git push --set-upstream origin backport/backport-12267-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-12267-to-2.x.

opensearch-trigger-bot (Contributor) commented

The backport to 2.12 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.12 2.12
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.12
# Create a new branch
git switch --create backport/backport-12267-to-2.12
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 197231fe06e35e0b0e9adeaa33b7f0dbde7c0031
# Push it to GitHub
git push --set-upstream origin backport/backport-12267-to-2.12
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.12

Then, create a pull request where the base branch is 2.12 and the compare/head branch is backport/backport-12267-to-2.12.

bowenlan-amzn added a commit to bowenlan-amzn/OpenSearch that referenced this pull request Feb 9, 2024
…ization (opensearch-project#12267)
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

msfroh pushed a commit that referenced this pull request Feb 9, 2024
…level (#12279)

* Apply fast date histogram optimization at the segment level (#12073)
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
(cherry picked from commit 9a0a69f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix: reset the filter built at segment level for date histogram optimization (#12267)
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 10, 2024
…level (#12279)

* Apply fast date histogram optimization at the segment level (#12073)
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
(cherry picked from commit 9a0a69f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix: reset the filter built at segment level for date histogram optimization (#12267)
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 25c2fde)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

msfroh pushed a commit that referenced this pull request Feb 10, 2024
…level (#12279) (#12284)

* Apply fast date histogram optimization at the segment level (#12073)
(cherry picked from commit 9a0a69f)

* Fix: reset the filter built at segment level for date histogram optimization (#12267)

(cherry picked from commit 25c2fde)

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Mar 1, 2024
…ization (opensearch-project#12267)
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…ization (opensearch-project#12267)
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…ization (opensearch-project#12267)
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>

bowenlan-amzn deleted the flaky branch on May 3, 2024 05:20
Labels: backport 2.x (Backport to 2.x branch), backport 2.12 (Backport to 2.12 branch), backport-failed, bug (Something isn't working), Other, skip-changelog
Successfully merging this pull request may close these issues.

[BUG] MinDocCountIT flaky test