Light weight Transport action to verify local term before fetching cluster-state from remote #12252

Merged
merged 8 commits into from
Mar 20, 2024

Conversation

rajiv-kv
Contributor

@rajiv-kv rajiv-kv commented Feb 8, 2024

Cluster State TermVersion Fetch

Description

If the local node's cluster-state is up to date, the node can serve read requests itself. This PR introduces a transport action that fetches the term and version of the cluster-state from the cluster-manager. If the retrieved term and version match the local ones, the subsequent call to retrieve the full cluster-state is skipped and the local cluster-state is used. This shifts the serving of cluster-state reads to the local node, thereby reducing the load on the cluster-manager.
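The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the class, interface, and method names (`TermVersionCheckSketch`, `ClusterManagerClient`, `resolveState`, and so on) are hypothetical stand-ins, and the real change is a transport action that works asynchronously over the transport layer.

```java
import java.util.Objects;

public class TermVersionCheckSketch {

    /** Term and version that together identify one published cluster-state. */
    public static final class TermVersion {
        final long term;
        final long version;

        public TermVersion(long term, long version) {
            this.term = term;
            this.version = version;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TermVersion)) {
                return false;
            }
            TermVersion other = (TermVersion) o;
            return term == other.term && version == other.version;
        }

        @Override
        public int hashCode() {
            return Objects.hash(term, version);
        }
    }

    /** Hypothetical stand-in for a cluster-state; the real object is far larger. */
    public static final class State {
        public final TermVersion termVersion;
        public final String payload;

        public State(TermVersion termVersion, String payload) {
            this.termVersion = termVersion;
            this.payload = payload;
        }
    }

    /** Hypothetical view of the two calls a node can make to the cluster-manager. */
    public interface ClusterManagerClient {
        TermVersion fetchTermVersion(); // lightweight: two longs on the wire
        State fetchState();             // expensive: the full cluster-state
    }

    /**
     * Returns the local state when its term and version match the
     * cluster-manager's; otherwise falls back to fetching the full state.
     */
    public static State resolveState(State local, ClusterManagerClient client) {
        TermVersion remote = client.fetchTermVersion();
        if (remote.equals(local.termVersion)) {
            return local; // local copy is current, skip the expensive call
        }
        return client.fetchState();
    }

    public static void main(String[] args) {
        State managerState = new State(new TermVersion(3, 42), "manager copy");
        ClusterManagerClient client = new ClusterManagerClient() {
            public TermVersion fetchTermVersion() { return managerState.termVersion; }
            public State fetchState() { return managerState; }
        };

        State upToDate = new State(new TermVersion(3, 42), "local copy");
        State stale = new State(new TermVersion(3, 41), "stale local copy");

        System.out.println(resolveState(upToDate, client).payload); // prints "local copy"
        System.out.println(resolveState(stale, client).payload);    // prints "manager copy"
    }
}
```

The saving comes from the size difference between the two calls: when the local node is current, only the term and version cross the wire, and the cluster-manager never has to serialize the full state.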

Related Issues

Resolves #12272

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented Feb 8, 2024

Gradle Check (Jenkins) Run Completed with:

Contributor

github-actions bot commented Feb 8, 2024

Compatibility status:

Checks if related components are compatible with change f54184f

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer.git]

Contributor

@amkhar amkhar left a comment

Thanks @rajivkv-amz for adding this lightweight transport action!

I'd like to understand your view on the name of the transport action. It actually carries the state's term and version ("cluster version" could be understood differently by some readers).
Would StateTermVersionAction make sense?

@amkhar
Contributor

amkhar commented Feb 8, 2024

@rajivkv-amz
Assuming you'll add tests later (as this is the first draft), could you please check why the Gradle check is failing?

@peternied peternied marked this pull request as draft February 8, 2024 18:27
@peternied peternied changed the title [Draft] Light weight Transport action to verify local term before fetching cluster-state from remote Light weight Transport action to verify local term before fetching cluster-state from remote Feb 8, 2024
Contributor

❕ Gradle check result for cc737fe: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.classMethod
      1 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueDown

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

…fetch of cluster-state

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
…on details

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
… case where the local node has the latest cluster-state

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
…check

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
@rajiv-kv rajiv-kv force-pushed the 2.11.0_term_check branch from cc737fe to f54184f Compare March 20, 2024 16:21
Contributor

❌ Gradle check result for f54184f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@peternied
Member

New test failure

org.opensearch.snapshots.DeleteSnapshotIT.testRemoteStoreCleanupForDeletedIndex

@rajiv-kv please follow the reporting workflow for flaky tests on that failure and the ones below

Flaky Tests

OpenSearch has a very large test suite with long running, often failing (flaky), integration tests. Such individual tests are labelled as Flaky Random Test Failure. Your help is wanted fixing these!

If you encounter a build/test failure in CI that is unrelated to the change in your pull request, it may be a known flaky test, or a new test failure.

  • Follow failed CI links, and locate the failing test(s).
  • Copy-paste the failure into a comment of your PR.
  • Search through issues using the name of the failed test for whether this is a known flaky test.
  • If an existing issue is found, paste a link to the known issue in a comment to your PR.
  • If no existing issue is found, open one.
  • Retry CI via the GitHub UX or by pushing an update to your PR.

@rajiv-kv
Contributor Author


Ack !
org.opensearch.snapshots.DeleteSnapshotIT.testRemoteStoreCleanupForDeletedIndex issue

Contributor

✅ Gradle check result for f54184f: SUCCESS

@peternied peternied added the backport 2.x Backport to 2.x branch label Mar 20, 2024
@peternied peternied merged commit 7ad3017 into opensearch-project:main Mar 20, 2024
33 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 20, 2024
…uster-state from remote (#12252)

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
(cherry picked from commit 7ad3017)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
rajiv-kv added a commit to rajiv-kv/OpenSearch that referenced this pull request Mar 21, 2024
…uster-state from remote (opensearch-project#12252)

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
rajiv-kv added a commit to rajiv-kv/OpenSearch that referenced this pull request Mar 21, 2024
…uster-state from remote (opensearch-project#12252)

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
shwetathareja pushed a commit that referenced this pull request Mar 21, 2024
…uster-state from remote (#12252) (#12824)

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
shwetathareja pushed a commit that referenced this pull request Mar 21, 2024
…uster-state from remote (#12252) (#12825)

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…uster-state from remote (opensearch-project#12252)

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
  • backport 2.x: Backport to 2.x branch
  • Cluster Manager
  • enhancement: Enhancement or improvement to existing feature or request
  • v2.13.0: Issues and PRs related to version 2.13.0
Projects
Status: ✅ Done
6 participants