Skip performOnPrimary step when executing PublishCheckpoint. #6366
Gradle Check (Jenkins) Run Completed with:
Signed-off-by: Marc Handalian <handalm@amazon.com> Fix spotless. Signed-off-by: Marc Handalian <handalm@amazon.com>
94377bf to 33edf00
Codecov Report
@@             Coverage Diff              @@
##               main    #6366      +/-   ##
============================================
- Coverage     70.76%   70.68%   -0.08%
+ Complexity    59051    58953      -98
============================================
  Files          4799     4799
  Lines        282432   282438       +6
  Branches      40716    40718       +2
============================================
- Hits         199856   199638     -218
- Misses        66147    66313     +166
- Partials      16429    16487      +58
.put("index.number_of_replicas", replicaCount)
.put("index.refresh_interval", -1)
).get();
prepareCreate(INDEX_NAME, Settings.builder().put(SETTING_NUMBER_OF_REPLICAS, replicaCount)).get();
👍
}

/**
 * This test verifies happy path when primary shard is relocated newly added node (target) in the cluster. Before
 * relocation and after relocation documents are indexed and documents are verified
 */
@AwaitsFix(bugUrl = "https://github.com/opensearch-project/OpenSearch/issues/5669")
nit: For completeness, do we need relocation tests for the IMMEDIATE and default refresh policies?
Yes, good idea. I've added randomness to select a refresh policy in all of the tests in this class other than the explicit wait_until test.
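For readers following the thread, the idea of picking a refresh policy at random per test run can be sketched roughly as below. This is a minimal illustration and not the code added in this PR: the `RefreshPolicy` enum here is a local stand-in (with `NONE` standing in for the default policy), `randomRefreshPolicy` is a hypothetical helper, and whether the real tests include `WAIT_UNTIL` in the random set is an assumption.

```java
import java.util.EnumSet;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-in for the refresh policies named in this thread;
// not the OpenSearch test code itself.
final class RandomRefreshPolicySketch {

    // Local enum for illustration only (assumption: NONE models the default policy).
    enum RefreshPolicy { NONE, IMMEDIATE, WAIT_UNTIL }

    /** Picks one policy uniformly at random; a seeded Random keeps a failing run reproducible. */
    static RefreshPolicy randomRefreshPolicy(Random random) {
        RefreshPolicy[] all = RefreshPolicy.values();
        return all[random.nextInt(all.length)];
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        Set<RefreshPolicy> seen = EnumSet.noneOf(RefreshPolicy.class);
        // Over many runs, every policy ends up being exercised by the same test body.
        for (int i = 0; i < 1000; i++) {
            seen.add(randomRefreshPolicy(random));
        }
        System.out.println("policies exercised: " + seen);
    }
}
```

The trade-off is that a single run covers only one policy, so a policy-specific failure may need the seed to reproduce.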
@@ -146,12 +146,11 @@ public void handleResponse(ReplicationResponse response) {
timer.time()
)
);
task.setPhase("finished");
Why is this setPhase call removed? If it is not useful, let's remove the same call in the failure path as well (on line 156).
Not intentional, thanks for catching this. It is useful for tracking state in the ReplicationTask.
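The point of the exchange above, that the task phase should be updated on both the success and the failure path, can be illustrated with a small sketch. The `Task` and `PublishListener` classes are hypothetical simplifications, the initial phase `"starting"` is an assumption, and the phase string used on failure is assumed to match the success path; only `setPhase("finished")` on success comes from the diff.

```java
// Hypothetical model of a listener that marks its task finished on either outcome;
// these are not the OpenSearch ReplicationTask or listener classes.
final class ReplicationTaskPhaseSketch {

    /** Minimal task that only tracks a phase string. */
    static final class Task {
        private volatile String phase = "starting"; // assumed initial phase

        void setPhase(String phase) { this.phase = phase; }

        String getPhase() { return phase; }
    }

    /** Listener with one callback per outcome, mirroring handleResponse / failure handling. */
    static final class PublishListener {
        private final Task task;

        PublishListener(Task task) { this.task = task; }

        void handleResponse() {
            task.setPhase("finished"); // success path, as in the diff
        }

        void handleException(Exception e) {
            task.setPhase("finished"); // failure path (phase string assumed)
        }
    }

    public static void main(String[] args) {
        Task task = new Task();
        new PublishListener(task).handleException(new RuntimeException("simulated failure"));
        System.out.println("phase after failure: " + task.getPhase());
    }
}
```

Dropping the call from only one of the two callbacks would leave a task stuck in its earlier phase on that outcome, which is what the review comment guards against.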
- Add random refresh policy to relocation ITs. - add back finishing ReplicationTask. Signed-off-by: Marc Handalian <handalm@amazon.com>
LGTM! Thanks @mch2 for fixing the bug and updating tests here.
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6366-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 72314f75807f5eb72d7b178180dda72d92997dfd
# Push it to GitHub
git push --set-upstream origin backport/backport-6366-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x
Then, create a pull request where the
…rch-project#6366) * Skip performOnPrimary step when executing PublishCheckpoint. Signed-off-by: Marc Handalian <handalm@amazon.com> Fix spotless. Signed-off-by: Marc Handalian <handalm@amazon.com> * PR Feedback: - Add random refresh policy to relocation ITs. - add back finishing ReplicationTask. Signed-off-by: Marc Handalian <handalm@amazon.com> --------- Signed-off-by: Marc Handalian <handalm@amazon.com> (cherry picked from commit 72314f7)
…6366 Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
…estRelocateWhileContinuouslyIndexingAndWaitingForRefresh (#6637) * Trigger Refresh on NRT Engine. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Revert changes made to PublishCheckpointAction in #6366 Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Fix failing unit test Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Force flush on new elected primary after relocation. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Fix failing unit test. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Remove unnecessary assertions Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Adding tests. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Address comments Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Fix indentation. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> --------- Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
…estRelocateWhileContinuouslyIndexingAndWaitingForRefresh (#6637) * Trigger Refresh on NRT Engine. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Revert changes made to PublishCheckpointAction in #6366 Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Fix failing unit test Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Force flush on new elected primary after relocation. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Fix failing unit test. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Remove unnecessary assertions Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Adding tests. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Address comments Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Fix indentation. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> --------- Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> (cherry picked from commit 1e5d913) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…estRelocateWhileContinuouslyIndexingAndWaitingForRefresh (#6637) (#6675) * Trigger Refresh on NRT Engine. * Revert changes made to PublishCheckpointAction in #6366 * Fix failing unit test * Force flush on new elected primary after relocation. * Fix failing unit test. * Remove unnecessary assertions * Adding tests. * Address comments * Fix indentation. --------- (cherry picked from commit 1e5d913) Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…estRelocateWhileContinuouslyIndexingAndWaitingForRefresh (opensearch-project#6637) * Trigger Refresh on NRT Engine. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Revert changes made to PublishCheckpointAction in opensearch-project#6366 Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Fix failing unit test Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Force flush on new elected primary after relocation. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Fix failing unit test. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Remove unnecessary assertions Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Adding tests. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Address comments Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Fix indentation. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> --------- Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> Signed-off-by: Mingshi Liu <mingshl@amazon.com>
Description
This change fixes the broken SegmentReplicationRelocationITs and unmutes them. The tests were flaky because WAIT_UNTIL requests blocked relocation of primary shards while replicas had not yet refreshed. Primary shards could not publish their checkpoints because PublishCheckpointAction is a ReplicationOperation, which first hits the primary and acquires a permit. Because we are blocking operations, the permit is delayed and the request never succeeds. This change fixes the issue by sending PublishCheckpoint requests directly to the replicas in the replication group and never to the primary.
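To make the failure mode concrete, here is a minimal, self-contained sketch of the two publish flows. It is a toy model under stated assumptions, not the OpenSearch implementation: `Primary`, `Replica`, `publishViaPrimary`, and `publishToReplicas` are hypothetical names, a single `Semaphore` permit stands in for the primary's operation permits, and the blocked request is modeled as an immediate `false` where the real request would wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical model of the permit problem; none of these are OpenSearch classes.
final class CheckpointPublishSketch {

    /** A replica that records the latest checkpoint it was told about. */
    static final class Replica {
        long latestCheckpoint = -1L;

        void onNewCheckpoint(long checkpoint) { latestCheckpoint = checkpoint; }
    }

    /** A primary whose single operation permit can be held while operations are blocked. */
    static final class Primary {
        private final Semaphore permits = new Semaphore(1);
        final List<Replica> replicas = new ArrayList<>();

        void blockOperations() { permits.acquireUninterruptibly(); }

        void unblockOperations() { permits.release(); }

        /** Old flow: acquire a primary permit first, then fan out to replicas. */
        boolean publishViaPrimary(long checkpoint) {
            if (!permits.tryAcquire()) {
                return false; // assumption: models a request that would otherwise wait
            }
            try {
                publishToReplicas(checkpoint);
                return true;
            } finally {
                permits.release();
            }
        }

        /** New flow: skip the primary step and notify replicas directly. */
        void publishToReplicas(long checkpoint) {
            for (Replica replica : replicas) {
                replica.onNewCheckpoint(checkpoint);
            }
        }
    }

    public static void main(String[] args) {
        Primary primary = new Primary();
        Replica replica = new Replica();
        primary.replicas.add(replica);

        primary.blockOperations();
        System.out.println("via primary succeeded: " + primary.publishViaPrimary(7L));
        primary.publishToReplicas(7L);
        System.out.println("replica checkpoint: " + replica.latestCheckpoint);
    }
}
```

In this model the replica only learns about the checkpoint through the direct path while operations are blocked, which is the behavior the description attributes to the fix.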
Issues Resolved
closes #6065
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.