[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh #6464

Rishikesh1159 · 2023-02-23T18:29:25Z

Description

This PR updates the wait_until refresh policy on replica shards with segment replication enabled. The wait_until refresh listeners on these replicas now wait until the replica shard reaches a specific seqNo instead of a translog location.

The wait_until refresh policy for replica and primary shards with document replication remains the same as before and is based on translog location. Primary shards with segment replication enabled also use translog location for wait_until requests. The only change is on replica shards with segment replication enabled.
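The seqNo-based wait described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the class `SeqNoRefreshListeners` and its methods `addOrNotify`, `afterRefresh`, and `pendingCount` are hypothetical names, and the sketch assumes the replica learns the highest searchable seqNo after each refresh.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a refresh-listener registry keyed by seqNo; not the OpenSearch class. */
final class SeqNoRefreshListeners {
    /** A waiting wait_until request: fire the listener once seqNo is searchable. */
    private static final class Pending {
        final long seqNo;
        final Runnable listener;

        Pending(long seqNo, Runnable listener) {
            this.seqNo = seqNo;
            this.listener = listener;
        }
    }

    private final List<Pending> pending = new ArrayList<>();
    // Assumption: -1 means no operation has been refreshed yet.
    private long lastRefreshedSeqNo = -1L;

    /** Fires immediately and returns true if seqNo is already visible; otherwise queues the listener. */
    synchronized boolean addOrNotify(long seqNo, Runnable listener) {
        if (seqNo <= lastRefreshedSeqNo) {
            listener.run();
            return true;
        }
        pending.add(new Pending(seqNo, listener));
        return false;
    }

    /** Called after a refresh; fires every listener whose seqNo is now covered. */
    synchronized void afterRefresh(long refreshedSeqNo) {
        lastRefreshedSeqNo = Math.max(lastRefreshedSeqNo, refreshedSeqNo);
        Iterator<Pending> it = pending.iterator();
        while (it.hasNext()) {
            Pending p = it.next();
            if (p.seqNo <= lastRefreshedSeqNo) {
                it.remove();
                p.listener.run();
            }
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

In this sketch a listener registered for seqNo 5 stays pending after a refresh that only covers seqNo 3, and fires once a later refresh covers seqNo 5 or higher.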

Issues Resolved

#6045

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff
Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

mch2 · 2023-02-23T18:44:37Z

server/src/main/java/org/opensearch/indices/replication/SegmentReplicationTargetService.java

@@ -244,6 +244,7 @@ public void onReplicationDone(SegmentReplicationState state) {
                                runnable.run();
                            }
                        }
+                        replicaShard.refresh("replication complete refresh");


@Rishikesh1159 Why is this extra refresh required? The reader should refresh before replication is marked completed.

mch2 · 2023-02-23T18:51:23Z

server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java

@@ -822,8 +823,9 @@ public static Translog.Location performOnReplica(BulkShardRequest request, Index
            }
            assert operationResult != null : "operation result must never be null when primary response has no failure";
            location = syncOperationResultOrThrow(operationResult, location);
+            maxSeqNo = response.getResponse().getSeqNo();


We need to keep track of the ongoing max here as we iterate over all the items; as written, this overwrites the value on each iteration. The last item in the list is not guaranteed to have the highest seqNo.

Makes sense. I will keep track of maxSeqNo here.
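The running-max fix discussed in this thread can be illustrated with a small standalone sketch. It is not the actual `performOnReplica` code: `MaxSeqNoExample`, `maxSeqNo`, and the `NO_OPS_PERFORMED` sentinel value are hypothetical names chosen for this example.

```java
import java.util.List;

final class MaxSeqNoExample {
    /** Assumed sentinel for "no operation seen yet"; hypothetical for this sketch. */
    static final long NO_OPS_PERFORMED = -1L;

    /** Returns the highest seqNo across all items instead of the seqNo of the last item. */
    static long maxSeqNo(List<Long> itemSeqNos) {
        long maxSeqNo = NO_OPS_PERFORMED;
        for (long seqNo : itemSeqNos) {
            // Keep the running max rather than overwriting with each item.
            maxSeqNo = Math.max(maxSeqNo, seqNo);
        }
        return maxSeqNo;
    }

    public static void main(String[] args) {
        // The last item (8) is not the highest; the running max is 9.
        System.out.println(maxSeqNo(List.of(7L, 9L, 8L)));
    }
}
```

With plain assignment inside the loop, the example input would yield 8, the seqNo of the last item, which is the bug pointed out in the review comment.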

github-actions · 2023-02-23T18:53:13Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/11646/
CommitID: 055f225
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-02-24T01:26:05Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/11686/
CommitID: 055f225
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-02-24T06:57:12Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/11691/
CommitID: 36f3851
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>


github-actions · 2023-02-28T16:46:13Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/11804/
CommitID: c03fa8f
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-02-28T16:52:27Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/11805/
CommitID: 666dd0b
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

mch2 · 2023-03-07T20:04:49Z

@Rishikesh1159 This is looking almost ready to me; some minor nits and a comment.

Two things I think we need to follow up on here before this is safe to use with SR:

- I don't think we want to force refreshes on all replicas during the async block; only the newly elected primary should force refresh. We need to understand the impact this would have on delaying primary term bumps in particular. This is not an issue when there is only a single replica, but with multiple replicas we will incorrectly release ongoing wait_until requests on replicas during a failover event.
- We should still implement a cap on the number of outstanding wait_until requests to avoid resource issues, but we won't be able to force refresh because that would break the read/write guarantee. I think we will need to set a hard limit on the number of accepted wait_until requests with SR, rather than forcing a refresh when the limit is hit. With "[DISCUSS] Add back preference for searching _primaries or _replicas" (#6046), users would still have a path toward read/write (though I believe this is best effort) by prioritizing primary shards. To have a full guarantee with SR enabled without a cap on requests, we will need a streaming indexing API. cc @nknize

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

github-actions · 2023-03-07T21:25:18Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/12094/
CommitID: b33aca2


Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Rishikesh1159 · 2023-03-09T01:13:00Z

I don't think we want to force refreshes on all replicas during async block, only the newly elected primary should force refresh. We need to understand the impact this would have with delaying primary term bumps in particular. This is not an issue when there is only a single replica, but with multiple replicas we will incorrectly release ongoing wait_until reqs on replicas during a failover event.

@mch2 I checked the behaviour. In a failover scenario, every replica in the replication group gets a primary term bump. For the primary term bump to complete on a replica shard, the replica must release/fire all its listeners; if the listeners are not fired, the replica will be stuck in a blocking state and fail after 30 min. Even if the newly promoted primary shard sends new checkpoints to the replicas, they cannot receive a checkpoint while they are in the blocking state. So the only way out for us here is to release/fire all listeners during a primary term bump on the replica.

We are making a best effort here to give read-after-write consistency with wait_until requests. But this is not always guaranteed: with segment replication, docs are not searchable immediately. This is the compromise we have to make with our current architecture. With segment replication, wait_until can only give a read-after-write guarantee for reads on the primary shard.

We should still implement a cap on the amount of outstanding wait_until requests to avoid resource issues, but we won't be able to force refresh because that would break the read/write guarantee. I think we will need to set a hard limit on the amount of accepted wait_until requests with SR, rather than forcing a refresh when the limit hits. With [DISCUSS] Add back preference for searching _primaries or _replicas #6046 users would still have a path toward read/write (though I believe this is best effort) by prioritizing primary shards. To have full guarantee with SR enabled without a cap on requests, we will need a streaming Indexing API.

Given the behaviour described above, we limit the number of listeners that a replica shard can hold. The limit is the same as the primary shard's limit, which is set in the index settings. We add this limit to avoid a situation where a shard can go down as more and more listeners pile up on replicas with no cap. With the current implementation of segment replication, we break wait_until's read-after-write guarantee: we can only provide that guarantee for reads on the primary. As you said, to give the full guarantee we might need a streaming indexing API.
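The two behaviours discussed in this comment, a cap on the listeners a replica holds and releasing all listeners on a primary term bump, can be sketched as below. This is a rough illustration under stated assumptions, not the PR's implementation: `BoundedReplicaListeners`, `tryAdd`, and `releaseAll` are hypothetical names, and rejecting a listener at the cap is an assumption of the sketch, since the thread does not spell out what happens once the limit is reached.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for pending wait_until listeners on a replica shard. */
final class BoundedReplicaListeners {
    private final int maxListeners;
    private final List<Runnable> pending = new ArrayList<>();

    BoundedReplicaListeners(int maxListeners) {
        this.maxListeners = maxListeners;
    }

    /** Queues the listener, or returns false once the cap is reached (assumed behaviour). */
    synchronized boolean tryAdd(Runnable listener) {
        if (pending.size() >= maxListeners) {
            return false;
        }
        pending.add(listener);
        return true;
    }

    /**
     * Fires and clears every pending listener, as would be needed so that a
     * primary term bump is not blocked. Returns the number of listeners released.
     */
    int releaseAll() {
        List<Runnable> toFire;
        synchronized (this) {
            toFire = new ArrayList<>(pending);
            pending.clear();
        }
        // Run the listeners outside the lock so a slow listener does not block new registrations.
        toFire.forEach(Runnable::run);
        return toFire.size();
    }
}
```

Listeners released this way fire before the replica has necessarily refreshed to their seqNo, which is why the comment describes read-after-write on replicas as best effort.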

github-actions · 2023-03-09T01:19:29Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/12146/
CommitID: 4b47039

github-actions · 2023-03-09T01:32:43Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/12147/
CommitID: 6bd0b3d

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

github-actions · 2023-03-09T02:17:00Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/12150/
CommitID: d62aaca
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

github-actions · 2023-03-09T04:07:03Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/12154/
CommitID: 0a79735

mch2 · 2023-03-09T23:35:30Z

server/src/test/java/org/opensearch/index/shard/RefreshListenersTests.java

@@ -118,7 +118,8 @@ public void setupListeners() throws Exception {
            () -> engine.refresh("too-many-listeners"),
            logger,
            threadPool.getThreadContext(),
-            refreshMetric
+            refreshMetric,
+            this::returnSeqNo


nit - () -> 10L instead of creating a function here.
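A small sketch of the difference this nit refers to, assuming the new constructor argument is a supplier of a `long`. The use of `LongSupplier` and the names `SeqNoSupplierExample` and `currentSeqNo` are assumptions for illustration; the actual parameter type is not shown in this hunk.

```java
import java.util.function.LongSupplier;

final class SeqNoSupplierExample {
    /** Hypothetical consumer of the supplier, standing in for the class under test. */
    static long currentSeqNo(LongSupplier seqNoSupplier) {
        return seqNoSupplier.getAsLong();
    }

    // Helper-method style: a dedicated method that exists only to return a constant.
    static long returnSeqNo() {
        return 10L;
    }

    public static void main(String[] args) {
        // Both calls supply the same value; the inline lambda avoids the extra method.
        System.out.println(currentSeqNo(SeqNoSupplierExample::returnSeqNo));
        System.out.println(currentSeqNo(() -> 10L));
    }
}
```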

mch2

@Rishikesh1159 this LGTM. Let's please open an issue to document the changes to wait_until with SR enabled.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

github-actions · 2023-03-10T00:09:17Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/12206/
CommitID: f285ba3
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

github-actions · 2023-03-10T00:27:56Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/12207/
CommitID: 83a1cac
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-03-10T00:58:22Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/12208/
CommitID: c0721b0

…rds with segment replication enabled to wait for replica refresh (#6464)

* Initial draft PR for wait_until with segrep
* Refactor code and fix test failures.
* add comments and fix tests.
* Refactor code, address comments and fix test failures.
* Aplly spotless check
* Adress comments and add integ test.
* Address comments and fix failing tests.
* Fixing failing test.
* Remove unused code.
* Addressing comments and refactoring
* Adding max refreshlisteners limit that a replica shard can hold and force refresh.
* Changing assert message
* Fix call to release refresh listeners on replica shards.
* Fix call to release refresh listeners on replica shards.
* Address comments.
* Fixing compile errors.
* Spoltss Apply

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
(cherry picked from commit e8a4210)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>


…lica shards with segment replication enabled to wait for replica refresh (opensearch-project#6464)" This reverts commit e8a4210. Signed-off-by: Suraj Singh <surajrider@gmail.com>

#6622)

* Revert "[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh (#6464)" This reverts commit e8a4210.
* Add missing import statement

Signed-off-by: Suraj Singh <surajrider@gmail.com>


Initial draft PR for wait_until with segrep

055f225

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Rishikesh1159 added the skip-changelog label Feb 23, 2023

mch2 reviewed Feb 23, 2023


Merge branch 'opensearch-project:main' into wait_until

36f3851

Rishikesh1159 and others added 3 commits February 28, 2023 16:23

Refactor code and fix test failures.

c1a4c87

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Merge branch 'opensearch-project:main' into wait_until

c03fa8f

Merge branch 'wait_until' of https://github.com/Rishikesh1159/OpenSearch into wait_until

666dd0b

add comments and fix tests.

3798512

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Rishikesh1159 changed the title ~~Initial draft PR for wait_until with segrep~~ [Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh Feb 28, 2023

Rishikesh1159 marked this pull request as ready for review February 28, 2023 17:25

Rishikesh1159 requested review from reta, anasalkouz, andrross, Bukhtawar, CEHENKLE, dblock, gbbafna, setiah, kartg, kotwanikunal, nknize, owaiskazi19, adnapibar and ryanbogan as code owners February 28, 2023 17:25

Addressing comments and refactoring

b33aca2

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Rishikesh1159 added 2 commits March 9, 2023 00:49

Adding max refreshlisteners limit that a replica shard can hold and force refresh.

4b47039

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Changing assert message

6bd0b3d

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Fix call to release refresh listeners on replica shards.

d62aaca

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Fix call to release refresh listeners on replica shards.

0a79735

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

mch2 reviewed Mar 9, 2023


mch2 approved these changes Mar 9, 2023


Rishikesh1159 added 2 commits March 9, 2023 23:40

Address comments.

f285ba3

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Fixing compile errors.

83a1cac

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Spoltss Apply

c0721b0

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Rishikesh1159 added the backport 2.x Backport to 2.x branch label Mar 10, 2023

Rishikesh1159 merged commit e8a4210 into opensearch-project:main Mar 10, 2023

opensearch-trigger-bot bot mentioned this pull request Mar 10, 2023

[Backport 2.x] [Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh #6612

Closed

dreamer-89 mentioned this pull request Mar 11, 2023

Revert "[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for rep… #6622

Merged
