Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh #6464

Merged
merged 25 commits into from
Mar 10, 2023

Conversation

Rishikesh1159
Copy link
Member

@Rishikesh1159 Rishikesh1159 commented Feb 23, 2023

Description

This PR updates the wait_until refresh policy on replica shards with segment replication enabled. Here we are updating wait_until refresh listeners to wait until replica shards reaches a specific seqNo instead of translog location.

The wait_until refresh policy for replica and primary shards with document replication remains same as before which is based on translog location. Even the primary shards with segment replication enabled uses translog location for wait_until requests. Only change made is on replica shards with segment replication enabled.

Issues Resolved

#6045

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@@ -244,6 +244,7 @@ public void onReplicationDone(SegmentReplicationState state) {
runnable.run();
}
}
replicaShard.refresh("replication complete refresh");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rishikesh1159 Why is this extra refresh required? The reader should refresh before replication is marked completed.

@@ -822,8 +823,9 @@ public static Translog.Location performOnReplica(BulkShardRequest request, Index
}
assert operationResult != null : "operation result must never be null when primary response has no failure";
location = syncOperationResultOrThrow(operationResult, location);
maxSeqNo = response.getResponse().getSeqNo();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to keep track of ongoing max here as we iterate all the items, this will overwrite the value with each iteration. The last item in the list is not guaranteed to have the highest seqNo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. I will keep track of maxSeqNo here

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@Rishikesh1159 Rishikesh1159 changed the title Initial draft PR for wait_until with segrep [Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh Feb 28, 2023
@Rishikesh1159 Rishikesh1159 marked this pull request as ready for review February 28, 2023 17:25
@mch2
Copy link
Member

mch2 commented Mar 7, 2023

@Rishikesh1159 This is looking almost ready to me, some minor nits & comment.

Two things I think we need to follow up with here before this is safe to use with SR:

  1. I don't think we want to force refreshes on all replicas during async block, only the newly elected primary should force refresh. We need to understand the impact this would have with delaying primary term bumps in particular. This is not an issue when there is only a single replica, but with multiple replicas we will incorrectly release ongoing wait_until reqs on replicas during a failover event.
  2. We should still implement a cap on the amount of outstanding wait_until requests to avoid resource issues, but we won't be able to force refresh because that would break the read/write guarantee. I think we will need to set a hard limit on the amount of accepted wait_until requests with SR, rather than forcing a refresh when the limit hits. With [DISCUSS] Add back preference for searching _primaries or _replicas #6046 users would still have a path toward read/write (though I believe this is best effort) by prioritizing primary shards. To have full guarantee with Sr enabled without a cap on requests, we will need a streaming Indexing API.. cc / @nknize

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2023

Gradle Check (Jenkins) Run Completed with:

…orce refresh.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@Rishikesh1159
Copy link
Member Author

Rishikesh1159 commented Mar 9, 2023

  1. I don't think we want to force refreshes on all replicas during async block, only the newly elected primary should force refresh. We need to understand the impact this would have with delaying primary term bumps in particular. This is not an issue when there is only a single replica, but with multiple replicas we will incorrectly release ongoing wait_until reqs on replicas during a failover event.

@mch2 I checked the behaviour. Looks like in case of failover scenario every replica in the replication group gets a primary term bump. Inorder for primary term bump to complete on replica shards, the replica shard must release/fire all it's listeners, if listeners are not fired the replica will be stuck in blocking state and fail after 30min. Even if newly promoted primary shard sends new checkpoints to replicas, as the replicas are in blocking state they will not be able to receive a checkpoint. So, the only way out for us here is to release/fire all listeners during a primary term bump on replica.

We are doing our best effort here to give read after write consistency with wait_until requests. But this is not always guaranteed with wait_until requests, docs are not searchable immediately in case of segment replication. This is the comprimise we have to make with our current architecture. With segment replication for wait_until we can only give read(only on primary shard) after write guarantee.

  1. We should still implement a cap on the amount of outstanding wait_until requests to avoid resource issues, but we won't be able to force refresh because that would break the read/write guarantee. I think we will need to set a hard limit on the amount of accepted wait_until requests with SR, rather than forcing a refresh when the limit hits. With [DISCUSS] Add back preference for searching _primaries or _replicas #6046 users would still have a path toward read/write (though I believe this is best effort) by prioritizing primary shards. To have full guarantee with Sr enabled without a cap on requests, we will need a [streaming Indexing API.]

From the behaviour mentioned above, we limit the listeners that a replica shard can hold. Limit will be same as primary shards limit which is set in index settings. We are putting this limit to avoid a situation where shard can go down with more and more listeners piling on replicas with no limit/cap. With current implementation of segment replication we will break the wait_unti's read after write guarantee. We will be able to provide only read (only on primary) after write guarantee. As you said to give guarantee we might need Streaming Indexing API.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2023

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2023

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2023

Gradle Check (Jenkins) Run Completed with:

@@ -118,7 +118,8 @@ public void setupListeners() throws Exception {
() -> engine.refresh("too-many-listeners"),
logger,
threadPool.getThreadContext(),
refreshMetric
refreshMetric,
this::returnSeqNo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - () -> 10L instead of creating a function here.

Copy link
Member

@mch2 mch2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rishikesh1159 this LGTM. Lets please open an issue to document the changes to wait_until with SR enabled.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@Rishikesh1159 Rishikesh1159 added the backport 2.x Backport to 2.x branch label Mar 10, 2023
@Rishikesh1159 Rishikesh1159 merged commit e8a4210 into opensearch-project:main Mar 10, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 10, 2023
…rds with segment replication enabled to wait for replica refresh (#6464)

* Initial draft PR for wait_until with segrep

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Refactor code and fix test failures.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* add comments and fix tests.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Refactor code, address comments and fix test failures.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Aplly spotless check

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Adress comments and add integ test.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Address comments and fix failing tests.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Fixing failing test.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Remove unused code.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Addressing comments and refactoring

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Adding max refreshlisteners limit that a replica shard can hold and force refresh.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Changing assert message

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Fix call to release refresh listeners on replica shards.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Fix call to release refresh listeners on replica shards.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Address comments.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Fixing compile errors.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Spoltss Apply

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

---------

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
(cherry picked from commit e8a4210)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dreamer-89 added a commit to dreamer-89/OpenSearch that referenced this pull request Mar 11, 2023
…lica shards with segment replication enabled to wait for replica refresh (opensearch-project#6464)"

This reverts commit e8a4210.
dreamer-89 added a commit to dreamer-89/OpenSearch that referenced this pull request Mar 11, 2023
…lica shards with segment replication enabled to wait for replica refresh (opensearch-project#6464)"

This reverts commit e8a4210.

Signed-off-by: Suraj Singh <surajrider@gmail.com>
andrross pushed a commit that referenced this pull request Mar 11, 2023
#6622)

* Revert "[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh (#6464)"

This reverts commit e8a4210.

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add missing import statement

Signed-off-by: Suraj Singh <surajrider@gmail.com>

---------

Signed-off-by: Suraj Singh <surajrider@gmail.com>
mingshl pushed a commit to mingshl/OpenSearch-Mingshl that referenced this pull request Mar 24, 2023
…rds with segment replication enabled to wait for replica refresh (opensearch-project#6464)

* Initial draft PR for wait_until with segrep

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Refactor code and fix test failures.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* add comments and fix tests.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Refactor code, address comments and fix test failures.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Aplly spotless check

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Adress comments and add integ test.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Address comments and fix failing tests.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Fixing failing test.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Remove unused code.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Addressing comments and refactoring

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Adding max refreshlisteners limit that a replica shard can hold and force refresh.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Changing assert message

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Fix call to release refresh listeners on replica shards.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Fix call to release refresh listeners on replica shards.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Address comments.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Fixing compile errors.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Spoltss Apply

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

---------

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Mingshi Liu <mingshl@amazon.com>
mingshl pushed a commit to mingshl/OpenSearch-Mingshl that referenced this pull request Mar 24, 2023
opensearch-project#6622)

* Revert "[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh (opensearch-project#6464)"

This reverts commit e8a4210.

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add missing import statement

Signed-off-by: Suraj Singh <surajrider@gmail.com>

---------

Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: Mingshi Liu <mingshl@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Segment Replication] Update RefreshPolicy.WAIT_UNTIL to wait for replica refresh.
4 participants