-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] [Segment Replication] Fix isAheadOf logic for ReplicationCheckpoint comparison #4112
Merged
dreamer-89
merged 1 commit into
opensearch-project:main
from
dreamer-89:replication_checkpoint_ahead_of
Aug 3, 2022
Merged
[Bug] [Segment Replication] Fix isAheadOf logic for ReplicationCheckpoint comparison #4112
dreamer-89
merged 1 commit into
opensearch-project:main
from
dreamer-89:replication_checkpoint_ahead_of
Aug 3, 2022
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ons are equal Signed-off-by: Suraj Singh <surajrider@gmail.com>
kartg
approved these changes
Aug 3, 2022
Rishikesh1159
approved these changes
Aug 3, 2022
Gradle Check (Jenkins) Run Completed with:
|
|
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
@@ Coverage Diff @@
## main #4112 +/- ##
============================================
- Coverage 70.61% 70.53% -0.08%
- Complexity 56983 57004 +21
============================================
Files 4595 4595
Lines 274234 274236 +2
Branches 40181 40183 +2
============================================
- Hits 193638 193427 -211
- Misses 64342 64564 +222
+ Partials 16254 16245 -9
Help us with your feedback. Take ten seconds to tell us how you rate us. |
Rishikesh1159
pushed a commit
to Rishikesh1159/OpenSearch
that referenced
this pull request
Aug 17, 2022
…project#4112) Signed-off-by: Suraj Singh <surajrider@gmail.com>
Rishikesh1159
added a commit
that referenced
this pull request
Aug 17, 2022
…aining segment replication changes (#4243) * [Segment Replication] Checkpoint Replay on Replica Shard (#3658) * Adding Latest Recevied checkpoint, replay checkpoint logic along with tests Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * [Segment Replication] Wire up segment replication with peer recovery and add ITs. (#3743) * Add null check when computing max segment version. With segment replication enabled it is possible Lucene does not set the SegmentInfos min segment version, leaving the default value as null. Signed-off-by: Marc Handalian <handalm@amazon.com> * Update peer recovery to set the translogUUID of replicas to the UUID generated on the primary. This change updates the UUID when the translog is created to the value stored in the passed segment userdata. This is to ensure during failover scenarios that the replica can be promoted and not have a uuid mismatch with the value stored in user data. Signed-off-by: Marc Handalian <handalm@amazon.com> * Wire up Segment Replication under the feature flag. This PR wires up segment replication and adds some initial integration tests. Signed-off-by: Marc Handalian <handalm@amazon.com> * Add test to ensure replicas use primary translog uuid with segrep. Signed-off-by: Marc Handalian <handalm@amazon.com> * Update SegmentReplicationIT to assert previous commit points are valid and SegmentInfos can be built. Fix nitpicks in PR feedback. Signed-off-by: Marc Handalian <handalm@amazon.com> * Fix test with Assert.fail to include a message. Signed-off-by: Marc Handalian <handalm@amazon.com> * Disable shard idle with segment replication. (#4118) This change disables shard idle when segment replication is enabled. Primary shards will only push out new segments on refresh, we do not want to block this based on search behavior. Replicas will only refresh on an externally provided SegmentInfos, so we do not want search requests to hang waiting for a refresh. Signed-off-by: Marc Handalian <handalm@amazon.com> * Fix isAheadOf logic for ReplicationCheckpoint comparison (#4112) Signed-off-by: Suraj Singh <surajrider@gmail.com> * Fix waitUntil refresh policy for segrep enabled indices. (#4137) Signed-off-by: Marc Handalian <handalm@amazon.com> * Add IndexShard#getLatestReplicationCheckpoint behind segrep enable feature flag (#4163) * Add IndexShard#getLatestReplicationCheckpoint behind segrep enable feature flag Signed-off-by: Suraj Singh <surajrider@gmail.com> * Address review comment. Move tests to SegmentReplicationIndexShardTests Signed-off-by: Suraj Singh <surajrider@gmail.com> * Add segrep enbaled index settings in TargetServiceTests, SourceHandlerTests Signed-off-by: Suraj Singh <surajrider@gmail.com> * [Segment Replication] Fix OngoingSegmentReplications to key by allocation ID instead of DiscoveryNode. (#4182) * Fix OngoingSegmentReplications to key by allocation ID instead of DiscoveryNode. This change fixes segrep to work with multiple shards per node by keying ongoing replications on allocation ID. This also updates cancel methods to ensure state is properly cleared on shard cancel. Signed-off-by: Marc Handalian <handalm@amazon.com> * Clean up cancel methods. Signed-off-by: Marc Handalian <handalm@amazon.com> Signed-off-by: Marc Handalian <handalm@amazon.com> * [Bug] [Segment Replication] Update store metadata recovery diff logic to ignore missing files causing exception (#4185) * Update Store for segment replication dif Signed-off-by: Poojita Raj <poojiraj@amazon.com> Signed-off-by: Poojita Raj <poojiraj@amazon.com> * Update recoveryDiff logic to ingore missing files causing exception on replica during copy Signed-off-by: Suraj Singh <surajrider@gmail.com> * Address review comments Signed-off-by: Suraj Singh <surajrider@gmail.com> Signed-off-by: Poojita Raj <poojiraj@amazon.com> Signed-off-by: Suraj Singh <surajrider@gmail.com> Co-authored-by: Poojita Raj <poojiraj@amazon.com> * [Segment Replication] Adding PrimaryMode check before publishing checkpoint and processing a received checkpoint. (#4157) * Adding PrimaryMode check before publishing checkpoint. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Applying spotless check Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Moving segrep specific tests to SegmentReplicationIndexShardTests. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Adding logic and tests for rejecting checkpoints if shard is in PrimaryMode. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Applying ./gradlew :server:spotlessApply. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Applying ./gradlew :server:spotlessApply Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Changing log level to warn in shouldProcessCheckpoint() of IndexShard.java class. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * Removing unnecessary lazy logging in shouldProcessCheckpoint(). Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> * [Segment Replication] Wait for documents to replicate to replica shards (#4236) * [Segment Replication] Add thread sleep to account for replica lag in delete operations test Signed-off-by: Suraj Singh <surajrider@gmail.com> * Address review comments, assertBusy on doc count rather than sleep Signed-off-by: Suraj Singh <surajrider@gmail.com> Signed-off-by: Suraj Singh <surajrider@gmail.com> * Segment Replication - Remove unnecessary call to markAllocationIdAsInSync. (#4224) This PR Removes an unnecessary call to markAllocationIdAsInSync on the primary shard when replication events complete. Recovery will manage this initial call. Signed-off-by: Marc Handalian <handalm@amazon.com> Signed-off-by: Marc Handalian <handalm@amazon.com> * Segment Replication - Add additional unit tests for update & delete ops. (#4237) * Segment Replication - Add additional unit tests for update & delete operations. Signed-off-by: Marc Handalian <handalm@amazon.com> * Fix spotless. Signed-off-by: Marc Handalian <handalm@amazon.com> Signed-off-by: Marc Handalian <handalm@amazon.com> Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com> Signed-off-by: Marc Handalian <handalm@amazon.com> Signed-off-by: Suraj Singh <surajrider@gmail.com> Signed-off-by: Poojita Raj <poojiraj@amazon.com> Co-authored-by: Marc Handalian <handalm@amazon.com> Co-authored-by: Suraj Singh <surajrider@gmail.com> Co-authored-by: Poojita Raj <poojiraj@amazon.com>
2 tasks
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-4112-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7af8e37f81c0a124712a8cf86dab973df26f906f
# Push it to GitHub
git push --set-upstream origin backport/backport-4112-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x Then, create a pull request where the |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Signed-off-by: Suraj Singh surajrider@gmail.com
Description
Coming from #3988 which relies on ReplicationCheckpoint ordering for correctly identifying the furthest ahead replica. The existing
ReplicationCheckpoint#isAheadOf
is not correct as it uses||
b/wsegmentInfosVersion
(1st) & PrimaryTerm(2nd) terms; instead of relying on 2nd term (PrimaryTerm field)ONLY WHEN
first field (segmentInfosVersion
) is equal.This PR also fixes, the comparison order i.e. PrimaryTerm first followed by segmentInfosVersion; this is to prioritize active primary over stale.
Issues Related
#3988
Related
#4041
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.