[Backport 2.x] Implement Segment replication Backpressure #6669

Merged
merged 2 commits into from
Mar 15, 2023

@mch2 mch2 commented Mar 15, 2023

Manual backport of #6563 to 2.x.

* Add Segment Replication backpressure.

This PR introduces new mechanisms to keep track of the current replicas within a replication group and apply backpressure if they fall too far behind.

Writes will be rejected under the following conditions:

1. More than half of the replication group is 'stale'. This threshold is the default value of the MAX_ALLOWED_STALE_SHARDS setting.
2. A replica is stale if it is more than MAX_INDEXING_CHECKPOINTS (default 4) checkpoints behind AND its current replication lag exceeds MAX_REPLICATION_TIME_SETTING (default 5 minutes).
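
For readers skimming the rule above, here is a minimal Java sketch of the two conditions. It is not the code from this PR: the class `BackpressureSketch`, the `ReplicaState` type, and both method names are hypothetical, the defaults are copied from the description, and it assumes the stale fraction is computed over the replicas passed in.

```java
import java.time.Duration;
import java.util.List;

// Illustrative only: a standalone model of the rejection rule, not OpenSearch code.
class BackpressureSketch {
    // Defaults as stated in the PR description (constant names are hypothetical).
    static final int MAX_INDEXING_CHECKPOINTS = 4;
    static final Duration MAX_REPLICATION_TIME = Duration.ofMinutes(5);
    static final double MAX_ALLOWED_STALE_SHARDS = 0.5;

    // Hypothetical snapshot of one replica's replication progress.
    static final class ReplicaState {
        final int checkpointsBehind;
        final Duration replicationLag;

        ReplicaState(int checkpointsBehind, Duration replicationLag) {
            this.checkpointsBehind = checkpointsBehind;
            this.replicationLag = replicationLag;
        }
    }

    // A replica is stale only when BOTH limits are exceeded.
    static boolean isStale(ReplicaState replica) {
        return replica.checkpointsBehind > MAX_INDEXING_CHECKPOINTS
            && replica.replicationLag.compareTo(MAX_REPLICATION_TIME) > 0;
    }

    // Reject writes when the stale fraction of the group exceeds the threshold.
    static boolean shouldRejectWrite(List<ReplicaState> group) {
        if (group.isEmpty()) {
            return false; // nothing to wait on, so no backpressure
        }
        long staleCount = group.stream().filter(BackpressureSketch::isStale).count();
        return (double) staleCount / group.size() > MAX_ALLOWED_STALE_SHARDS;
    }

    public static void main(String[] args) {
        ReplicaState stale = new ReplicaState(5, Duration.ofMinutes(6));
        ReplicaState behindButRecent = new ReplicaState(5, Duration.ofMinutes(1));
        System.out.println(shouldRejectWrite(List.of(stale, stale, behindButRecent)));
        System.out.println(shouldRejectWrite(List.of(stale, behindButRecent)));
    }
}
```

In this sketch a replica that is many checkpoints behind but whose lag is still under the time limit does not count as stale, which matches the AND in the second condition.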

This PR intentionally implements rejections only for index operations.
Other TransportWriteActions, such as TransportResyncReplicationAction and RetentionLeaseSyncAction, are still allowed to succeed,
because blocking those requests would fail recoveries as new nodes are added.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add changelog

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix test class to match naming conventions.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* PR feedback.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Change setting keys to remove index scope.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>

@dreamer-89

@mch2: Compilation failure; this needs fixes.

> Task :server:compileJava FAILED

Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=pit/10_basic/Delete all}

@codecov-commenter

Codecov Report

Merging #6669 (8852f3e) into 2.x (bb901f5) will increase coverage by 0.06%.
The diff coverage is 61.50%.

@@             Coverage Diff              @@
##                2.x    #6669      +/-   ##
============================================
+ Coverage     70.36%   70.43%   +0.06%     
- Complexity    59339    59448     +109     
============================================
  Files          4799     4804       +5     
  Lines        285004   285245     +241     
  Branches      41436    41462      +26     
============================================
+ Hits         200548   200906     +358     
+ Misses        67674    67570     -104     
+ Partials      16782    16769      -13     
| Impacted Files | Coverage Δ |
|---|---|
| ...rg/opensearch/common/settings/ClusterSettings.java | 92.30% <ø> (ø) |
| ...s/replication/SegmentReplicationTargetService.java | 48.40% <0.00%> (-0.63%) ⬇️ |
| ...eplication/checkpoint/PublishCheckpointAction.java | 23.80% <0.00%> (+0.37%) ⬆️ |
| .../org/opensearch/index/SegmentReplicationStats.java | 15.38% <15.38%> (ø) |
| ...nsearch/index/SegmentReplicationPerGroupStats.java | 28.57% <28.57%> (ø) |
| ...opensearch/index/SegmentReplicationShardStats.java | 32.35% <32.35%> (ø) |
| ...ensearch/action/bulk/TransportShardBulkAction.java | 76.10% <50.00%> (-0.02%) ⬇️ |
| ...org/opensearch/index/seqno/ReplicationTracker.java | 67.69% <70.68%> (-0.21%) ⬇️ |
| .../replication/checkpoint/ReplicationCheckpoint.java | 63.04% <75.00%> (+9.19%) ⬆️ |
| ...in/java/org/opensearch/index/shard/IndexShard.java | 68.65% <81.25%> (-1.04%) ⬇️ |
| ... and 7 more | |

... and 461 files with indirect coverage changes

@dreamer-89 dreamer-89 merged commit 6a8595e into opensearch-project:2.x Mar 15, 2023