Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Segment replication Backpressure #6563

Merged
merged 5 commits into from
Mar 15, 2023
Merged

Conversation

mch2
Copy link
Member

@mch2 mch2 commented Mar 7, 2023

Description

This PR is a re-cut of #6520 that includes implementing backpressure and removes the addition of these metrics to NodeStats API. This is to make it easier to see in this PR how this will be used to implement pressure. The metrics additions will be in a separate change.

This PR adds backpressure for index operations when Segment Replication is enabled.

This PR implements backpressure mechanisms for segment replication to prevent lagging
replicas from falling too far behind. Writes will be rejected under the following conditions:

  1. More than half (default setting) of the replication group is 'stale'. Defined by setting MAX_ALLOWED_STALE_SHARDS.
  2. A replica is stale if it is behind more than MAX_INDEXING_CHECKPOINTS, default 4 AND its current replication lag is over
    MAX_REPLICATION_TIME_SETTING, default 5 minutes.

This PR intentionally implements rejections only for index operations,
allowing other TransportWriteActions to succeed, TransportResyncReplicationAction and RetentionLeaseSyncAction.
Blocking these requests will fail recoveries as new nodes are added.

Issues Resolved

#4478

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

This comment was marked as outdated.

@mch2 mch2 force-pushed the bpimpl branch 2 times, most recently from 195543a to 976effd Compare March 7, 2023 07:07
@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@mch2 mch2 force-pushed the bpimpl branch 2 times, most recently from 33c5011 to 2bd0f7c Compare March 12, 2023 20:09
@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

codecov-commenter commented Mar 12, 2023

Codecov Report

Merging #6563 (33198a1) into main (73a2279) will decrease coverage by 0.55%.
The diff coverage is 61.50%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #6563      +/-   ##
============================================
- Coverage     71.22%   70.68%   -0.55%     
+ Complexity    59521    59129     -392     
============================================
  Files          4803     4808       +5     
  Lines        283208   283449     +241     
  Branches      40842    40868      +26     
============================================
- Hits         201712   200348    -1364     
- Misses        65266    66608    +1342     
- Partials      16230    16493     +263     
Impacted Files Coverage Δ
...rg/opensearch/common/settings/ClusterSettings.java 92.30% <ø> (ø)
...s/replication/SegmentReplicationTargetService.java 48.40% <0.00%> (-0.63%) ⬇️
...eplication/checkpoint/PublishCheckpointAction.java 23.80% <0.00%> (+0.37%) ⬆️
.../org/opensearch/index/SegmentReplicationStats.java 15.38% <15.38%> (ø)
...nsearch/index/SegmentReplicationPerGroupStats.java 28.57% <28.57%> (ø)
...opensearch/index/SegmentReplicationShardStats.java 32.35% <32.35%> (ø)
...ensearch/action/bulk/TransportShardBulkAction.java 76.99% <50.00%> (+0.87%) ⬆️
...org/opensearch/index/seqno/ReplicationTracker.java 67.75% <70.68%> (-0.70%) ⬇️
.../replication/checkpoint/ReplicationCheckpoint.java 63.04% <75.00%> (+6.63%) ⬆️
...in/java/org/opensearch/index/shard/IndexShard.java 69.87% <81.25%> (-0.69%) ⬇️
... and 7 more

... and 460 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@mch2 mch2 marked this pull request as ready for review March 12, 2023 22:26
@mch2 mch2 changed the title Draft - Implement Segment replication Backpressure. Implement Segment replication Backpressure. Mar 12, 2023
@mch2 mch2 changed the title Implement Segment replication Backpressure. Implement Segment replication Backpressure Mar 12, 2023
4,
1,
Setting.Property.Dynamic,
Setting.Property.NodeScope
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this (and other) be index scoped IndexScope ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my thinking initially, but felt it would get a bit difficult to manage. I think we can start with node scope and extend to index if the need is there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be we can rename the setting constants here to reflect here cluster or node scope.
index.segrep.pressure.checkpoint.limit -> node.segrep.... ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack good catch - I've just removed the index. prefix. ex. segrep.pressure.checkpoint.limit

mch2 added 3 commits March 14, 2023 13:21
This PR introduces new mechanisms to keep track of the current replicas within a replication group and apply backpressure if they fall too far behind.

Writes will be rejected under the following conditions:

1. More than half (default setting) of the replication group is 'stale'.  Defined by setting MAX_ALLOWED_STALE_SHARDS.
2. A replica is stale if it is behind more than MAX_INDEXING_CHECKPOINTS, default 4 AND its current replication lag is over
MAX_REPLICATION_TIME_SETTING, default 5 minutes.

This PR intentionally implements rejections only for index operations,
allowing other TransportWriteActions to succeed, TransportResyncReplicationAction and RetentionLeaseSyncAction.
Blocking these requests will fail recoveries as new nodes are added.

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
@mch2
Copy link
Member Author

mch2 commented Mar 14, 2023

force pushed a rebase from main.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Copy link
Member

Gradle Check (Jenkins) Run Completed with:

Spotless check failure.

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.index.SegmentReplicationPressureIT.testWritesRejected
      1 org.opensearch.index.SegmentReplicationPressureIT.testAddReplicaWhileWritesBlocked

@mch2 mch2 merged commit 6babc08 into opensearch-project:main Mar 15, 2023
@mch2 mch2 added the backport 2.x Backport to 2.x branch label Mar 15, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6563-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6babc087f7bd5774b97e67f9e386187fe0db3ecb
# Push it to GitHub
git push --set-upstream origin backport/backport-6563-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-6563-to-2.x.

mch2 added a commit to mch2/OpenSearch that referenced this pull request Mar 15, 2023
* Add Segment Replication backpressure.

This PR introduces new mechanisms to keep track of the current replicas within a replication group and apply backpressure if they fall too far behind.

Writes will be rejected under the following conditions:

1. More than half (default setting) of the replication group is 'stale'.  Defined by setting MAX_ALLOWED_STALE_SHARDS.
2. A replica is stale if it is behind more than MAX_INDEXING_CHECKPOINTS, default 4 AND its current replication lag is over
MAX_REPLICATION_TIME_SETTING, default 5 minutes.

This PR intentionally implements rejections only for index operations,
allowing other TransportWriteActions to succeed, TransportResyncReplicationAction and RetentionLeaseSyncAction.
Blocking these requests will fail recoveries as new nodes are added.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add changelog

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix test class to match naming conventions.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* PR feedback.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Change setting keys to remove index scope.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
mch2 added a commit to mch2/OpenSearch that referenced this pull request Mar 15, 2023
* Add Segment Replication backpressure.

This PR introduces new mechanisms to keep track of the current replicas within a replication group and apply backpressure if they fall too far behind.

Writes will be rejected under the following conditions:

1. More than half (default setting) of the replication group is 'stale'.  Defined by setting MAX_ALLOWED_STALE_SHARDS.
2. A replica is stale if it is behind more than MAX_INDEXING_CHECKPOINTS, default 4 AND its current replication lag is over
MAX_REPLICATION_TIME_SETTING, default 5 minutes.

This PR intentionally implements rejections only for index operations,
allowing other TransportWriteActions to succeed, TransportResyncReplicationAction and RetentionLeaseSyncAction.
Blocking these requests will fail recoveries as new nodes are added.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add changelog

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix test class to match naming conventions.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* PR feedback.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Change setting keys to remove index scope.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
dreamer-89 pushed a commit that referenced this pull request Mar 15, 2023
* Implement Segment replication Backpressure (#6563)

* Add Segment Replication backpressure.

This PR introduces new mechanisms to keep track of the current replicas within a replication group and apply backpressure if they fall too far behind.

Writes will be rejected under the following conditions:

1. More than half (default setting) of the replication group is 'stale'.  Defined by setting MAX_ALLOWED_STALE_SHARDS.
2. A replica is stale if it is behind more than MAX_INDEXING_CHECKPOINTS, default 4 AND its current replication lag is over
MAX_REPLICATION_TIME_SETTING, default 5 minutes.

This PR intentionally implements rejections only for index operations,
allowing other TransportWriteActions to succeed, TransportResyncReplicationAction and RetentionLeaseSyncAction.
Blocking these requests will fail recoveries as new nodes are added.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add changelog

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix test class to match naming conventions.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* PR feedback.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Change setting keys to remove index scope.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix Xcontent imports.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
mingshl pushed a commit to mingshl/OpenSearch-Mingshl that referenced this pull request Mar 24, 2023
* Add Segment Replication backpressure.

This PR introduces new mechanisms to keep track of the current replicas within a replication group and apply backpressure if they fall too far behind.

Writes will be rejected under the following conditions:

1. More than half (default setting) of the replication group is 'stale'.  Defined by setting MAX_ALLOWED_STALE_SHARDS.
2. A replica is stale if it is behind more than MAX_INDEXING_CHECKPOINTS, default 4 AND its current replication lag is over
MAX_REPLICATION_TIME_SETTING, default 5 minutes.

This PR intentionally implements rejections only for index operations,
allowing other TransportWriteActions to succeed, TransportResyncReplicationAction and RetentionLeaseSyncAction.
Blocking these requests will fail recoveries as new nodes are added.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add changelog

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix test class to match naming conventions.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* PR feedback.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Change setting keys to remove index scope.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Mingshi Liu <mingshl@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants