Fix stepping down on timeout #24590
Open: mmaslankaprv wants to merge 7 commits into redpanda-data:dev from mmaslankaprv:fix-stepping-down-on-timeout
mmaslankaprv force-pushed the fix-stepping-down-on-timeout branch 3 times, most recently from 717a5fa to c321e29 (December 17, 2024 15:45)
mmaslankaprv requested review from dotnwat, bharathv, bashtanov, ztlpn and travisdowns (December 17, 2024 16:22)
Retry command for Build#59862: please wait until all jobs are finished before running the slash command.
CI test results: test results on build#59862, test results on build#59902
`raft::reply_result::follower_busy` indicates that the follower was unable to process the heartbeat fast enough to generate a response. Renaming the reply from `timeout` makes it less confusing for the reader and differentiates the error code from an RPC timeout. Signed-off-by: Michał Maślanka <michal@redpanda.com>
Signed-off-by: Michał Maślanka <michal@redpanda.com>
Wired the Raft RPC service handler into the Raft fixture to make the tests more accurate and to cover the service code with tests. Signed-off-by: Michał Maślanka <michal@redpanda.com>
Propagating the timeout to the node sending the RPC request is crucial for accurate testing of the Raft implementation. Signed-off-by: Michał Maślanka <michal@redpanda.com>
Added a wrapper around `storage::log` that allows us to inject storage layer failures in Raft fixture tests. Signed-off-by: Michał Maślanka <michal@redpanda.com>
When a follower is busy, it may fail fast when processing full heartbeat requests sent by the leader. In this case the follower RPC handler sets the `follower_busy` result in `heartbeat_reply`. The leader should still treat the follower replica as online in this case, because the node hosting the replica must be online to reply with the `follower_busy` error. This prevents overly eager leader step-downs when follower replicas are slow. Signed-off-by: Michał Maślanka <michal@redpanda.com>
Signed-off-by: Michał Maślanka <michal@redpanda.com>
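The failure-injecting wrapper around `storage::log` mentioned in the commits above could look roughly like the decorator below. This is only a sketch of the general pattern, not the code from this PR: `log_iface`, `in_memory_log`, `failure_injecting_log` and `set_append_injector` are hypothetical names, and the interface is synchronous for brevity, whereas the real `storage::log` API is asynchronous and much larger.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for a log interface.
struct log_iface {
    virtual ~log_iface() = default;
    // Appends a batch and returns the offset assigned to it.
    virtual std::size_t append(std::string batch) = 0;
};

// Trivial in-memory implementation used as the wrapped log.
struct in_memory_log final : log_iface {
    std::vector<std::string> batches;

    std::size_t append(std::string batch) override {
        batches.push_back(std::move(batch));
        return batches.size() - 1;
    }
};

// Decorator: forwards every call to the wrapped log, but first runs an
// optional injector that a test can use to throw and simulate a failure.
class failure_injecting_log final : public log_iface {
public:
    using injector = std::function<void(const std::string&)>;

    explicit failure_injecting_log(std::unique_ptr<log_iface> inner)
      : _inner(std::move(inner)) {}

    // Passing nullptr (or an empty function) disables injection.
    void set_append_injector(injector f) { _on_append = std::move(f); }

    std::size_t append(std::string batch) override {
        if (_on_append) {
            _on_append(batch); // may throw to simulate a storage failure
        }
        return _inner->append(std::move(batch));
    }

private:
    std::unique_ptr<log_iface> _inner;
    injector _on_append;
};
```

A test would wrap the log of a single replica, install an injector that throws for the duration of the scenario, and then clear it to let the replica recover.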
mmaslankaprv force-pushed the fix-stepping-down-on-timeout branch from c321e29 to e203f89 (December 18, 2024 08:02)
Retry command for Build#59902: please wait until all jobs are finished before running the slash command.
When a follower is busy, it may fail fast when processing full heartbeat requests sent by the leader. In this case the follower RPC handler sets the `follower_busy` result in `heartbeat_reply`. The leader should still treat the follower replica as online in this case, because the node hosting the replica must be online to reply with the `follower_busy` error. This prevents overly eager leader step-downs when follower replicas are slow.
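To illustrate the idea, here is a minimal sketch of how a leader might account for `follower_busy` replies when deciding whether to step down. It is not the implementation from this PR. The two-value `reply_result` enum, `follower_liveness` and `should_step_down` are hypothetical simplifications, and the sketch assumes that an RPC-level timeout is modelled as "no reply at all" (`std::nullopt`).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using clock_type = std::chrono::steady_clock;

// Hypothetical subset of heartbeat reply codes; other codes are omitted.
enum class reply_result : std::uint8_t {
    success,
    follower_busy, // follower is alive but could not process the request in time
};

// Per-follower liveness bookkeeping kept by the leader (simplified).
struct follower_liveness {
    clock_type::time_point last_reply{};

    // std::nullopt models an RPC-level timeout: no reply reached the leader.
    void on_heartbeat_result(
      std::optional<reply_result> result, clock_type::time_point now) {
        if (!result) {
            return; // nothing was heard from the follower node
        }
        switch (*result) {
        case reply_result::success:
        case reply_result::follower_busy:
            // The node had to be online to send either reply.
            last_reply = now;
            break;
        }
    }

    bool is_online(
      clock_type::time_point now, clock_type::duration timeout) const {
        return now - last_reply < timeout;
    }
};

// The leader steps down when it cannot see a majority of the group online,
// counting itself as one online member.
inline bool should_step_down(
  const std::vector<follower_liveness>& followers,
  clock_type::time_point now,
  clock_type::duration timeout) {
    std::size_t online = 1; // the leader itself
    for (const auto& f : followers) {
      if (f.is_online(now, timeout)) {
          ++online;
      }
    }
    const std::size_t group_size = followers.size() + 1;
    return online < group_size / 2 + 1;
}
```

In a three-node group where one follower replies `follower_busy` and the other does not reply at all, the leader in this sketch still sees two of three members online and keeps leadership. If a busy reply were treated like a missing reply, the same leader would step down.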
Backports Required
Release Notes
Improvements