-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
To avoid bouncing -REDIRECT during FAILOVER #871
Merged
Merged
Changes from 1 commit
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may block reading clients as well, which is a behavioral change since we are only in
CLIENT PAUSE WRITE
during this phase.The simplest solution I came up with was to add
&& server.failover_state == NO_FAILOVER
to theif
for the redirect case (in line 3903) and to theif
for the read only replica case in https://github.com/valkey-io/valkey/pull/871/files#diff-1abc5651133d108c0c420d9411925373c711133e7748d9e4f4c97d5fb543fdd9R4012.The rationale for this is that during a failover, we should prefer to block clients (which will happen here https://github.com/valkey-io/valkey/pull/871/files#diff-1abc5651133d108c0c420d9411925373c711133e7748d9e4f4c97d5fb543fdd9R4082) instead of redirecting them or giving them answers that may not be valid anymore after the failover (keep in mind that failover may still fail at this point in time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the client does not execute the
readonly
command, read operations will also be redirected, so read operations also need to be suspended. This is a special state ofFAILOVER_IN_PROGRESS
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what this PR does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand you correctly, you are saying that reading (which is possible in both failover states up to now) is a problem during
FAILOVER_IN_PROGRESS
. Sorry, but I don't understand the reason why this is the case.But if so, why are we only blocking clients that understand REDIRECT? Shouldn't we block all clients in this phase, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both read and write commands may receive a
-REDIRECT
. The issue we are currently addressing is when the primary is demoted to a replica and is in theFAILOVER_IN_PROGRESS
state, the replica may not have become the primary yet. The solution is to pause both read and write commands duringFAILOVER_IN_PROGRESS
, there is no need to pause non-read-write commands any time.We don't care about clients without redirect capa, they would never receive
-REDIRECT
, instead they can receive-READONLY
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't seem to get my message across, but maybe I understand now why you consider
FAILOVER_IN_PROGRESS
to be special:During the entire failover procedure, no change can happen on the primary since all writing commands are blocked (by postponing them). Therefore, I thought that continuing to answer reads is fine in
FAILOVER_WAIT_FOR_SYNC
as well as inFAILOVER_IN_PROGRESS
.But there is a time delay between the new primary becoming primary and the old primary realizing that the switch happened. If a new client happens to connect to the new primary and writes during this time, we may return stale data to a reading "READWRITE" client on the old primary if we allow reading instead of blocking it.
Is this why we need to block reads as well?
Do we need to document this change? (the documentation of the FAILOVER command does say that only writing clients are blocked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see valkey-io/valkey-doc#162