To avoid bouncing -REDIRECT during FAILOVER #871

soloestoy · 2024-08-05T09:51:08Z

During the FAILOVER process, once the conditions are met (for example, the force time is reached or the primary and replica offsets are consistent), the primary demotes itself to a replica and enters the FAILOVER_IN_PROGRESS state. After handshaking and other steps, it eventually sends the PSYNC FAILOVER command to the replica, and only then does the replica become the primary. Promotion of the replica is therefore asynchronous: while the old primary is in FAILOVER_IN_PROGRESS, there may be a period during which both nodes are replicas. If a -REDIRECT is returned in that window, the client is redirected to the replica and then redirected straight back, bouncing between the two nodes. To avoid this, during FAILOVER_IN_PROGRESS we temporarily suspend the clients that would otherwise be redirected until the replica has actually become the primary, and then resume execution.
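The bouncing described above can be illustrated with a small model. This is a minimal sketch, not Valkey code: `sim_node` and `count_redirects` are hypothetical names, and each node simply redirects to its peer unless it is a primary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-node model; the names are illustrative, not Valkey's. */
typedef struct {
    bool is_primary; /* node currently accepts the command */
    int peer;        /* index of the node it would redirect the client to */
} sim_node;

/* Follow -REDIRECT replies starting at node `start`. Returns how many
 * redirects the client sees before a primary accepts the command, or
 * max_hops if the client gives up first. */
int count_redirects(const sim_node nodes[], int start, int max_hops) {
    assert(nodes != NULL && max_hops >= 0);
    int at = start;
    int hops = 0;
    while (!nodes[at].is_primary && hops < max_hops) {
        at = nodes[at].peer; /* client follows the redirect */
        hops++;
    }
    return hops;
}
```

With both nodes modeled as replicas, the client keeps bouncing until it reaches `max_hops`; once the peer is a primary, a single redirect is enough. Postponing the client during the window in which both nodes are replicas avoids the first case.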

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>

codecov · 2024-08-05T10:04:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.18%. Comparing base (3cca268) to head (82894ee).
Report is 93 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #871      +/-   ##
============================================
- Coverage     70.35%   70.18%   -0.18%     
============================================
  Files           112      112              
  Lines         61310    61473     +163     
============================================
+ Hits          43136    43146      +10     
- Misses        18174    18327     +153

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/server.c | `88.54% <100.00%> (-0.02%)` | ⬇️ |

... and 21 files with indirect coverage changes

gmbnomis

FYI, a potential test case for this bug is part of my PR #848; see the FIXME at https://github.com/valkey-io/valkey/pull/848/files#diff-fdef28e02ac8048541ebebbb829193ffd25622dc438bd8d921b167e51564bdd0R177. Just saying 😉

gmbnomis · 2024-08-05T11:07:44Z

src/server.c

+             * FAILOVER_IN_PROGRESS state, we temporarily suspend the clients
+             * that need to be redirected until the replica truly becomes the primary,
+             * and then resume the execution. */
+            blockPostponeClient(c);


I think this may block reading clients as well, which is a behavioral change since we are only in CLIENT PAUSE WRITE during this phase.

The simplest solution I came up with was to add `&& server.failover_state == NO_FAILOVER` to the `if` for the redirect case (in line 3903) and to the `if` for the read-only replica case in https://github.com/valkey-io/valkey/pull/871/files#diff-1abc5651133d108c0c420d9411925373c711133e7748d9e4f4c97d5fb543fdd9R4012.

The rationale for this is that during a failover, we should prefer to block clients (which will happen here https://github.com/valkey-io/valkey/pull/871/files#diff-1abc5651133d108c0c420d9411925373c711133e7748d9e4f4c97d5fb543fdd9R4082) instead of redirecting them or giving them answers that may not be valid anymore after the failover (keep in mind that failover may still fail at this point in time).

soloestoy · Aug 5, 2024

> I think this may block reading clients as well, which is a behavioral change since we are only in CLIENT PAUSE WRITE during this phase.

If the client has not issued the READONLY command, its read operations are redirected as well, so reads also need to be suspended. This is specific to the FAILOVER_IN_PROGRESS state.

soloestoy · Aug 5, 2024

> The rationale for this is that during a failover, we should prefer to block clients (which will happen here https://github.com/valkey-io/valkey/pull/871/files#diff-1abc5651133d108c0c420d9411925373c711133e7748d9e4f4c97d5fb543fdd9R4082) instead of redirecting them or giving them answers that may not be valid anymore after the failover (keep in mind that failover may still fail at this point in time).

This is exactly what this PR does.

gmbnomis · Aug 5, 2024

> When the client does not execute the readonly command, read operations will also be redirected, so read operations also need to be suspended. This is a special state of FAILOVER_IN_PROGRESS.

If I understand you correctly, you are saying that reading (which is possible in both failover states up to now) is a problem during FAILOVER_IN_PROGRESS. Sorry, but I don't understand the reason why this is the case.

But if so, why are we only blocking clients that understand REDIRECT? Shouldn't we block all clients in this phase, then?

soloestoy · Aug 13, 2024

Both read and write commands may receive a -REDIRECT. The issue we are addressing is that when the primary has been demoted to a replica and is in the FAILOVER_IN_PROGRESS state, the replica may not have become the primary yet. The solution is to pause both read and write commands during FAILOVER_IN_PROGRESS. There is never any need to pause commands that are neither reads nor writes.

> why are we only blocking clients that understand REDIRECT?

We don't need to handle clients without the redirect capa here: they never receive -REDIRECT; they can receive -READONLY instead.
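The rule described in this reply can be sketched as a small decision function. This is a simplified model under stated assumptions, not the code in src/server.c: the types and the function `decide` are hypothetical, and `ACTION_CONTINUE` stands for "fall through to the rest of command processing" (where, for example, a -READONLY reply may still be produced).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the server's state. */
typedef enum { NO_FAILOVER, FAILOVER_WAIT_FOR_SYNC, FAILOVER_IN_PROGRESS } failover_state_t;
typedef enum { ACTION_CONTINUE, ACTION_REDIRECT, ACTION_POSTPONE } action_t;

typedef struct {
    bool is_replica;                 /* node currently follows a primary */
    failover_state_t failover_state;
} node_t;

typedef struct {
    bool redirect_capable; /* client declared that it understands -REDIRECT */
    bool readonly_mode;    /* client issued READONLY */
} client_t;

typedef struct {
    bool is_read;
    bool is_write;
} command_t;

/* Decide what to do with a command, following the rule in the thread:
 * a command that would be redirected is postponed instead while the
 * node is in FAILOVER_IN_PROGRESS. */
action_t decide(const node_t *n, const client_t *c, const command_t *cmd) {
    assert(n != NULL && c != NULL && cmd != NULL);
    bool needs_redirect = n->is_replica && c->redirect_capable &&
                          (cmd->is_write || (cmd->is_read && !c->readonly_mode));
    if (!needs_redirect) return ACTION_CONTINUE;
    if (n->failover_state == FAILOVER_IN_PROGRESS) return ACTION_POSTPONE;
    return ACTION_REDIRECT;
}
```

In this model, commands that are neither reads nor writes, and clients without the redirect capability, never reach the postpone branch, which matches the explanation above.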

gmbnomis · Aug 13, 2024

I don't seem to get my message across, but maybe I understand now why you consider FAILOVER_IN_PROGRESS to be special:

During the entire failover procedure, no change can happen on the primary since all writing commands are blocked (by postponing them). Therefore, I thought that continuing to answer reads is fine in FAILOVER_WAIT_FOR_SYNC as well as in FAILOVER_IN_PROGRESS.

But there is a time delay between the new primary becoming primary and the old primary realizing that the switch happened. If a new client happens to connect to the new primary and writes during this time, we may return stale data to a reading "READWRITE" client on the old primary if we allow reading instead of blocking it.

Is this why we need to block reads as well?
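The stale-read scenario raised here can be made concrete with a toy model. This is only an illustration of the question being asked, not a statement of how Valkey behaves: `failover_sim` and its helper functions are hypothetical, and each node holds a single integer value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-key model of the two nodes during a failover. */
typedef struct {
    int value;        /* the single key held by the node */
    bool is_primary;  /* whether the node currently accepts writes */
} kv_node;

typedef struct {
    kv_node old_primary; /* already demoted, has not yet seen the switch */
    kv_node new_primary; /* the replica being promoted */
} failover_sim;

/* Both nodes start in sync; the old primary has already demoted itself. */
failover_sim sim_start(int initial) {
    failover_sim s = { { initial, false }, { initial, false } };
    return s;
}

/* The replica completes its promotion. */
void sim_promote_new(failover_sim *s) {
    assert(s != NULL);
    s->new_primary.is_primary = true;
}

/* A client writes to the new primary; the old primary does not see it yet. */
void sim_write_new(failover_sim *s, int v) {
    assert(s != NULL && s->new_primary.is_primary);
    s->new_primary.value = v;
}

/* The old primary realizes the switch happened and catches up. */
void sim_old_catches_up(failover_sim *s) {
    assert(s != NULL);
    s->old_primary.value = s->new_primary.value;
}

/* A read served by the old primary without blocking. */
int sim_read_old(const failover_sim *s) {
    assert(s != NULL);
    return s->old_primary.value;
}
```

If a write reaches the new primary before the old primary catches up, `sim_read_old` still returns the earlier value, which is the stale read described above. Blocking reads on the old primary during that window would avoid it in this model.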

Do we need to document this change? (the documentation of the FAILOVER command does say that only writing clients are blocked)

soloestoy · Aug 14, 2024

> Do we need to document this change? (the documentation of the FAILOVER command does say that only writing clients are blocked)

Yes, see valkey-io/valkey-doc#162

madolson

It makes sense to me. It would be nice to add a test though.

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>

To valkey-io/valkey#871

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

tests/integration/replica-redirect.tcl

Fix #821

During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS` state, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution.

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>

Two further commits carry the same message with the prefix "Fix valkey-io#821": one additionally signed off by mwish <maplewish117@gmail.com> and one additionally signed off by Ping Xie <pingxie@google.com>.

soloestoy requested a review from madolson August 5, 2024 09:59

gmbnomis reviewed Aug 5, 2024

madolson approved these changes Aug 12, 2024

add test

82894ee

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>

soloestoy added the needs-doc-pr label Aug 14, 2024

soloestoy mentioned this pull request Aug 14, 2024

Explain the special handling of redirection in failover valkey-io/valkey-doc#162

Merged

soloestoy merged commit 131857e into valkey-io:unstable Aug 14, 2024
47 checks passed

madolson reviewed Aug 15, 2024

tests/integration/replica-redirect.tcl

gmbnomis mentioned this pull request Aug 15, 2024

Standalone FAILOVER: Fix disconnect of blocked clients in standalone failover and support REDIRECT response #848

Open

enjoy-binbin added the release-notes label Sep 4, 2024

gmbnomis mentioned this pull request Nov 6, 2024

Enhance SENTINEL FAILOVER to use the FAILOVER command to avoid data loss #1238

Closed
