Fix cluster replica unable to establish replication link in race condition #965

enjoy-binbin · 2024-08-29T05:30:08Z

In d00b8af, we fixed the bug that the
replica may set the primary host to '?'. Normally, we will correct '?'
to the correct IP when processing the next packet from primary, because
we will call nodeUpdateAddressIfNeeded when processing the packet.

int clusterProcessPacket(clusterLink *link) {
    ...
    /* Update the node address if it changed. */
    if (sender && type == CLUSTERMSG_TYPE_PING && !nodeInHandshake(sender) &&
        nodeUpdateAddressIfNeeded(sender, link, hdr)) {
        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_UPDATE_STATE);
    }
    ...
}

But there is a race will casue it to be unable to correct the primary host,
see the associated issue for more details about the race.

Simply put, if the replica just received a packet from another node that
contains new address about the primary, we will update primary node address
in here (the code i touched), when the replica receives the packet from
the primary, the check in nodeUpdateAddressIfNeeded will fail since the
address has been updated correctly, and we will not have the opportunity
to call replicationSetPrimary and update the primary host.

In places where the node address may be updated, we need to add this
replicationSetPrimary check to avoid this race.

…ition In d00b8af, we fixed the bug that the replica may set the primary host to '?'. Normally, we will correct '?' to the correct IP when processing the next packet from primary, because we will call nodeUpdateAddressIfNeeded when processing the packet. ``` int clusterProcessPacket(clusterLink *link) { ... /* Update the node address if it changed. */ if (sender && type == CLUSTERMSG_TYPE_PING && !nodeInHandshake(sender) && nodeUpdateAddressIfNeeded(sender, link, hdr)) { clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_UPDATE_STATE); } ... } ``` But there is a race will casue it to be unable to correct the primary host, see the associated issue for more details about the race. Simply put, if the replica just received a packet from another node that contains new address about the primary, we will update primary node address in here (the code i touched), when the replica receives the packet from the primary, the check in nodeUpdateAddressIfNeeded will fail since the address has been updated correctly, and we will not have the opportunity to call replicationSetPrimary and update the primary host. In places where the node address may be updated, we need to add this replicationSetPrimary check to avoid this race. Signed-off-by: Binbin <binloveplay1314@qq.com>

codecov · 2024-08-29T05:44:40Z

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.45%. Comparing base (25dd943) to head (8a92dd8).
Report is 21 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/cluster_legacy.c	0.00%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #965      +/-   ##
============================================
+ Coverage     70.41%   70.45%   +0.03%     
============================================
  Files           114      114              
  Lines         61744    61730      -14     
============================================
+ Hits          43480    43494      +14     
+ Misses        18264    18236      -28

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`85.51% <0.00%> (+0.04%)`	⬆️

... and 14 files with indirect coverage changes

zuiderkwast

Interesting. Was there a failing test case because of this?

The fix looks good to me, but I wish that Ping will take a look too.

Is there a risk that the gossip is wrong and that we should instead wait for a message from the primary?

enjoy-binbin · 2024-08-29T11:09:25Z

Interesting. Was there a failing test case because of this?

no, i just meet a d00b8af issue and i looked at it in depth and think there is a race condition in here.

Is there a risk that the gossip is wrong and that we should instead wait for a message from the primary?

you mean some node gossip a wrong address about the primary? Yes, it may happend, and then when the primary send its PING, the replica can update it via nodeUpdateAddressIfNeeded and everything work fine. The only downside i think is the replica will need to do a reconnect in here (a psync hopefully).

PingXie

This fix looks correct to me and is very reasonable for 8.0.

however, conceptually, I feel that nodeUpdateAddressIfNeeded is the more appropriate abstraction that we need here and potentially for other similar cases.

enjoy-binbin · 2024-08-30T05:17:15Z

however, conceptually, I feel that nodeUpdateAddressIfNeeded is the more appropriate abstraction that we need here and potentially for other similar cases.

you mean to call nodeUpdateAddressIfNeeded here, or find a way to merge them in one way? I think it's possible (i'll find time to look at it), the current code is OK to me now since it has comments and references, so people can easily see it. But a abstraction is also better i will try to if i can merge them

madolson

This seems necessary, it's whether or not there are other edge cases around here I'm less sure about. Each day makes me want to move towards a new cluster approach :/

PingXie · 2024-09-01T06:49:50Z

you mean to call nodeUpdateAddressIfNeeded here, or find a way to merge them in one way?

Yes this is technically the "atomicity" contract that we need to enforce, i.e., any change to the node endpoint should be followed up by this check of myself's primary.

That said, I think we should merge this one as-is but look into refactoring post 8.0.

This seems necessary, it's whether or not there are other edge cases around here I'm less sure about. Each day makes me want to move towards a new cluster approach :/

I agree with you on a new cluster solution in many other cases but I think this is just a code bug :)

…updated in `clusterProcessGossipSection` (valkey-io#965) `clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary.

…updated in `clusterProcessGossipSection` (valkey-io#965) `clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary. Signed-off-by: Ping Xie <pingxie@google.com>

…updated in `clusterProcessGossipSection` (valkey-io#965) `clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary.

…updated in `clusterProcessGossipSection` (valkey-io#965) `clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary. Signed-off-by: Ping Xie <pingxie@google.com>

…updated in `clusterProcessGossipSection` (#965) `clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary. Signed-off-by: Ping Xie <pingxie@google.com>

…updated in `clusterProcessGossipSection` (valkey-io#965) `clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>

enjoy-binbin requested review from madolson, zuiderkwast and PingXie August 29, 2024 05:30

zuiderkwast approved these changes Aug 29, 2024

View reviewed changes

PingXie approved these changes Aug 29, 2024

View reviewed changes

madolson approved these changes Sep 1, 2024

View reviewed changes

enjoy-binbin added the release-notes This issue should get a line item in the release notes label Sep 4, 2024

PingXie merged commit 9b51949 into valkey-io:unstable Sep 6, 2024
47 checks passed

enjoy-binbin deleted the address_race branch September 6, 2024 06:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix cluster replica unable to establish replication link in race condition #965

Fix cluster replica unable to establish replication link in race condition #965

enjoy-binbin commented Aug 29, 2024

codecov bot commented Aug 29, 2024 •

edited

Loading

zuiderkwast left a comment

enjoy-binbin commented Aug 29, 2024

PingXie left a comment

enjoy-binbin commented Aug 30, 2024

madolson left a comment

PingXie commented Sep 1, 2024

Fix cluster replica unable to establish replication link in race condition #965

Fix cluster replica unable to establish replication link in race condition #965

Conversation

enjoy-binbin commented Aug 29, 2024

codecov bot commented Aug 29, 2024 • edited Loading

Codecov Report

zuiderkwast left a comment

Choose a reason for hiding this comment

enjoy-binbin commented Aug 29, 2024

PingXie left a comment

Choose a reason for hiding this comment

enjoy-binbin commented Aug 30, 2024

madolson left a comment

Choose a reason for hiding this comment

PingXie commented Sep 1, 2024

codecov bot commented Aug 29, 2024 •

edited

Loading