Sentinel split-brain after failover #1322

Open
MuhammadQadora opened this issue Nov 19, 2024 · 0 comments

Describe the bug
Deployment architecture:
We are running a two replica deployment (one master.one replica) in the cloud on Kubernetes (Statefulset).
Each pod in our deployment runs on a dedicated instance (VM).
We have three pods:
Node-0
Node-1
Sentinel-0
Both node-0 (master) and node-1 (replica) are running a Redis server and a Sentinel process, and the Sentinel-0 only runs a Sentinel process.
We have a total of 3 Sentinels with a quorum of 2.
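
For reference, a minimal sentinel.conf for this topology might look like the sketch below; the service name mymaster, the hostnames, and the ports here are placeholders, not the values from the attached configs:

```conf
# Hypothetical sketch; the service name "mymaster", hostnames, and ports
# are assumptions, not the values from the attached configs.
port 26379
# Quorum of 2: two of the three Sentinels must agree the master is down
# before it is marked objectively down and a failover can start.
sentinel monitor mymaster node-0 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```
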
In our pipeline we are running a test that does the following:

1. Restart node-0 (the master) with kubectl delete pod node-0 --grace-period=60, and check that data persists and node-1 became master.
2. Restart sentinel-0 and check that data persists.
3. Restart node-1 (the master at this point) and check that data persists and node-0 became master. At this point node-0 is master and node-1 is replica.
4. Stop all nodes (VMs); each pod runs on its own node.
5. Start all nodes and check that data persists.

During steps 1-4 we run a while loop that reads and writes, to test for zero downtime; a sketch of such a loop follows.
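
As an illustration, the loop looks roughly like this (a reconstruction, not the exact pipeline script; the service name mymaster and the addresses are placeholders):

```bash
#!/usr/bin/env bash
# Hypothetical canary loop; "mymaster" and the Sentinel address are
# assumptions, not taken from our actual pipeline script.
while true; do
  # Ask a Sentinel for the current master; it answers with the IP and
  # the port on two separate lines, which paste joins into one.
  read -r MASTER_IP MASTER_PORT < <(
    redis-cli -h sentinel-0 -p 26379 SENTINEL get-master-addr-by-name mymaster | paste -sd ' '
  )
  # Write and immediately read back through the reported master.
  redis-cli -h "$MASTER_IP" -p "$MASTER_PORT" SET canary "$(date +%s)"
  redis-cli -h "$MASTER_IP" -p "$MASTER_PORT" GET canary
  sleep 1
done
```
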
Steps 1-4 go as expected. But after starting all instances again (they start in this order: sentinel-0 first, then node-0 and node-1 at roughly the same time; sometimes node-1 comes up before node-0 and vice versa), we get the split-brain issue: sentinel-0 says that node-1 is master, while node-0 and node-1 both say that node-0 is master.
We tried waiting to see if there is eventual consistency, but that is not the case.
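
The disagreement is visible by querying each process directly, along these lines (hostnames, ports, and mymaster are again placeholders):

```bash
# Compare each Sentinel's view of the current master.
for HOST in node-0 node-1 sentinel-0; do
  echo "--- Sentinel on $HOST ---"
  redis-cli -h "$HOST" -p 26379 SENTINEL get-master-addr-by-name mymaster
done

# Compare each Redis server's own idea of its role.
for HOST in node-0 node-1; do
  echo "--- Redis on $HOST ---"
  redis-cli -h "$HOST" -p 6379 INFO replication | grep -E '^(role|master_host)'
done
```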

To reproduce
Follow steps 1 to 5.

Expected behavior
The expected behavior is eventual consistency: if the Sentinels and the Redis servers disagree about which node is master, they should eventually converge on a single master, as in the check sketched below.
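
A convergence check like the following should eventually succeed (again a sketch; hostnames, ports, and mymaster are placeholders), but in our case it never does:

```bash
# Poll all Sentinels until they report the same master address.
while true; do
  VIEWS=$(for HOST in node-0 node-1 sentinel-0; do
    # Join the two answer lines (IP, port) into one "ip:port" string.
    redis-cli -h "$HOST" -p 26379 SENTINEL get-master-addr-by-name mymaster | paste -sd ':'
  done)
  if [ "$(echo "$VIEWS" | sort -u | wc -l)" -eq 1 ]; then
    echo "Converged on master: $(echo "$VIEWS" | head -n 1)"
    break
  fi
  echo "Still in disagreement:"
  echo "$VIEWS"
  sleep 5
done
```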

Additional information
The config files node.conf and sentinel.conf for the instances:
node_and_sentinel.docx

The logs for the sentinels before stopping the VMs:
before-node-0.log
before-node-1.log
before-sentinel-0.log

The logs for the sentinels after starting the VMs:
after-node-0.log
after-node-1.log
after-sentinel-0.log

