Sentinel split-brain after failover #1322

Open
MuhammadQadora opened this issue Nov 19, 2024 · 0 comments

Describe the bug
Deployment architecture:
We are running a two replica deployment (one master.one replica) in the cloud on Kubernetes (Statefulset).
Each pod in our deployment runs on a dedicated instance (VM).
We have three pods:
Node-0
Node-1
Sentinel-0
Both node-0 (master) and node-1 (replica) are running a Redis server and a Sentinel process, and the Sentinel-0 only runs a Sentinel process.
We have a total of 3 Sentinels with a quorum of 2.
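
For reference, a minimal sentinel.conf for this topology might look like the sketch below; the service name mymaster, the hostnames, and the ports here are placeholders, not the values from the attached configs:

```conf
# Hypothetical sketch; the service name "mymaster", hostnames, and ports
# are assumptions, not the values from the attached configs.
port 26379
# Quorum of 2: two of the three Sentinels must agree the master is down
# before it is marked objectively down and a failover can start.
sentinel monitor mymaster node-0 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```
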
In our pipeline we are running a test that does the following:

1. Restart node-0 (the master) with kubectl delete pod node-0 --grace-period=60, and check that data persists and node-1 became master.
2. Restart sentinel-0 and check that data persists.
3. Restart node-1 (the master at this point) and check that data persists and node-0 became master. At this point node-0 is master and node-1 is replica.
4. Stop all nodes (VMs); each pod runs on its own node.
5. Start all nodes and check that data persists.

During steps 1-4 we run a while loop that reads and writes, to test for zero downtime; a sketch of such a loop follows.
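
As an illustration, the loop looks roughly like this (a reconstruction, not the exact pipeline script; the service name mymaster and the addresses are placeholders):

```bash
#!/usr/bin/env bash
# Hypothetical canary loop; "mymaster" and the Sentinel address are
# assumptions, not taken from our actual pipeline script.
while true; do
  # Ask a Sentinel for the current master; it answers with the IP and
  # the port on two separate lines, which paste joins into one.
  read -r MASTER_IP MASTER_PORT < <(
    redis-cli -h sentinel-0 -p 26379 SENTINEL get-master-addr-by-name mymaster | paste -sd ' '
  )
  # Write and immediately read back through the reported master.
  redis-cli -h "$MASTER_IP" -p "$MASTER_PORT" SET canary "$(date +%s)"
  redis-cli -h "$MASTER_IP" -p "$MASTER_PORT" GET canary
  sleep 1
done
```
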
Steps 1-4 go as expected. But after starting all instances again (they start in this order: sentinel-0 first, then node-0 and node-1 at roughly the same time; sometimes node-1 comes up before node-0 and vice versa), we get the split-brain issue: sentinel-0 says that node-1 is master, while node-0 and node-1 both say that node-0 is master.
We tried waiting to see if there is eventual consistency, but that is not the case.
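
The disagreement is visible by querying each process directly, along these lines (hostnames, ports, and mymaster are again placeholders):

```bash
# Compare each Sentinel's view of the current master.
for HOST in node-0 node-1 sentinel-0; do
  echo "--- Sentinel on $HOST ---"
  redis-cli -h "$HOST" -p 26379 SENTINEL get-master-addr-by-name mymaster
done

# Compare each Redis server's own idea of its role.
for HOST in node-0 node-1; do
  echo "--- Redis on $HOST ---"
  redis-cli -h "$HOST" -p 6379 INFO replication | grep -E '^(role|master_host)'
done
```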

To reproduce
Follow steps 1 to 5.

Expected behavior
The expected behavior is eventual consistency: if the Sentinels and the Redis servers disagree about which node is master, they should eventually converge on a single master, as in the check sketched below.
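
A convergence check like the following should eventually succeed (again a sketch; hostnames, ports, and mymaster are placeholders), but in our case it never does:

```bash
# Poll all Sentinels until they report the same master address.
while true; do
  VIEWS=$(for HOST in node-0 node-1 sentinel-0; do
    # Join the two answer lines (IP, port) into one "ip:port" string.
    redis-cli -h "$HOST" -p 26379 SENTINEL get-master-addr-by-name mymaster | paste -sd ':'
  done)
  if [ "$(echo "$VIEWS" | sort -u | wc -l)" -eq 1 ]; then
    echo "Converged on master: $(echo "$VIEWS" | head -n 1)"
    break
  fi
  echo "Still in disagreement:"
  echo "$VIEWS"
  sleep 5
done
```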

Additional information
The config files node.conf and sentinel.conf for the instances:
node_and_sentinel.docx

The logs for the sentinels before stopping the VMs:
before-node-0.log
before-node-1.log
before-sentinel-0.log

The logs for the sentinels after starting the VMs:
after-node-0.log
after-node-1.log
after-sentinel-0.log

