Describe the bug
Deployment architecture:
We are running a two-replica deployment (one master, one replica) in the cloud on Kubernetes (StatefulSet).
Each pod in our deployment runs on a dedicated instance (VM).
We have three pods:
Node-0
Node-1
Sentinel-0
Both node-0 (master) and node-1 (replica) run a Redis server and a Sentinel process; Sentinel-0 runs only a Sentinel process.
We have a total of 3 Sentinels with a quorum of 2.
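For reference, the described topology (3 Sentinels monitoring one master group with quorum 2) can be confirmed from a client roughly like this (a minimal sketch using redis-py; the hostnames, the Sentinel port 26379, and the master group name "mymaster" are assumptions for illustration):

```python
import redis

# Ask each Sentinel which master groups it monitors and with what quorum.
# Hostnames, port, and the group name "mymaster" are assumptions.
SENTINELS = [("node-0", 26379), ("node-1", 26379), ("sentinel-0", 26379)]

for host, port in SENTINELS:
    s = redis.Redis(host=host, port=port, socket_timeout=1)
    masters = s.sentinel_masters()            # e.g. {"mymaster": {...}} on a healthy setup
    info = masters.get("mymaster", {})
    print(host,
          "monitors:", list(masters),
          "quorum:", info.get("quorum"),
          "other sentinels:", info.get("num-other-sentinels"))
```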
In our pipeline we are running a test that does the following:
1 – restart node-0 (the master) with kubectl delete --grace-period=60 and check that data persists and that node-1 becomes master (a sketch of this step follows the list)
2 – restart sentinel-0 and check that data persists
3 – restart node-1 (the master at this point) and check that data persists and that node-0 becomes master
At this point node-0 is master and node-1 is replica.
4 – stop all Kubernetes nodes (VMs); each pod runs on its own node
5 – start all nodes and check that data persists
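Step 1 amounts to roughly the following (a minimal sketch using the official kubernetes Python client; the namespace "default" is an assumption):

```python
from kubernetes import client, config

# Delete the pod with a 60s grace period; the StatefulSet recreates it,
# which is effectively a restart. Equivalent to:
#   kubectl delete pod node-0 --grace-period=60
# The namespace "default" is an assumption for illustration.
config.load_kube_config()
v1 = client.CoreV1Api()
v1.delete_namespaced_pod(name="node-0", namespace="default", grace_period_seconds=60)
```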
During steps 1–4 we run a while loop that continuously reads and writes to verify zero downtime.
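The read/write loop looks roughly like this (a minimal sketch using redis-py's Sentinel support; hostnames, ports, and the master group name "mymaster" are assumptions):

```python
import time
from redis.sentinel import Sentinel
from redis.exceptions import ConnectionError, TimeoutError

# Discover the current master through the Sentinels and keep writing and reading.
# Hostnames, ports, and the group name "mymaster" are assumptions for illustration.
sentinel = Sentinel([("node-0", 26379), ("node-1", 26379), ("sentinel-0", 26379)],
                    socket_timeout=0.5)

i = 0
while True:
    try:
        master = sentinel.master_for("mymaster", socket_timeout=0.5)
        master.set("test-key", i)
        assert int(master.get("test-key")) == i   # read back what we just wrote
        i += 1
    except (ConnectionError, TimeoutError) as exc:
        print("downtime observed:", exc)          # any failure here counts as downtime
    time.sleep(0.1)
```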
Steps 1–4 go as expected. But after starting all instances again (they come back up in the order Sentinel-0 first, then node-0 and node-1 at roughly the same time; sometimes node-1 starts before node-0 and vice versa), we hit a split-brain situation: Sentinel-0 says that node-1 is master, while node-0 and node-1 both say that node-0 is master.
We waited to see whether the instances would eventually reconverge on a single master, but they do not.
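This is how the disagreement shows up when the instances are queried directly (a minimal sketch; hostnames, ports, and the master group name "mymaster" are assumptions):

```python
import redis

# Ask every Sentinel which address it considers the master,
# and ask every Redis server what role it thinks it has.
# Hostnames, ports, and "mymaster" are assumptions for illustration.
for host in ("node-0", "node-1", "sentinel-0"):
    s = redis.Redis(host=host, port=26379, socket_timeout=1)
    print(f"sentinel on {host} -> master:", s.sentinel_get_master_addr_by_name("mymaster"))

for host in ("node-0", "node-1"):
    r = redis.Redis(host=host, port=6379, socket_timeout=1)
    print(f"redis on {host} -> role:", r.info("replication")["role"])

# After step 5, the Sentinel on sentinel-0 reports node-1 as master,
# while both Redis servers report node-0 as master, and this never converges.
```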
To reproduce
Follow steps 1 to 5.
Expected behavior
The expected behavior is that the Sentinels and Redis nodes eventually converge on a single master when there is a disagreement.
Additional information
The config files node.conf and sentinel.conf for the instances: node_and_sentinel.docx
The logs for the sentinels before stopping the VMs:
before-node-0.log
before-node-1.log
before-sentinel-0.log
The logs for the sentinels after starting the VMs:
after-node-0.log
after-node-1.log
after-sentinel-0.log