I started with the tutorial Docker images (monitor, node1 and node2). I have these running and I can monitor their status. So far so good. I have node1 as primary:
Now suppose node1 dies for some reason. On a production system, the node's hardware or VM can suffer a kernel panic, network issues, or whatever else. Here I am simulating it with:
Soon the status is:
and then a few seconds later:
Node2 stays in wait_primary and seems to be stuck there. I waited 15 minutes and it stayed there. I then started the container again to simulate the VM rebooting or the network being restored:
and quickly I see:
then:
But while node1 was down, node2 was stuck. That can't be right, as it should have assumed primacy on its own. On another attempt, I tried drop node --name node1 (after I stopped it), and node2 became a 'single' node right away without waiting for node1 (wait_primary). I think I am missing a step. Any suggestions? Appreciated! Thanks,
-
Hi @lgammo ; please take some time to actually read the documentation. Specifically, we have full coverage for the Failover State Machine including a glossary that details what the state names mean, including wait_primary.
I won't copy paste the docs contents in here. In short, the primary state embeds the idea that we have a trustworthy secondary to failover to. Otherwise, the applicable state is wait_primary when there is (at least) another node registered but only one node is available at this time.
In other words, it's all working exactly as designed. And documented...
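The distinction between these states can be sketched as a small decision rule. This is only an illustrative model of the explanation above, not pg_auto_failover's actual implementation: the `Node` class and the `writable_node_state` function are hypothetical names, and the real state machine has more states and conditions than shown here. The `single` case reflects what was observed in the question after dropping node1.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a node registered with the monitor."""
    name: str
    healthy: bool


def writable_node_state(node_name: str, registered: list[Node]) -> str:
    """Return the state name for the node that currently accepts writes.

    Simplified rule, assumed from the discussion above:
    - no other node registered                  -> "single"
    - another node registered and available     -> "primary"
    - another node registered, none available   -> "wait_primary"
    """
    others = [n for n in registered if n.name != node_name]
    if not others:
        return "single"
    if any(n.healthy for n in others):
        return "primary"
    return "wait_primary"


if __name__ == "__main__":
    # node1 is stopped but still registered: node2 takes writes without a standby.
    nodes = [Node("node1", healthy=False), Node("node2", healthy=True)]
    print(writable_node_state("node2", nodes))
```

Under this model, node2 in wait_primary is already the node accepting writes; the state name only records that no secondary is currently available to fail over to.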
-
I have read the documentation, which has: "Wait_primary I read the 'no yet in that position' to indicate it is not in fact a primary. By the way, you have a great product @DimCitus.
-
If all 3 of our nodes end up in a bad state, how can we identify which one will take the primary role when all 3 nodes show read-only or None?