Overview of the Issue
We have an issue quite similar to #3611. We had a failure of one of our Consul server nodes yesterday. In trying to fix it, the admin deleted the node's Consul data and retried the join. This resulted in a still-bad node with a slightly different name in the cluster. We gracefully left one and used force-leave on the other. It's been over eight hours and the two good servers are still trying to reconnect to the bad node.
Reproduction Steps
1. Start with a three-node server cluster.
2. Gracefully leave one of the server nodes.
3. The remaining two servers keep trying to reach the departed node on its WAN port.
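For anyone else debugging this, the lingering node shows up in the WAN member list and can be cross-checked against the Raft peer set. A minimal inspection sketch, run on one of the healthy servers (the local agent address is the default; adjust as needed):

```
# WAN-joined members; the departed server lingers here in a failed/left state
consul members -wan

# Raft peer set as the healthy servers see it
consul operator raft list-peers

# Autopilot's view of server health
curl -s http://127.0.0.1:8500/v1/operator/autopilot/health
```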
Consul info for both Client and Server
Client info
Server info
Operating system and Environment details
RHEL 7.5 on Linux on VMware
Log Fragments
On the node that gracefully left:
On the remaining server node:
Autopilot health reports that the cluster was stable as of when the sick node left.
I've already tried to force-leave the bad nodes (both by node name and by IP), but that hasn't resolved the issue.
I'm planning to stand up a new node to join the cluster to restore fault tolerance, but is there a way to ditch this old, bad node? Is it a problem that it persists?
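For reference, the force-leave attempts looked roughly like this. Note that force-leave operates on the Serf node name rather than an IP address, which may be why the by-IP attempt appeared to do nothing (bad-node-01 is a placeholder name):

```
# Immediately transition the member from "failed" to "left"
consul force-leave bad-node-01

# Newer Consul releases (1.6.2+) also support pruning it from the member list entirely
consul force-leave -prune bad-node-01
```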
I wanted to come back and update this issue. We ended up creating a new Consul server node, and when it joined the cluster, the old bad node was removed from the list. Oddly, the non-leader is still trying to reconnect to the bad node. Not sure what to do about that.
I guess this was just a matter of time. Adding new nodes caused the left nodes to be removed, and eventually the Consul servers stopped trying to contact them.
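If it helps anyone else who lands here: the long tail of reconnect attempts is consistent with the agent's reconnect window, after which failed members are reaped. These settings are tunable in the agent configuration; the values below are the documented defaults, not something we changed:

```
{
  "reconnect_timeout": "72h",
  "reconnect_timeout_wan": "72h"
}
```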