check_if_node_is_quorum_critical incorrect status #9518
-
There is ongoing work in the Ra repository to make sure that new members that are still behind are not counted as online replicas. When it comes to timeouts, is your expectation that timeouts from such nodes should be ignored by the nodes that are caught up? Which node did you run the check against?
-
Can logs from all nodes be shared, please? That will likely shed some light on where these timeouts happen.
-
Filed #9519 with a clarified description.
-
@zaeemarshad also, would you mind describing how you use this health check? It was designed to be used before restarting a node during a rolling upgrade:
When you add a node, you don't need to run this check against it. A newly joined member generally won't be quorum-critical for several reasons, one of which was brought up in this discussion. So I am curious how you ended up running this check against node C in the scenario above?
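To illustrate why a newly joined member is generally not quorum-critical, here is a rough sketch of the majority arithmetic. It is a simplified model and not how RabbitMQ or Ra actually implement the check: the function names `quorum_size` and `is_quorum_critical` are made up for this example, and it assumes a member that is still catching up is simply left out of the `online` set.

```python
def quorum_size(member_count: int) -> int:
    """Smallest majority of a Raft group with `member_count` members."""
    return member_count // 2 + 1


def is_quorum_critical(members: set[str], online: set[str], target: str) -> bool:
    """Return True if stopping `target` would leave the queue without an online majority.

    Hypothetical helper for illustration only, not RabbitMQ's implementation.
    """
    # Members that would still be online after `target` is shut down.
    remaining_online = (online & members) - {target}
    return len(remaining_online) < quorum_size(len(members))


members = {"A", "B", "C"}

# All three members online: stopping A leaves B and C, still a majority of 3.
print(is_quorum_critical(members, {"A", "B", "C"}, "A"))  # False

# C is not counted as online (assumed to be still catching up):
# stopping A would leave only B, so A is quorum-critical.
print(is_quorum_critical(members, {"A", "B"}, "A"))  # True

# Stopping C itself changes nothing: A and B still form a majority.
print(is_quorum_critical(members, {"A", "B"}, "C"))  # False
```

Under this model, the node that is behind is never the critical one. It is the caught-up nodes that become critical while the new member is not yet counted as online.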
-
Describe the bug
With a 3-member QQ, if one member is in a timeout state,
rabbitmq-queues check_if_node_is_quorum_critical
incorrectly reports that the queue will become unavailable.
Reproduction steps
rabbitmq-queues check_if_node_is_quorum_critical
Expected behavior
Expect the check to pass and not report any issues
Additional context
No response