This repository has been archived by the owner on Oct 20, 2022. It is now read-only.

Use readiness script for liveness probe too #234

Merged
merged 2 commits into from
Jul 8, 2020


Contributor

@cscetbon cscetbon commented Jul 1, 2020

Bug fix: yes

What's in this PR?

This change prevents k8s pods from failing while they are decommissioning from or joining the cluster. Thanks to CASSANDRA-7069, Cassandra prevents two nodes from attempting to join the cluster in parallel.

Additional context

The goal of this PR is to avoid pods being restarted because of the liveness timeout when they take too long to join or leave a cluster.

@cscetbon cscetbon requested review from ahmedjami and erdrix July 1, 2020 03:16
@ahmedjami
Contributor

During a decommission operation, the node appears in LeavingNodes and then moves to the UnreachableNodes status for a while.
Since your check would not find the node in the LiveNodes status and would return 1, the liveness probe will fail and the kubelet will try to restart the pod endlessly...
Even a stopped node appears in the Unreachable status.
This is the behaviour I see when requesting the status over JMX from StorageServiceMBean.

I also need to test it in a cluster with casskop to confirm what I mentioned above.

@cscetbon
Contributor Author

cscetbon commented Jul 3, 2020

@ahmedjami I ran some tests: a node that is leaving or joining stays in the LiveNodes list as long as it's alive and in the ring, so it won't be a problem. See these logs to watch it in action: https://pastebin.com/raw/8saiJXgS

@ahmedjami
Contributor

@cscetbon, yes, the node appears in both the Leaving and Live Nodes statuses when we decommission it. So, based on what you check in the liveness probe, this will achieve the purpose here :)
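
The decision discussed in this thread can be sketched roughly as follows. This is only an illustration, not the script from the PR: the function name `probe_exit_code` and the sample addresses are hypothetical, and a real probe would first have to fetch the node lists from Cassandra (for example over JMX). It shows why a leaving node still passes the probe, as long as it is also listed in LiveNodes.

```python
from typing import Iterable


def probe_exit_code(node_address: str, live_nodes: Iterable[str]) -> int:
    """Return 0 (probe passes) if the node is listed as live, else 1.

    Hypothetical helper: membership in LeavingNodes or JoiningNodes is
    deliberately ignored, only the LiveNodes list is consulted.
    """
    return 0 if node_address in set(live_nodes) else 1


# Hypothetical snapshot taken while 10.0.0.3 is being decommissioned:
# the node is leaving, but it is still reported as live.
live = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
leaving = ["10.0.0.3"]

print(probe_exit_code("10.0.0.3", live))  # leaving but live: prints 0
print(probe_exit_code("10.0.0.4", live))  # not in LiveNodes: prints 1
```

With this shape, the kubelet would only restart a pod whose node has dropped out of LiveNodes, which matches the behaviour both participants agreed on above.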
