Crash: targetContainer being empty leads to the operator crashing #291
Labels: bug (Something isn't working)
What version of redis operator are you using?
redis-operator version: We are using redis-operator built from HEAD.
Does this issue reproduce with the latest release?
Yes, the problem reproduces with quay.io/opstree/redis-operator:v0.10.0.
What operating system and processor architecture are you using (kubectl version)?
kubectl version Output
What did you do?
I submitted an invalid CR file, which should only make it impossible to create the Redis leader pods. However, the operator crashes after a while.
To reproduce, use a Kubernetes version lower than v1.23.0 together with the CR file below. This bug was found with v1.22.9.
cr_cluster.yaml
First, observe that leader pods cannot be created as the cr input is invalid:
kubectl describe statefulset.apps/test-cluster-leader
Wait for about a minute or so, observe that the operator crashes and restarts. Below is the operator crash log:
kubectl logs deployment.apps/redis-operator -n redis-operator
output
What did you expect to see?
Redis-operator does not crash.
What did you see instead?
Redis-operator crashed with the aforementioned CR YAML file.
Possible root cause
The invalid CR file makes it impossible to create pods. Thus, in the function `executeCommand`, `getContainerID(cr, podname)` returns `-1` because the specified pod never exists. However, instead of returning immediately, the function proceeds to access the pod's information at L285 using `-1` as the array index, leading to a crash.
redis-operator/k8sutils/redis.go
Lines 278 to 285 in f1c547e
Comments
Why is the example CR invalid?
The livenessProbe field in the sample CR makes it invalid.
When creating a pod, exactly one probe handler must be specified for the livenessProbe field; otherwise the pod cannot be created. Prior to v1.23, the list of available handlers ([exec, httpGet, TCPSocket]) does not include grpc, as revealed in https://github.com/kubernetes/kubernetes/blob/release-1.22/pkg/apis/core/validation/validation.go#L2770-L2800.
The bug cannot be reproduced with the example above in Kubernetes v1.23+ because grpc becomes a valid handler there, as in https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L2824-L2862. You can still reproduce the bug by specifying more than one handler in livenessProbe.
The operator did not detect the invalid input because the StatefulSet does not validate the livenessProbe field at the time the operator submits the patch.