Crash: targetContainer being empty leads to the operator crashing #291

Closed
hoyhbx opened this issue Jun 20, 2022 · 0 comments · Fixed by #292
Labels
bug Something isn't working

Comments

hoyhbx (Contributor) commented Jun 20, 2022

What version of redis operator are you using?

redis-operator version: built from the current HEAD

Does this issue reproduce with the latest release?
Yes, the problem reproduces with quay.io/opstree/redis-operator:v0.10.0

What operating system and processor architecture are you using (kubectl version)?

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:58:47Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What did you do?
I submitted an invalid CR file, which should only prevent the Redis leader pods from being created and nothing more. However, the operator crashes after a while.

To reproduce, use a Kubernetes version lower than v1.23.0 together with the CR file below. The bug was found on v1.22.9.

cr_cluster.yaml
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisCluster
metadata:
  name: test-cluster
spec:
  clusterSize: 3
  kubernetesConfig:
    image: quay.io/opstree/redis:v6.2.5
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 101m
        memory: 128Mi
      requests:
        cpu: 101m
        memory: 128Mi
  redisExporter:
    enabled: true
    image: quay.io/opstree/redis-exporter:1.0
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
  redisLeader:
    livenessProbe:
      grpc:
        port: 4
        service: zangmybtia
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

First, observe that leader pods cannot be created as the cr input is invalid:

kubectl describe statefulset.apps/test-cluster-leader
Warning  FailedCreate      4s (x12 over 13s)  statefulset-controller  create Pod test-cluster-leader-0 in StatefulSet test-cluster-leader failed error: Pod "test-cluster-leader-0" is invalid: spec.containers[0].livenessProbe: Required value: must specify a handler type

Wait about a minute and observe that the operator crashes and restarts. Below is the operator's crash log:

kubectl logs deployment.apps/redis-operator -n redis-operator output
1.6543661422595975e+09	ERROR	controller_redis	Could not find pod to execute	{"Request.RedisManager.Namespace": "default", "Request.RedisManager.Name": "test-cluster"}
redis-operator/k8sutils.ExecuteRedisClusterCommand
	/workspace/k8sutils/redis.go:71
redis-operator/controllers.(*RedisClusterReconciler).Reconcile
	/workspace/controllers/rediscluster_controller.go:124
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227
panic: runtime error: index out of range [-1]

goroutine 312 [running]:
redis-operator/k8sutils.executeCommand(0xc000283200, {0xc00003ce40, 0xc, 0x1726e26}, {0xc000afa630, 0x15})
/workspace/k8sutils/redis.go:285 +0x827
redis-operator/k8sutils.ExecuteRedisClusterCommand(0xc000283200)
/workspace/k8sutils/redis.go:71 +0x805
redis-operator/controllers.(*RedisClusterReconciler).Reconcile(0xc0000cf1d0, {0xc0007181e0, 0x155c060}, {{{0xc000714a76, 0x1666c40}, {0xc000714a80, 0x30}}})
/workspace/controllers/rediscluster_controller.go:124 +0xb7e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0000cc160, {0x1940918, 0xc0007181e0}, {{{0xc000714a76, 0x1666c40}, {0xc000714a80, 0x413a94}}})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114 +0x26f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000cc160, {0x1940870, 0xc0005b2580}, {0x15b1740, 0xc0000b6020})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311 +0x33e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000cc160, {0x1940870, 0xc0005b2580})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:223 +0x357

What did you expect to see?
Redis-operator does not crash.

What did you see instead?
Redis-operator crashed with the aforementioned CR YAML file.

Possible root cause
The invalid CR file makes it impossible to create pods. Thus, in the function `executeCommand`, `getContainerID(cr, podName)` returns `-1` because the specified pod never exists. However, instead of returning immediately, the function proceeds to access the pod's container list at redis.go:285 using `-1` as the array index, which causes the crash.

targetContainer, pod := getContainerID(cr, podName)
if targetContainer < 0 {
	logger.Error(err, "Could not find pod to execute")
}
req := generateK8sClient().CoreV1().RESTClient().Post().Resource("pods").Name(podName).Namespace(cr.Namespace).SubResource("exec")
req.VersionedParams(&corev1.PodExecOptions{
	Container: pod.Spec.Containers[targetContainer].Name,
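A minimal, self-contained Go sketch of the guarded-lookup pattern the code above is missing: check the index before using it and return an error instead of indexing with `-1`. The names `findContainerIndex` and `containerName` are hypothetical stand-ins for illustration; the actual fix in #292 may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// findContainerIndex mimics getContainerID: it returns the index of the
// container with the given name, or -1 if no such container exists.
func findContainerIndex(containers []string, name string) int {
	for i, c := range containers {
		if c == name {
			return i
		}
	}
	return -1
}

// containerName shows the guarded access: bail out with an error when the
// index is negative, instead of panicking on containers[-1] as the
// operator does in redis.go:285.
func containerName(containers []string, name string) (string, error) {
	idx := findContainerIndex(containers, name)
	if idx < 0 {
		return "", errors.New("could not find container " + name)
	}
	return containers[idx], nil
}

func main() {
	// Missing container: the unguarded containers[idx] access would panic
	// with "index out of range [-1]", exactly the crash in the log above.
	if _, err := containerName([]string{"redis", "redis-exporter"}, "missing"); err != nil {
		fmt.Println("handled:", err)
	}
}
```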

Comments
Why is the example CR invalid?
The livenessProbe field in the sample CR makes it invalid.
When creating a pod, exactly one probe handler must be specified in the livenessProbe field; the pod cannot be created otherwise. Prior to v1.23, the list of valid handlers ([exec, httpGet, tcpSocket]) does not include grpc, as shown in https://github.com/kubernetes/kubernetes/blob/release-1.22/pkg/apis/core/validation/validation.go#L2770-L2800.

The bug cannot be reproduced with the example above on Kubernetes v1.23+, because grpc becomes a valid handler there, as in https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L2824-L2862.
You can still reproduce the bug by specifying more than one handler in livenessProbe.
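For instance, a redisLeader section like the following (hypothetical handler values, not taken from the original report) specifies two handlers and should therefore be rejected by pod validation on any Kubernetes version:

```yaml
redisLeader:
  livenessProbe:
    exec:                         # first handler
      command: ["redis-cli", "ping"]
    tcpSocket:                    # second handler: makes the probe invalid
      port: 6379
```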

The operator did not detect the invalid input because the StatefulSet API does not validate the pod template's livenessProbe field when the operator submits the patch; the error only surfaces later, when the statefulset-controller tries to create the pods (see the FailedCreate event above).
