Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: crash if targetContainer does not exist #292

Merged

Conversation

hoyhbx
Copy link
Contributor

@hoyhbx hoyhbx commented Jun 22, 2022

This pull request serves as a naive patch to the crashing problem when targetContainer does not exist.

Specifically, the operator crashes when it cannot find the pod to execute command on. When the pod being queried does not exist, getContainerID returns '-1'. The operator detects the error, but instead of aborting immediately, it continues to access pod.Spec.Containers[] with index targetContainer, leading to a crash.

1.6543661422595975e+09	ERROR	controller_redis	Could not find pod to execute	{"Request.RedisManager.Namespace": "default", "Request.RedisManager.Name": "test-cluster"}
redis-operator/k8sutils.ExecuteRedisClusterCommand
	/workspace/k8sutils/redis.go:71
redis-operator/controllers.(*RedisClusterReconciler).Reconcile
	/workspace/controllers/rediscluster_controller.go:124
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227
panic: runtime error: index out of range [-1]

goroutine 312 [running]:

redis-operator/k8sutils.executeCommand(0xc000283200, {0xc00003ce40, 0xc, 0x1726e26}, {0xc000afa630, 0x15})

/workspace/k8sutils/redis.go:285 +0x827

redis-operator/k8sutils.ExecuteRedisClusterCommand(0xc000283200)

The detailed information and steps to reproduce the problem has been included in #291.

This pull request avoids the crash by adding a return inside every block of error handling code, as can be seen from the changes I made.

Additional Comments

This crashing problem is caused by only printing error logs but not dealing with those errors. The functions continue execution even in the presence of errors. For example, in executeCommand, even though errors are detected and error logs are printed, the function continues execution and print out the Successfully executed the command log.

In the current code base, there are many other places where errors are only detected but bot dealt with. To make the operator more robust, I think we should refactor the error handling code in many functions. As the desired error handling behavior is unknown to me, I cannot make that fix now. Nonetheless, I am more than happy to discuss with you and make the patches for you.

Signed-off-by: hoyhbx <hoyhbx@gmail.com>
@iamabhishek-dubey iamabhishek-dubey merged commit 24209e6 into OT-CONTAINER-KIT:master Jun 28, 2022
devkmsg added a commit to devkmsg/redis-operator that referenced this pull request Jan 30, 2024
Merge in OSS/redis-operator from master to cs-main

* commit '1f0be5c52f4be026c5fe702fdde37e8b19ab307a':
  Bump terser from 4.8.0 to 4.8.1 in /docs (OT-CONTAINER-KIT#311)
  Add pvc, pv clusterrole for deletion (OT-CONTAINER-KIT#313)
  Update zap options to support specifying log level (OT-CONTAINER-KIT#308)
  [BugFix][Change] Fixed cluster recovery logic (OT-CONTAINER-KIT#304)
  [BugFix][Change] Fixed the logic for probes (OT-CONTAINER-KIT#302)
  Fixed pdb creation issue (OT-CONTAINER-KIT#301)
  [BugFix][Change] Fixed secret whitespace issue in redis password (OT-CONTAINER-KIT#300)
  fix: crash if targetContainer does not exist (OT-CONTAINER-KIT#292)
  fix: crash if redisSecret is incomplete (OT-CONTAINER-KIT#293)
  fix: panic if resources is not present (OT-CONTAINER-KIT#294)
  add: error log when a user tries to modify cr.spec.storage.volumeClaimTemplate (OT-CONTAINER-KIT#280)
  Fixes case where node ip contains another PodIP (OT-CONTAINER-KIT#284)
  Bump eventsource from 1.1.0 to 1.1.1 in /docs (OT-CONTAINER-KIT#282)
  Fixed rbac policy for PDB
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash: targetContainer being empty leads to the operator crashing
2 participants