Operator fails when the pod is restarted and the Service Monitor for operator metrics was already created #3446

Closed
iblancasa opened this issue Nov 11, 2024 · 2 comments · Fixed by #3447
Labels: bug, needs triage

@iblancasa (Contributor)

Component(s)

collector

What happened?

Description

Reported by @IshwarKanse (great job!)

Steps to Reproduce

  • Deploy the operator
  • Change something in the Deployment, for instance by adding an arg
  • Wait until the new pod fails

Actual Result

{"level":"INFO","timestamp":"2024-11-11T10:18:28.446310667Z","message":"All workers finished","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446315312Z","message":"Stopping and waiting for caches"}
W1111 10:18:28.446400       1 reflector.go:484] pkg/mod/k8s.io/client-go@v0.31.2/tools/cache/reflector.go:243: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446627238Z","message":"Stopping and waiting for webhooks"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446652862Z","logger":"controller-runtime.webhook","message":"Shutting down webhook server with timeout of 1 minute"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446740103Z","message":"Stopping and waiting for HTTP servers"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446760527Z","logger":"controller-runtime.metrics","message":"Shutting down metrics server with timeout of 1 minute"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.44679548Z","message":"shutting down server","name":"health probe","addr":"[::]:8081"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446854274Z","message":"Wait completed, proceeding to shutdown the manager"}
{"level":"ERROR","timestamp":"2024-11-11T10:18:28.454439947Z","message":"error received after stop sequence was engaged","error":"leader election lost","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/Users/ikanse/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/manager/internal.go:512"}
{"level":"ERROR","timestamp":"2024-11-11T10:18:28.454409611Z","logger":"setup","message":"problem running manager","error":"error creating service monitor: servicemonitors.monitoring.coreos.com \"opentelemetry-operator-metrics-monitor\" already exists","stacktrace":"main.main\n\t/Users/ikanse/opentelemetry-operator/main.go:517\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271"}

Kubernetes Version

.

Operator version

.

Collector version

.

Environment information

No response

Log output

No response

Additional context

No response

@Starefossen (Contributor)

This should not have been marked as solved as the patch is still not available in any released version of the operator.

@iblancasa (Contributor, Author)

> This should not have been marked as solved as the patch is still not available in any released version of the operator.

Thanks for your comment.

Issues are closed when the fix is merged. You can check this folder to see what has been fixed but not yet released: https://github.com/open-telemetry/opentelemetry-operator/tree/main/.chloggen
