Operator fails when the pod is restarted and the Service Monitor for operator metrics was already created #3446

Closed
iblancasa opened this issue Nov 11, 2024 · 2 comments · Fixed by #3447
Labels: bug, needs triage

@iblancasa (Contributor)

Component(s)

collector

What happened?

Description

Reported by @IshwarKanse (great job!)

Steps to Reproduce

  • Deploy the operator
  • Change something in the Deployment, for instance by adding an arg
  • Wait until the new pod fails

Actual Result

{"level":"INFO","timestamp":"2024-11-11T10:18:28.446310667Z","message":"All workers finished","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446315312Z","message":"Stopping and waiting for caches"}
W1111 10:18:28.446400       1 reflector.go:484] pkg/mod/k8s.io/client-go@v0.31.2/tools/cache/reflector.go:243: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446627238Z","message":"Stopping and waiting for webhooks"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446652862Z","logger":"controller-runtime.webhook","message":"Shutting down webhook server with timeout of 1 minute"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446740103Z","message":"Stopping and waiting for HTTP servers"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446760527Z","logger":"controller-runtime.metrics","message":"Shutting down metrics server with timeout of 1 minute"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.44679548Z","message":"shutting down server","name":"health probe","addr":"[::]:8081"}
{"level":"INFO","timestamp":"2024-11-11T10:18:28.446854274Z","message":"Wait completed, proceeding to shutdown the manager"}
{"level":"ERROR","timestamp":"2024-11-11T10:18:28.454439947Z","message":"error received after stop sequence was engaged","error":"leader election lost","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/Users/ikanse/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.1/pkg/manager/internal.go:512"}
{"level":"ERROR","timestamp":"2024-11-11T10:18:28.454409611Z","logger":"setup","message":"problem running manager","error":"error creating service monitor: servicemonitors.monitoring.coreos.com \"opentelemetry-operator-metrics-monitor\" already exists","stacktrace":"main.main\n\t/Users/ikanse/opentelemetry-operator/main.go:517\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271"}

Kubernetes Version

.

Operator version

.

Collector version

.

Environment information

No response

Log output

No response

Additional context

No response

@Starefossen (Contributor)

This should not have been marked as solved as the patch is still not available in any released version of the operator.

@iblancasa (Contributor, Author)

> This should not have been marked as solved as the patch is still not available in any released version of the operator.

Thanks for your comment.

Issues are closed when the fix is merged. You can check this folder to see what has been fixed but not yet released: https://github.com/open-telemetry/opentelemetry-operator/tree/main/.chloggen
