
OpenTelemetryCollector ClusterIP and Headless Service are indistinguishable to a LabelSelector #898

Closed
TBBle opened this issue May 28, 2022 · 4 comments · Fixed by #1088
Labels
area:collector Issues for deploying collector

Comments

@TBBle

TBBle commented May 28, 2022

Our use case is running the Prometheus metrics exporter in the OpenTelemetry Collector Deployment created by the OpenTelemetry Operator.

To set this up, we expected to create an OpenTelemetryCollector object (which comes with a Service) and then a ServiceMonitor whose LabelSelector matches that Service; Prometheus Operator would then extract the Endpoints from the selected Service and configure Prometheus to scrape them.

However, OpenTelemetry Operator is creating two Services (ClusterIP and headless) whose labels are identical; they differ only in name and one extra annotation. This causes Prometheus to see two scrape targets for each Pod in the Deployment, distinguished only by the resulting job and service labels (and hence producing distinct metric series with duplicate data).

The simplest solution I see is for the headless Service to get an extra specific label, e.g. operator.opentelemetry.io/collector-headless-service, which could then be matched using an Exists or DoesNotExist expression to distinguish the two Services.
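
For illustration, a ServiceMonitor that excludes the headless Service via that proposed label might look like the sketch below (the label does not exist yet, and the collector name, namespace, and port name here are hypothetical):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector                                 # hypothetical name
  namespace: opentelemetry
spec:
  selector:
    # spec.selector is a standard Kubernetes LabelSelector, so
    # matchLabels and matchExpressions can be combined.
    matchLabels:
      app.kubernetes.io/name: my-collector-collector   # hypothetical collector label value
    matchExpressions:
      - key: operator.opentelemetry.io/collector-headless-service   # proposed label, not yet added by the operator
        operator: DoesNotExist
  endpoints:
    - port: prometheus                                 # hypothetical named port on the Service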

Edit: We explored using a PodMonitor to work around this, but a colleague pointed out that the Pods created by the OpenTelemetry Operator do not actually declare their ports, and a PodMonitor requires a named port to scrape. So that's a no-go.

@nlamirault

nlamirault commented Aug 29, 2022

I've got the same problem:

2022-08-29T15:52:28.908Z        error   prometheusexporter@v0.56.0/collector.go:237     failed to convert metric otelcol_otelsvc_k8s_pod_added: duplicate label names   {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.56.0/collector.go:237
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:448
2022-08-29T15:52:28.908Z        error   prometheusexporter@v0.56.0/collector.go:237     failed to convert metric otelcol_receiver_accepted_metric_points: duplicate label names {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.56.0/collector.go:237
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:448

Resources created:

❯ kubectl -n opentelemetry get opentelemetrycollector
NAME                               MODE          VERSION   AGE
opentelemetry-operator-collector   statefulset   0.56.0    62m
❯ kubectl -n opentelemetry get svc
NAME                                                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
opentelemetry-operator-collector-collector                  ClusterIP   10.244.28.11    <none>        9090/TCP,4317/TCP,4318/TCP   42m
opentelemetry-operator-collector-collector-headless         ClusterIP   None            <none>        9090/TCP,4317/TCP,4318/TCP   42m
opentelemetry-operator-collector-collector-monitoring       ClusterIP   10.244.30.229   <none>        8888/TCP                     42m
opentelemetry-operator-controller-manager-metrics-service   ClusterIP   10.244.23.112   <none>        8443/TCP,8080/TCP            45m
opentelemetry-operator-webhook-service                      ClusterIP   10.244.24.226   <none>        443/TCP                      45m
❯ kubectl -n opentelemetry get servicemonitors
NAME                               AGE
opentelemetry-operator             34m
opentelemetry-operator-collector   45m

@kristinapathak
Contributor

I plan to work on this. I'm not sure whether the headless service is always needed; regardless, I think we still need to add a label.

@TBBle
Author

TBBle commented Sep 2, 2022

I think the headless Service is there because (I read somewhere) there's an advantage to using gRPC's load balancing over k8s's load balancing when using the gRPC exporter to feed the collector. See #595 for discussion and proposed documentation of this, but note that it didn't actually achieve consensus.

For the record, the workaround we used was to change the Prometheus exporter to use port 8888 (and move the internal Prometheus feed to 8889, which doesn't need to be exposed because we scrape it in the same collector to include it in our general metrics feed), and then point the ServiceMonitor at the -collector-monitoring Service, which already exposes port 8888.
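
For reference, a minimal sketch of that workaround as an OpenTelemetryCollector config (the collector name is hypothetical, and a real metrics pipeline would also include whatever receivers feed your general metrics):

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector                      # hypothetical
spec:
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: otelcol-internal
              scrape_interval: 30s
              static_configs:
                - targets: ['localhost:8889']   # scrape our own internal metrics
    exporters:
      prometheus:
        endpoint: '0.0.0.0:8888'          # now on 8888, already exposed by the -collector-monitoring Service
    service:
      telemetry:
        metrics:
          address: '0.0.0.0:8889'         # internal metrics moved off 8888 to 8889
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [prometheus]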

@pavolloffay pavolloffay added the area:collector Issues for deploying collector label Sep 2, 2022
@mingh2

mingh2 commented Nov 10, 2022

+1 to this. A simple fix would be to just change the app.kubernetes.io/name label.
The ClusterIP object's name is in the format {}-collector and the headless Service's name is {}-headless; however, the app.kubernetes.io/name label is the same for both objects.
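
As a sketch of that suggestion (names illustrative; today both Services carry the same app.kubernetes.io/name value):

apiVersion: v1
kind: Service
metadata:
  name: my-collector-collector                                   # ClusterIP Service
  labels:
    app.kubernetes.io/name: my-collector-collector
---
apiVersion: v1
kind: Service
metadata:
  name: my-collector-collector-headless                          # headless Service
  labels:
    app.kubernetes.io/name: my-collector-collector-headless      # proposed: distinct value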
