Unable to scrape metrics through kubeletstats receiver #26481

Closed
anand3493 opened this issue Sep 6, 2023 · 8 comments
Labels
bug Something isn't working needs triage New item requiring triage receiver/kubeletstats waiting for author

Comments

@anand3493

Component(s)

receiver/kubeletstats

What happened?

Description

Getting this error from the OpenTelemetry Collector agent pods:
scraperhelper/scrapercontroller.go:200 Error scraping metrics {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://ip-10-166-222-111.eu-west-1.compute.internal:10250/stats/summary\": dial tcp: lookup ip-10-166-222-111.eu-west-1.compute.internal on 172.20.0.10:53: no such host", "scraper": "kubeletstats"}

The nodes definitely exist. The error occurs on every pod of the collector DaemonSet.

Steps to Reproduce

Expected Result

To scrape metrics and send to the exporter

Actual Result

Erroring out at
scraperhelper/scrapercontroller.go:200 Error scraping metrics {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://ip-10-166-222-111.eu-west-1.compute.internal:10250/stats/summary\": dial tcp: lookup ip-10-166-222-111.eu-west-1.compute.internal on 172.20.0.10:53: no such host", "scraper": "kubeletstats"}

Collector version

v0.83.0

Environment information

Environment

Kubernetes

OpenTelemetry Collector configuration

    exporters:
      otlp/data-prepper:
        endpoint: data-prepper:21891
        tls:
          insecure: true
    extensions:
      health_check: null
      memory_ballast:
        size_in_percentage: 20
    processors:
      batch:
        send_batch_max_size: 1000
        send_batch_size: 800
        timeout: 10s
      k8sattributes:
        extract:
          metadata:
          - k8s.namespace.name
          - k8s.deployment.name
          - k8s.statefulset.name
          - k8s.daemonset.name
          - k8s.cronjob.name
          - k8s.job.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.pod.start_time
        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: connection
      memory_limiter:
        check_interval: 1s
        limit_percentage: 70
        spike_limit_percentage: 30
    receivers:
      kubeletstats:
        auth_type: serviceAccount
        collection_interval: 20s
        endpoint: ${K8S_NODE_NAME}:10250
    service:
      extensions:
      - health_check
      - memory_ballast
      pipelines:
        metrics:
          exporters:
          - otlp/data-prepper
          processors:
          - k8sattributes
          - memory_limiter
          - batch
          receivers:
          - kubeletstats

Log output

error scraperhelper/scrapercontroller.go:200  Error scraping metrics  {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://ip-10-166-222-111.eu-west-1.compute.internal:10250/stats/summary\": dial tcp: lookup ip-10-166-222-111.eu-west-1.compute.internal on 172.20.0.10:53: no such host", "scraper": "kubeletstats"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
        go.opentelemetry.io/collector/receiver@v0.83.0/scraperhelper/scrapercontroller.go:200
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
        go.opentelemetry.io/collector/receiver@v0.83.0/scraperhelper/scrapercontroller.go:176

Additional context

No response

@anand3493 anand3493 added bug Something isn't working needs triage New item requiring triage labels Sep 6, 2023

@TylerHelmuth
Member

@anand3493 the receiver is unable to scrape metrics from endpoint: ${K8S_NODE_NAME}:10250. The error says the DNS lookup for your node name, ip-10-166-222-111.eu-west-1.compute.internal, returned "no such host", so the name does not resolve from inside the pod. For the receiver to work you'll need to provide an endpoint that resolves.

@anand3493
Author

anand3493 commented Sep 6, 2023

@TylerHelmuth True, but I am using the default configuration for the endpoint: ${K8S_NODE_NAME}:10250, and I can confirm the host name is valid, as I can see it in the output of kubectl get nodes.
So why is the error still happening?

@TylerHelmuth
Member

Are you able to hit the endpoint successfully?
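
Since the log shows a failed DNS lookup rather than a refused connection, one way to narrow this down is to check whether the node name resolves from inside the cluster at all. The following is only a minimal sketch, assuming a debug pod with Python available and the K8S_NODE_NAME variable set as in the collector config; the helper name `resolves` is made up for illustration.

```python
import os
import socket


def resolves(host: str, port: int = 10250) -> bool:
    """Return True if `host` resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Raised for lookup failures such as "no such host".
        return False


if __name__ == "__main__":
    # K8S_NODE_NAME is assumed to be injected the same way as for the collector.
    node = os.environ.get("K8S_NODE_NAME")
    if node:
        print(f"{node} resolves: {resolves(node)}")
    else:
        print("K8S_NODE_NAME is not set")
```

A literal IP address never needs a DNS lookup, so `resolves` returns True for it even when the cluster DNS does not know the node name.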

@sspieker

We're having this issue as well, even on version 0.85:

2023-09-14T08:12:11.091Z        error   kubeletstatsreceiver@v0.85.0/scraper.go:68
call to /stats/summary endpoint failed
 {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://<#####>:10250/stats/summary\": dial tcp: lookup <#####> on 100.64.0.10:53: no such host"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver.(*kubletScraper).scrape

We basically followed the getting-started installation process, enabling the kubeletMetrics preset and providing a custom otel endpoint.

However, we seem to have found a workaround by using the node's hostIP. According to downward-api/#available-fields the field status.hostIP should do the trick:

status.hostIP
the primary IP address of the node to which the Pod is assigned

In order to apply the workaround we performed these steps:

  1. clone https://github.com/open-telemetry/opentelemetry-helm-charts repo
  2. modify charts/opentelemetry-collector/templates/_pod.tpl template and add status.hostIP to env:
[...]
    env:
      - name: NODE_IP
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: status.hostIP
[...]
  3. configure values.yaml / kubeletstats to use this env variable:
[...]
config:
  receivers:
    kubeletstats:
      collection_interval: 20s
      auth_type: 'serviceAccount'
      endpoint: 'https://${env:NODE_IP}:10250'
[...]
  4. update the DaemonSet using this modified template:
helm upgrade otel-collector --values values.yaml <path-to-cloned-repo>/charts/opentelemetry-collector/

Maybe this is not the right thing to do, but nevertheless it might point in the right direction.

@TylerHelmuth
Member

@sspieker if the node name is not working for you, then the node IP is a valid workaround. You don't have to modify the helm chart though, it supports adding extra env vars:

mode: daemonset

presets:
  kubeletMetrics:
    enabled: true

extraEnvs:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.hostIP

config:
  receivers:
    kubeletstats:
      endpoint: 'https://${env:NODE_IP}:10250'
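
To confirm that the IP-based endpoint is actually reachable, something like the following could be run from a pod that has NODE_IP injected and a service account token mounted. This is a rough sketch, not part of the chart or the receiver: the function names are hypothetical, the token path is the standard Kubernetes service account mount, and TLS verification is switched off purely for this debugging check.

```python
import os
import ssl
import urllib.request

# Standard in-pod location of the mounted service account token.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def summary_url(node_ip: str, port: int = 10250) -> str:
    """Build the kubelet /stats/summary URL for a node IP (IPv6 gets brackets)."""
    host = f"[{node_ip}]" if ":" in node_ip else node_ip
    return f"https://{host}:{port}/stats/summary"


def fetch_summary_status(node_ip: str, timeout: float = 5.0) -> int:
    """Request /stats/summary with the pod's token and return the HTTP status."""
    with open(TOKEN_PATH) as f:
        token = f.read().strip()
    # Debug-only: skip certificate checks, since kubelet certs are often self-signed.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(
        summary_url(node_ip),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
        return resp.status


if __name__ == "__main__":
    node_ip = os.environ.get("NODE_IP")
    if node_ip:
        print(fetch_summary_status(node_ip))
    else:
        print("NODE_IP is not set")
```

A 200 status would suggest the endpoint and token are fine; urllib raises an HTTPError for 401 or 403, which would point at RBAC rather than DNS.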

@sspieker

As it happens, that works too. Thanks @TylerHelmuth , this makes stuff quite a bit easier for us!

@anand3493
Author

@TylerHelmuth This NODE_IP suggestion worked for me as well.

The us-east-1 based nodes have the private IP DNS name in the format: ip-xx-xxx-xxx-xx.ec2.internal

The eu-west-1 based nodes have the private IP DNS name in the format: ip-xx-xxx-xxx-xxx.eu-west-1.compute.internal

This may be the reason why NODE_NAME is not working on my European cluster. NODE_IP works fine.
