
[receiver/kubeletstats] Service account token is not getting reloaded resulting in 401 errors #26120

Closed
eplightning opened this issue Aug 28, 2023 · 7 comments

Comments

@eplightning
Contributor

eplightning commented Aug 28, 2023

Component(s)

receiver/kubeletstats

What happened?

Description

After running for several weeks without any issues, we started seeing missing metrics from the kubeletstats receiver. After inspecting the logs, it seems that the token used to authenticate with the kubelet is not reloaded automatically, which results in 401 errors when scraping. I'm not 100% sure whether this was caused by token expiration or by Azure rotating the service account token issuer.

We are also running k8sclusterreceiver on the same cluster, and it didn't seem to have any issues. Restarting the collector instance cleared the issue.

Steps to Reproduce

  1. Run the collector with the kubeletstats receiver
  2. Set a short token lifetime and enforce it, or rotate the service account token issuer
  3. The kubelet will eventually rotate the token
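To confirm that step 2 is actually in effect, it can help to check when the mounted token expires. The sketch below decodes the `exp` claim from the token's JWT payload without verifying the signature, so it is for debugging only. It assumes the default in-pod mount path `/var/run/secrets/kubernetes.io/serviceaccount/token`; `tokenExpiry` is a hypothetical helper, not part of the collector.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"strings"
	"time"
)

// tokenExpiry decodes the JWT payload WITHOUT verifying the signature and
// returns the "exp" claim. Intended only for debugging token lifetimes.
func tokenExpiry(jwt string) (time.Time, error) {
	parts := strings.Split(strings.TrimSpace(jwt), ".")
	if len(parts) != 3 {
		return time.Time{}, errors.New("not a JWT: expected 3 dot-separated parts")
	}
	// JWT segments are unpadded base64url.
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return time.Time{}, fmt.Errorf("decode payload: %w", err)
	}
	var claims struct {
		Exp int64 `json:"exp"`
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return time.Time{}, fmt.Errorf("parse claims: %w", err)
	}
	if claims.Exp == 0 {
		// Legacy secret-based service account tokens carry no expiry.
		return time.Time{}, errors.New("token has no exp claim")
	}
	return time.Unix(claims.Exp, 0).UTC(), nil
}

func main() {
	// Assumed default mount path for the service account token inside a pod.
	b, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		fmt.Println("read token:", err)
		return
	}
	exp, err := tokenExpiry(string(b))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("token expires at %s (in %s)\n", exp, time.Until(exp).Round(time.Second))
}
```

Running this in the collector pod twice, some time apart, shows whether the kubelet has written a new token while the receiver keeps failing.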

Expected Result

The kubeletstats receiver should (possibly after a few minutes of delay) automatically pick up the new token and continue scraping data

Actual Result

The kubeletstats receiver stops scraping data due to 401 errors and requires a manual restart

Collector version

0.81.0

Environment information

Environment

OS: Azure Linux
Compiler (if manually compiled): go 1.20
Kubernetes 1.25.11 (AKS)

OpenTelemetry Collector configuration

receivers:
  kubeletstats:
    collection_interval: 1m
    auth_type: serviceAccount
    endpoint: https://localhost:10250
    insecure_skip_verify: true
    metric_groups:
     - pod
     - container
     - volume
    extra_metadata_labels:
     - k8s.volume.type

exporters:
  logging:
    verbosity: normal
    sampling_initial: 5
    sampling_thereafter: 200
  otlp:
    endpoint: monitoring-agent-hub-collector-headless:4317
    compression: zstd
    tls:
      cert_file: /tls/incluster/tls.crt
      key_file: /tls/incluster/tls.key
      ca_file: /tls/incluster/ca.crt
      reload_interval: 24h

processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  k8sattributes:
    passthrough: true
  resource:
    attributes:
    - key: k8s.node.name
      value: ${env:NODE_NAME}
      action: upsert

extensions:
  health_check:

service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [kubeletstats]
      processors: [memory_limiter, k8sattributes, batch, resource]
      exporters: [otlp]
  telemetry:
    metrics:
      address: 127.0.0.1:4319

Log output

2023-08-28T15:21:21.721Z	error	scraperhelper/scrapercontroller.go:213	Error scraping metrics	{"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "kubelet request GET https://localhost:10250/stats/summary failed - \"401 Unauthorized\", response: \"Unauthorized\"", "scraper": "kubeletstats"}
	go.opentelemetry.io/collector/receiver@v0.81.0/scraperhelper/scrapercontroller.go:192
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
	go.opentelemetry.io/collector/receiver@v0.81.0/scraperhelper/scrapercontroller.go:210
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
	go.opentelemetry.io/collector/receiver@v0.81.0/scraperhelper/scraper.go:20
go.opentelemetry.io/collector/receiver/scraperhelper.ScrapeFunc.Scrape
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver@v0.81.0/scraper.go:68
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver.(*kubletScraper).scrape



unit="kubelet.service" log="E0828 15:25:21.715563    1694 server.go:291] "Unable to authenticate the request due to an error" err="[invalid bearer token, square/go-jose: error in cryptographic primitive, the server has asked for the client to provide credentials]""

Additional context

No response

@eplightning eplightning added bug Something isn't working needs triage New item requiring triage labels Aug 28, 2023
@eplightning eplightning changed the title [receiver/kubeletstatsreceiver] Service account token is not getting reloaded resulting in 401 errors [receiver/kubeletstats] Service account token is not getting reloaded resulting in 401 errors Aug 28, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

Pinging code owners for internal/k8sconfig: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

Pinging code owners for internal/kubelet: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Member

Looking through how the kubeletstatsreceiver and the k8sclusterreceiver handle service account authentication, it does appear that both receivers read the token once and build their clients with it.
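A minimal sketch of the read-once pattern described above, assuming a client that turns the token into a fixed `Authorization` header at construction time. `staleClient` and `headerAfterRotation` are hypothetical names, and overwriting the file stands in for the kubelet's token rotation; this illustrates the failure mode and is not the receivers' actual code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// staleClient is a hypothetical stand-in for a client that reads the
// service account token exactly once, when it is constructed.
type staleClient struct {
	authHeader string
}

func newStaleClient(tokenPath string) (*staleClient, error) {
	tok, err := os.ReadFile(tokenPath)
	if err != nil {
		return nil, err
	}
	// The header is computed once and never refreshed.
	return &staleClient{authHeader: "Bearer " + string(tok)}, nil
}

// headerAfterRotation builds the client, rotates the token on disk, and
// returns the Authorization header the client would still send.
func headerAfterRotation() (string, error) {
	dir, err := os.MkdirTemp("", "sa-token")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "token")
	if err := os.WriteFile(path, []byte("token-v1"), 0o600); err != nil {
		return "", err
	}
	c, err := newStaleClient(path)
	if err != nil {
		return "", err
	}
	// Simulate the kubelet writing a rotated token to the same path.
	if err := os.WriteFile(path, []byte("token-v2"), 0o600); err != nil {
		return "", err
	}
	return c.authHeader, nil
}

func main() {
	h, err := headerAfterRotation()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(h) // still the token-v1 header
}
```

Once the old token is rejected by the kubelet, every scrape from such a client fails with 401 until the process is restarted, which matches the behavior in the report.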

@eplightning
Contributor Author

From my own quick research, it seems to affect receivers that connect directly to the kubelet (so in my case, kubeletstats with auth_type: serviceAccount).

k8s_cluster / k8s_events, which I'm running in a different collector, seem to be running fine without any restarts so far. They seem to use k8s.io/client-go/rest, which I'd assume handles token reloading on its own.

@jinja2
Contributor

jinja2 commented Aug 30, 2023

Yeah, the kubelet client in kubeletstats is not refreshing the SA token after startup. I see a quick fix, let me try it.
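One possible shape for such a fix, sketched under stated assumptions and not necessarily what #26316 does: wrap the HTTP transport so the token is re-read from disk and cached for at most a refresh interval. `fileTokenTransport` and `run` are hypothetical names; the test server and temp file stand in for the kubelet and the projected token.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strings"
	"sync"
	"time"
)

// fileTokenTransport is a hypothetical RoundTripper that re-reads the bearer
// token from disk at most once per refresh interval.
type fileTokenTransport struct {
	path     string
	refresh  time.Duration
	base     http.RoundTripper
	mu       sync.Mutex
	token    string
	loadedAt time.Time
}

func (t *fileTokenTransport) currentToken() (string, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.token != "" && time.Since(t.loadedAt) < t.refresh {
		return t.token, nil // cached token is still fresh
	}
	b, err := os.ReadFile(t.path)
	if err != nil {
		return "", err
	}
	t.token = strings.TrimSpace(string(b))
	t.loadedAt = time.Now()
	return t.token, nil
}

func (t *fileTokenTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	tok, err := t.currentToken()
	if err != nil {
		if req.Body != nil {
			req.Body.Close() // RoundTripper contract: close the body even on error
		}
		return nil, err
	}
	r := req.Clone(req.Context()) // never mutate the caller's request
	r.Header.Set("Authorization", "Bearer "+tok)
	return t.base.RoundTrip(r)
}

// run sends two requests, rotating the token file between them, and returns
// the Authorization headers the test server received.
func run() (string, string, error) {
	var mu sync.Mutex
	var seen []string
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		seen = append(seen, r.Header.Get("Authorization"))
		mu.Unlock()
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "sa-token")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "token")
	// refresh: 0 forces a re-read on every request so the demo is deterministic.
	client := &http.Client{Transport: &fileTokenTransport{path: path, refresh: 0, base: http.DefaultTransport}}
	for _, tok := range []string{"token-v1\n", "token-v2\n"} {
		if err := os.WriteFile(path, []byte(tok), 0o600); err != nil {
			return "", "", err
		}
		resp, err := client.Get(srv.URL)
		if err != nil {
			return "", "", err
		}
		resp.Body.Close()
	}
	mu.Lock()
	defer mu.Unlock()
	if len(seen) != 2 {
		return "", "", fmt.Errorf("expected 2 requests, got %d", len(seen))
	}
	return seen[0], seen[1], nil
}

func main() {
	first, second, err := run()
	fmt.Println(first, second, err) // expect the token-v1 header, then token-v2
}
```

In a real client, `refresh` would be a non-zero interval (for example, a minute) so the file is not read on every scrape, and `path` would point at the mounted service account token.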

@crobert-1
Member

#26316 fixed this issue.
