
msg="target not found" for standard kube-state-metrics #279

Open
forestoden opened this issue Apr 6, 2021 · 3 comments

@forestoden

I am trying to set up the stackdriver-prometheus-sidecar to push a few CronJob/Job metrics from kube-state-metrics to Stackdriver. I'm running into an issue where, no matter what I do, all of the metrics report:

level=debug ts=2021-04-06T22:10:39.947Z caller=series_cache.go:369 component="Prometheus reader" msg="target not found" labels="{__name__=\"kube_cronjob_next_schedule_time\",container=\"kube-state-metrics\",cronjob=\"cronjob\",endpoint=\"http\",instance=\"10.8.6.2:8080\",job=\"kube-state-metrics\",namespace=\"production\",pod=\"kube-prometheus-stack-kube-state-metrics-bbf56d7f5-dss8c\",service=\"kube-prometheus-stack-kube-state-metrics\"}"

Here is my config for the sidecar:

  - args:
    - --stackdriver.project-id=<project>
    - --prometheus.wal-directory=/prometheus/wal
    - --stackdriver.kubernetes.location=us-central1
    - --stackdriver.kubernetes.cluster-name=<cluster>
    - --include=kube_cronjob_next_schedule_time{namespace="production"}
    - --log.level=debug
    image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.8.2

I am using the Prometheus Operator with Prometheus version 2.18. I tried a couple of other versions (up to 2.22) with no luck.

I am not seeing any metrics reach Stackdriver. I also tried adding --stackdriver.store-in-files-directory=/prometheus/sd; a file gets created but nothing is written to it, so it doesn't look like a permissions issue there.

For the --include flag, I've tried a number of different filter variations with no luck.
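For example, variations along these lines (illustrative sketches only; I'm assuming the flag accepts Prometheus-style series selectors, and the exact quoting depends on how the args are passed):

    - --include=kube_cronjob_next_schedule_time{namespace="production"}
    - --include={__name__="kube_cronjob_next_schedule_time"}
    - --include={__name__=~"kube_(cronjob|job).*",namespace="production"}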

I found #104, which highlights a similar log message, but I think that use case is a bit more complex than this one.

@forestoden
Author

I dug into the code a bit and determined what the issue is, but I'm not sure how it could be fixed given how the code works today.

The issue stems from the target lookup, i.e., getting a target back from the cache. We make a call

	t, _ := targetMatch(ts, lset)

that attempts to "return the first target in the entry that matches all labels of the input set iff it has them set." Prometheus targets carry a namespace label, and for most kube-state-metrics deployments that namespace is not the same as the namespace of the workloads it monitors. So targetMatch iterates over the targets that match the metric's job and instance labels, checks that all remaining labels agree, and fails on namespace because kube-state-metrics does not live in the same namespace as the workload.
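To make the failure mode concrete, here is a rough sketch of that matching rule (not the sidecar's actual code; labelSet, targetMatchSketch, and the "monitoring" namespace are made up for illustration):

    // Sketch of the matching rule described above -- NOT the sidecar's real code.
    package main

    import "fmt"

    type labelSet map[string]string

    // targetMatchSketch returns the first target whose labels all have the
    // same value in the sample's label set (a missing label counts as "").
    func targetMatchSketch(targets []labelSet, sample labelSet) (labelSet, bool) {
        for _, t := range targets {
            match := true
            for name, want := range t {
                if sample[name] != want {
                    match = false
                    break
                }
            }
            if match {
                return t, true
            }
        }
        return nil, false
    }

    func main() {
        // The kube-state-metrics scrape target lives in its own namespace (assumed "monitoring" here)...
        target := labelSet{"job": "kube-state-metrics", "instance": "10.8.6.2:8080", "namespace": "monitoring"}
        // ...but the series it exports carries the monitored workload's namespace.
        sample := labelSet{
            "__name__":  "kube_cronjob_next_schedule_time",
            "job":       "kube-state-metrics",
            "instance":  "10.8.6.2:8080",
            "namespace": "production",
        }

        _, ok := targetMatchSketch([]labelSet{target}, sample)
        fmt.Println("target found:", ok) // prints: target found: false
    }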

I have worked around this by deploying kube-state-metrics in my production namespace, as that covers my use case. This is almost certainly not viable for everyone; for example, running workloads across many namespaces would make it tricky, since you'd have to deploy multiple copies of kube-state-metrics. Filtering the namespace label out of targetMatch seems hacky, so I'm hesitant to suggest that.

@vmcalvo

vmcalvo commented May 18, 2021

I have had the same problem with this sidecar and kube-state-metrics. In my case the only solution I have found is to modify the Prometheus ServiceMonitor (I am using https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/templates).

The ServiceMonitor that the chart generates for the kube-state-metrics scrape hard-codes honorLabels to true:
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/exporters/kube-state-metrics/serviceMonitor.yaml

After changing it to false, the conflict on the namespace label is handled by generating two labels (see the sketch after this list):

  • namespace: kept as the namespace where Prometheus and kube-state-metrics are deployed
  • exported_namespace: the namespace of the object monitored by kube-state-metrics
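For reference, the relevant endpoint in the ServiceMonitor then looks roughly like this (a sketch; the name and port are taken from the labels in the log line above):

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: kube-prometheus-stack-kube-state-metrics
    spec:
      endpoints:
      - port: http
        honorLabels: false   # the chart template renders true; false keeps the target's own namespace and adds exported_namespace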

I have not reviewed all the metrics, but I suppose some will now exceed the 10-label limit because of this; in such cases a relabeling can probably be applied to drop the labels I do not need.
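If some metrics do hit that limit, adding something along these lines to the same endpoint should drop the unneeded labels (the regex is just a placeholder; match whichever labels you don't need):

    spec:
      endpoints:
      - port: http
        honorLabels: false
        metricRelabelings:
        - action: labeldrop
          regex: exported_(pod|service)   # placeholder: drop whichever extra labels you do not need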

@jinnovation

Building on @forestoden and @vmcalvo's findings, my recent comment in #229 might be relevant as well.
