
prometheusreceiver: Unable to relabel job and instance labels using relabel_configs and metric_relabel_configs in prometheus receiver #5663

Closed
rashmichandrashekar opened this issue Oct 6, 2021 · 8 comments
Labels: bug (Something isn't working), comp:prometheus (Prometheus related issues)

Comments

@rashmichandrashekar
Contributor

rashmichandrashekar commented Oct 6, 2021

Describe the bug
When using relabel_configs to relabel the job label, or metric_relabel_configs to relabel the job/instance labels, the collector fails to scrape metrics.

Steps to reproduce
Set target_label to job in relabel_configs, or to job/instance in metric_relabel_configs.

What did you expect to see?
Job/instance label should be replaced with the desired replacement value

What did you see instead?
When using relabel_configs with target_label: job, I saw this error:
{"level":"warn","ts":1633406450.405102,"caller":"scrape/scrape.go:1104", "msg":"Appending scrape report failed","kind":"receiver","name":"prometheus","scrape_pool":"prometheus_ref_app", "target":"http://10.244.1.76:2112/metrics","err":"unable to find a target group with job=job_replacement"}

When using metric_relabel_configs, I saw this error:
{"level":"debug","ts":1633485477.928548,"caller":"scrape/scrape.go:1355","msg":"Unexpected error","kind":"receiver",
# "name":"prometheus","scrape_pool":"prometheus_ref_app","target":"http://10.244.1.76:2112/metrics",
# "series":"go_gc_duration_seconds{quantile="0"}","err":"unable to find a target with job=prometheus_ref_app,
# and instance=instance_replacement"}

What version did you use?
Version: v0.27.0

What config did you use?
Config: (e.g. the yaml config file)

global:
  evaluation_interval: 5s
  scrape_interval: 5s
scrape_configs:
- job_name: prometheus_ref_app
  scheme: http
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: "prometheus-reference-app"
    - source_labels: [__address__]
      replacement: job_replacement
      target_label: job
  metric_relabel_configs:
    - source_labels: [__address__]
      replacement: instance_replacement
      target_label: instance

Environment
OS:"Ubuntu 20.04"
Compiler(if manually compiled): go 1.14

@rashmichandrashekar rashmichandrashekar added the bug Something isn't working label Oct 6, 2021
@rashmichandrashekar rashmichandrashekar changed the title Unable to relabel job and instance labels using relabel_config and metric_relabel_config Unable to relabel job and instance labels using relabel_config and metric_relabel_config in prometheus receiver Oct 6, 2021
@rashmichandrashekar rashmichandrashekar changed the title Unable to relabel job and instance labels using relabel_config and metric_relabel_config in prometheus receiver Unable to relabel job and instance labels using relabel_configs and metric_relabel_configs in prometheus receiver Oct 6, 2021
@rashmichandrashekar rashmichandrashekar changed the title Unable to relabel job and instance labels using relabel_configs and metric_relabel_configs in prometheus receiver prometheusreceiver: Unable to relabel job and instance labels using relabel_configs and metric_relabel_configs in prometheus receiver Oct 8, 2021
@rashmichandrashekar
Contributor Author

rashmichandrashekar commented Oct 8, 2021

Upon some investigation, I see that the code here is trying to look up targets based on the newly relabeled values:

  1. For relabel_configs with the job label's value replaced by job_replacement, this code tries to look up a target with job='job_replacement' and fails to find it, since TargetsAll() (https://github.com/prometheus/prometheus/blob/b878527151e6503d24ac5b667b86e8794eb79ff7/scrape/manager.go#L288) returns targets keyed by the scrape pool name, which is the job_name from the config ('prometheus_ref_app' in this case).
  2. For metric_relabel_configs with the job label's value replaced by job_replacement, this code tries to look up a target with job='job_replacement' and fails to find it, since no target has its job label set to 'job_replacement'.
  3. Point 2 also applies to instance relabeling (job not relabeled, but the instance label's value changed to instance_replacement): once the target group for the job is found, the Get() method tries to find a target with the replaced instance value and fails.
    In the case where both the job and instance label values are changed for a metric, the first lookup (by job) fails and the collector fails to send metrics. See the sketch below.
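
For illustration, here is a minimal, self-contained sketch of that lookup pattern (hypothetical types and names, not the actual receiver code); it only reproduces the failure mode of keying target groups by the scrape pool name and then matching on the instance label:

package main

import "fmt"

// labels is a simplified stand-in for a Prometheus label set.
type labels map[string]string

// targetStore mimics the lookup the receiver performs: target groups are
// keyed by the scrape pool name (the job_name from the config), not by the
// post-relabeling value of the job label.
type targetStore struct {
	byScrapePool map[string][]labels
}

// lookup searches for a target using the job/instance values seen on the
// scraped series, which is where the relabeled values cause a miss.
func (s *targetStore) lookup(job, instance string) (labels, error) {
	group, ok := s.byScrapePool[job]
	if !ok {
		return nil, fmt.Errorf("unable to find a target group with job=%s", job)
	}
	for _, t := range group {
		if t["instance"] == instance {
			return t, nil
		}
	}
	return nil, fmt.Errorf("unable to find a target with job=%s, and instance=%s", job, instance)
}

func main() {
	store := &targetStore{byScrapePool: map[string][]labels{
		"prometheus_ref_app": {{"job": "prometheus_ref_app", "instance": "10.244.1.76:2112"}},
	}}

	// relabel_configs rewrote job -> "job_replacement": the group lookup fails.
	_, err := store.lookup("job_replacement", "10.244.1.76:2112")
	fmt.Println(err)

	// metric_relabel_configs rewrote instance: the group is found, the target is not.
	_, err = store.lookup("prometheus_ref_app", "instance_replacement")
	fmt.Println(err)
}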

@rashmichandrashekar
Contributor Author

rashmichandrashekar commented Oct 13, 2021

@dashpole - As discussed in the SIG this morning, I verified that this works in prometheus with the same relabel_configs.
This seems to be an issue specific to the prometheus receiver's way of identifying the target from the job/instance labels.

@GlowingRuby

I'm hitting the same problem. Is someone working on this?

@dashpole
Contributor

When you tried with prometheus, did you see metadata (e.g. description) for the relabeled metric? It might be that prometheus allows relabeling job+instance, but you lose metadata when doing so. The fix might be something similar to what we decided in #5001 (comment), where we should be allowing users to relabel these labels, but should pass them on without metadata.
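
A minimal sketch of that idea, with hypothetical types (not the receiver's actual code): when the metadata lookup for the (possibly relabeled) job/instance pair misses, keep the point and fall back to unknown-typed metadata instead of erroring out.

package main

import "fmt"

// metadata is a simplified stand-in for scraped metric metadata.
type metadata struct {
	Type string // "counter", "gauge", ... or "unknown"
	Help string
}

// metadataStore is keyed by "job/instance" as the receiver originally saw them.
type metadataStore map[string]metadata

// metadataFor returns stored metadata when the (possibly relabeled)
// job/instance pair is known, and otherwise falls back to unknown-typed
// metadata instead of returning an error and dropping the point.
func metadataFor(store metadataStore, job, instance string) metadata {
	if md, ok := store[job+"/"+instance]; ok {
		return md
	}
	return metadata{Type: "unknown"} // relabeled series keep their samples but lose metadata
}

func main() {
	store := metadataStore{
		"prometheus_ref_app/10.244.1.76:2112": {Type: "counter", Help: "Total number of pointer lookups."},
	}
	fmt.Println(metadataFor(store, "prometheus_ref_app", "10.244.1.76:2112"))
	// With job relabeled, the lookup misses but the metric is kept as unknown.
	fmt.Println(metadataFor(store, "job_replacement", "10.244.1.76:2112"))
}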

@rashmichandrashekar
Contributor Author

rashmichandrashekar commented Dec 2, 2021

@dashpole - Sorry for the delay. I tried it with prometheus and the metadata seems to be available for the few targets I tried. I queried the target metadata API to view the metadata. Was that what you were looking for?
For one of the tests I configured 2 scrape jobs for the metrics_path /metrics/cadvisor, one with job relabeled and the other without, and I see metadata for both:
[screenshot: /api/v1/targets/metadata showing metadata for both jobs]
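
For reference, the target metadata API can be queried roughly like this (a sketch; the Prometheus address and the job name in match_target are placeholders):

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// The Prometheus address and the job name in match_target are placeholders.
	q := url.Values{}
	q.Set("match_target", `{job="cadvisor"}`)

	resp, err := http.Get("http://localhost:9090/api/v1/targets/metadata?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Prints a JSON list of {target, metric, type, help, unit} entries.
	fmt.Println(string(body))
}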

@jpkrohling jpkrohling added the comp:prometheus Prometheus related issues label Dec 2, 2021
@dashpole
Contributor

dashpole commented Dec 2, 2021

That's really interesting. In theory, we should be querying the target metadata when we go to check for metadata. It probably needs more digging.

@dashpole
Contributor

metric relabel configs + job

With prometheus config:

scrape_configs:
- job_name: 'metric-relabel-config'
  scrape_interval: 10s
  static_configs:
  - targets: ['0.0.0.0:8888'] #self obs for collector/prometheus
  metric_relabel_configs:
  - source_labels: [__address__]
    replacement: job_replacement
    target_label: job
...

The collector outputs:

2021-12-21T13:12:14.492-0800	debug	scrape/scrape.go:1460	Unexpected error	{"kind": "receiver", "name": "prometheus", "scrape_pool": "metric-relabel-config", "target": "http://0.0.0.0:8888/metrics", "series": "otelcol_exporter_enqueue_failed_log_records{exporter=\"logging\",service_instance_id=\"ec4e2a8b-8b41-458a-9d36-f69e5d86bc33\",service_version=\"latest\"}", "error": "unable to find a target group with job=job_replacement"}
2021-12-21T13:12:14.492-0800	debug	scrape/scrape.go:1248	Append failed	{"kind": "receiver", "name": "prometheus", "scrape_pool": "metric-relabel-config", "target": "http://0.0.0.0:8888/metrics", "error": "unable to find a target group with job=job_replacement"}
2021-12-21T13:12:14.492-0800	warn	internal/metricsbuilder.go:125	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1640121134490, "target_labels": "map[__name__:up instance:0.0.0.0:8888 job:metric-relabel-config]"}

The prometheus server shows

From (/api/v1/targets/metadata), it shows the metric with the old metadata:

{"target":{"instance":"0.0.0.0:9090","job":"metric-relabel-config"},"metric":"go_memstats_lookups_total","type":"counter","help":"Total number of pointer lookups.","unit":""}

But in the query UI, I can see:

go_memstats_lookups_total{instance="0.0.0.0:9090", job="job_replacement"}

metric relabel configs + instance

With prometheus config:

scrape_configs:
- job_name: 'metric-relabel-config'
  scrape_interval: 10s
  static_configs:
  - targets: ['0.0.0.0:8888'] #self obs for collector/prometheus
  metric_relabel_configs:
  - source_labels: [__address__]
    replacement: instance_replacement
    target_label: instance
...

The collector outputs:

2021-12-21T14:09:04.494-0800	debug	scrape/scrape.go:1460	Unexpected error	{"kind": "receiver", "name": "prometheus", "scrape_pool": "metric-relabel-config", "target": "http://0.0.0.0:8888/metrics", "series": "otelcol_exporter_enqueue_failed_log_records{exporter=\"logging\",service_instance_id=\"cc81fd05-120e-4772-8ea1-21fa903bee15\",service_version=\"latest\"}", "error": "unable to find a target with job=metric-relabel-config, and instance=instance_replacement"}
2021-12-21T14:09:04.495-0800	debug	scrape/scrape.go:1248	Append failed	{"kind": "receiver", "name": "prometheus", "scrape_pool": "metric-relabel-config", "target": "http://0.0.0.0:8888/metrics", "error": "unable to find a target with job=metric-relabel-config, and instance=instance_replacement"}
2021-12-21T14:09:04.495-0800	warn	internal/metricsbuilder.go:125	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1640124544492, "target_labels": "map[__name__:up instance:0.0.0.0:8888 job:metric-relabel-config]"}

The prometheus server shows

From (/api/v1/targets/metadata), it shows the metric with the old metadata:

{"target":{"instance":"0.0.0.0:9090","job":"metric-relabel-config"},"metric":"go_memstats_lookups_total","type":"counter","help":"Total number of pointer lookups.","unit":""}

I didn't see the new metadata.

But in the query UI, I can see:

go_memstats_lookups_total{instance="instance_replacement", job="metric-relabel-config"}

So if you tried to look for metric metadata with prometheus + grafana, the metric would be treated as Unknown.

With relabel_configs + job:

scrape_configs:
- job_name: 'relabel-config'
  scrape_interval: 10s
  static_configs:
  - targets: ['0.0.0.0:8888'] #self obs for collector/prometheus
  relabel_configs:
  - source_labels: [__address__]
    replacement: job_replacement
    target_label: job
...

The collector outputs:

2021-12-21T14:06:04.344-0800	debug	scrape/scrape.go:1460	Unexpected error	{"kind": "receiver", "name": "prometheus", "scrape_pool": "relabel-config", "target": "http://0.0.0.0:8888/metrics", "series": "otelcol_exporter_enqueue_failed_log_records{exporter=\"logging\",service_instance_id=\"3844962f-c8e6-4bb3-93a0-f9e74530bcb2\",service_version=\"latest\"}", "error": "unable to find a target group with job=job_replacement"}
2021-12-21T14:06:04.344-0800	debug	scrape/scrape.go:1248	Append failed	{"kind": "receiver", "name": "prometheus", "scrape_pool": "relabel-config", "target": "http://0.0.0.0:8888/metrics", "error": "unable to find a target group with job=job_replacement"}
2021-12-21T14:06:04.344-0800	warn	scrape/scrape.go:1203	Appending scrape report failed	{"kind": "receiver", "name": "prometheus", "scrape_pool": "relabel-config", "target": "http://0.0.0.0:8888/metrics", "error": "unable to find a target group with job=job_replacement"}

The prometheus server shows

From (/api/v1/targets/metadata), it shows the metric with the old metadata:

{"target":{"instance":"0.0.0.0:9090","job":"relabel-config"},"metric":"go_memstats_lookups_total","type":"counter","help":"Total number of pointer lookups.","unit":""}

But in the query UI, I can see:

go_memstats_lookups_total{instance="0.0.0.0:9090", job="job_replacement"}

TL;DR

The collector's behavior is always to drop points for which we are unable to find target metadata. The prometheus server's behavior is to treat them the same as unknown-typed metrics without metadata. The fix should be similar to #5001 (comment).

I'm not sure why our results differed looking at metadata, though. What endpoint were you querying?
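
For context on the direction the eventual fix took (see the commit below, "Use target and metadata from context"): instead of reverse-looking up the target from possibly-relabeled job/instance labels, the scrape context carries the target and its metadata store. A rough, self-contained sketch of that idea with hypothetical types (the real change relies on helpers added in prometheus/prometheus#10473, whose exact API is not shown here):

package main

import (
	"context"
	"fmt"
)

type target struct{ job, instance string }

type metadataStore map[string]string // metric name -> type, e.g. "counter"

type ctxKey int

const (
	targetKey ctxKey = iota
	metadataKey
)

// contextWithScrapeInfo mirrors what the scrape loop would do before appending:
// it stashes the target and its metadata store in the context.
func contextWithScrapeInfo(ctx context.Context, t target, ms metadataStore) context.Context {
	ctx = context.WithValue(ctx, targetKey, t)
	return context.WithValue(ctx, metadataKey, ms)
}

// appendSample no longer needs to reverse-look-up the target from the series'
// (possibly relabeled) job/instance labels.
func appendSample(ctx context.Context, series string) {
	t, _ := ctx.Value(targetKey).(target)
	ms, _ := ctx.Value(metadataKey).(metadataStore)
	fmt.Printf("series=%s target=%s/%s type=%s\n",
		series, t.job, t.instance, ms["go_memstats_lookups_total"])
}

func main() {
	ctx := contextWithScrapeInfo(context.Background(),
		target{job: "metric-relabel-config", instance: "0.0.0.0:8888"},
		metadataStore{"go_memstats_lookups_total": "counter"})

	// Even though metric_relabel_configs rewrote the job label on the series,
	// the appender still gets the original target and metadata from the context.
	appendSample(ctx, `go_memstats_lookups_total{job="job_replacement"}`)
}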

gouthamve referenced this issue in gouthamve/opentelemetry-collector-contrib Apr 5, 2022
This fixes #5757 and #5663

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
jpkrohling added a commit that referenced this issue Apr 5, 2022
* Use target and metadata from context

This fixes #5757 and #5663

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add tests for relabeling working

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Use Prometheus main branch

prometheus/prometheus#10473 has been merged

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add back the tests

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Fix flaky test

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add Changelog entry

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add relabel test with the e2e framework

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Update receiver/prometheusreceiver/metrics_receiver_labels_test.go

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>

* Move changelog entry to unreleased

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

* Make lint pass

Needed to run make gotidy; make golint

strings.Title is deprecated

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
Co-authored-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@gouthamve
Member

@dashpole I believe this can be closed now.
