
metric_relabel_configs drop action doesn't work #35720

Open

arthur-observe opened this issue Oct 9, 2024 · 8 comments

Labels: bug (Something isn't working), receiver/prometheus (Prometheus receiver), Stale

arthur-observe commented Oct 9, 2024

Component(s)

receiver/prometheus

What happened?

Description

I have a Prometheus receiver configured like this:

```yaml
  prometheus/pod_metrics:
      config:
        scrape_configs:
        - job_name: pod-metrics
          scrape_interval: 10s
          honor_labels: true
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          # this is defaulted to keep so we start with everything
          - action: keep

          # Drop anything matching the configured namespace.
          - action: 'drop'
            source_labels: ['__meta_kubernetes_namespace']
            regex: (.*istio.*|.*ingress.*|kube-system)

          # Drop anything not matching the configured namespace.
          - action: 'keep'
            source_labels: ['__meta_kubernetes_namespace']
            regex: (default)

          # Maps all Kubernetes pod labels to Prometheus labels with the prefix removed (e.g., __meta_kubernetes_pod_label_app becomes app).
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)

          # adds new label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace

          # adds new label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

          metric_relabel_configs:
            - action: drop
              regex: .*bucket
              source_labels:
                - __name__
            - action: keep
              regex: (.*)
              source_labels:
                - __name__
```
The namespace keep and drop rules work as expected, but the metric_relabel_configs are not applied. I tested the same configuration on Grafana Agent and it works fine there.
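For reference, Prometheus anchors relabel regexes at both ends, and under those semantics the `.*bucket` pattern above does match the bucket series name, so the rule itself looks correct. A minimal sketch of the expected drop behavior (using Python's `re.fullmatch` as a stand-in for Prometheus's RE2-based, fully-anchored matcher; the series names are just the example app's):

```python
import re

def survives_drop(metric_name: str, pattern: str) -> bool:
    """Return True if a series survives a Prometheus-style 'drop'
    relabel action. Prometheus fully anchors relabel regexes, so the
    pattern must match the entire source-label value, not a substring."""
    return re.fullmatch(pattern, metric_name) is None

# The drop rule from the issue, applied to the example app's series names.
names = [
    "http_requests_total",
    "http_request_duration_seconds_bucket",
    "http_request_duration_seconds_sum",
]
kept = [n for n in names if survives_drop(n, r".*bucket")]
print(kept)  # the _bucket series should be filtered out
```

Under these semantics the `_bucket` series is expected to be dropped, which is what the issue reports is not happening.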

Steps to Reproduce

I deployed this workload on my cluster, which emits metrics as expected:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: prometheus-example-app
  name: prometheus-example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus-example-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus-example-app
      annotations:
        observeinc_com_scrape: 'true'
        observeinc_com_path: '/metrics'
        observeinc_com_port: '8080'
    spec:
      containers:
      - name: prometheus-example-app
        image: quay.io/brancz/prometheus-example-app:v0.3.0
        ports:
        - name: web
          containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-example-app-service
spec:
  selector:
    app.kubernetes.io/name: prometheus-example-app
  ports:
    - protocol: TCP
      port: 8080  # Exposed service port
      targetPort: 8080
      name: metrics
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: caller-cronjob
spec:
  schedule: "*/1 * * * *"  # Runs every minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: caller
            image: curlimages/curl:latest  # A lightweight curl image
            env:
              - name: SLEEP_TIME
                value: "10"  # Sleep time in seconds
              - name: LOOP_COUNT
                value: "36"   # Number of iterations
            command:
              - /bin/sh
              - -c
              - |
                for i in $(seq 1 $LOOP_COUNT); do
                  curl http://prometheus-example-app-service:8080;  # Adjust the URL and port as necessary

                  # Second call on even numbers
                  if [ $((i % 2)) -eq 0 ]; then
                    curl http://prometheus-example-app-service:8080/err;  # Second target service
                    echo "Second call on even #$i made."
                  fi
                  sleep $SLEEP_TIME;
                done
          restartPolicy: OnFailure
```

Expected Result

With this configuration I would expect this metric not to be scraped: http_request_duration_seconds_bucket

Actual Result

It gets scraped and sent.

Collector version

0.111.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")
Using the latest contrib image on EKS

OpenTelemetry Collector configuration

```yaml
relay:
extensions:
  # https://github.com/open-telemetry/opentelemetry-helm-charts/issues/816
  # 0.0.0.0 is hack for ipv6 on eks clusters
  health_check:
    endpoint: "${env:MY_POD_IP}:13133"

exporters:
  debug/override:
      verbosity: detailed
      sampling_initial: 2
      sampling_thereafter: 1
  prometheusremotewrite:
      endpoint: "YOURS"
      headers:
          authorization: "YOURS"
      resource_to_telemetry_conversion:
          enabled: true # Convert resource attributes to metric labels
      send_metadata: true

receivers:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/k8sclusterreceiver/documentation.md
  k8s_cluster:
    collection_interval: 60s
    metadata_collection_interval: 5m
    auth_type: serviceAccount
    node_conditions_to_report:
    - Ready
    - MemoryPressure
    - DiskPressure
    allocatable_types_to_report:
    - cpu
    - memory
    - storage
    - ephemeral-storage
    # defaults and optional - https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/k8sclusterreceiver/documentation.md
    metrics:
      k8s.node.condition:
        enabled: true
  prometheus/pod_metrics:
      config:
        scrape_configs:
        - job_name: pod-metrics
          scrape_interval: 10s
          honor_labels: true
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          # this is defaulted to keep so we start with everything
          - action: keep

          # Drop anything matching the configured namespace.
          - action: 'drop'
            source_labels: ['__meta_kubernetes_namespace']
            regex: (.*istio.*|.*ingress.*|kube-system)

          # Drop anything not matching the configured namespace.
          - action: 'keep'
            source_labels: ['__meta_kubernetes_namespace']
            regex: (default)

          # Drop endpoints without one of: a port name suffixed with the configured regex, or an explicit prometheus port annotation.
          - action: 'keep'
            source_labels: ['__meta_kubernetes_pod_container_port_name', '__meta_kubernetes_pod_annotation_prometheus_io_port']
            regex: '(.*metrics|web;|.*;\d+)'

          # Drop pods with phase Succeeded or Failed.
          - action: 'drop'
            regex: 'Succeeded|Failed'
            source_labels: ['__meta_kubernetes_pod_phase']

          ################################################################
          # Prometheus Configs
          # Drop anything annotated with 'prometheus.io.scrape=false'.
          - action: 'drop'
            regex: 'false'
            source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_scrape']

          # Allow pods to override the scrape scheme with 'prometheus.io.scheme=https'.
          - action: 'replace'
            regex: '(https?)'
            replacement: '$1'
            source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_scheme']
            target_label: '__scheme__'

          # Allow service to override the scrape path with 'prometheus.io.path=/other_metrics_path'.
          - action: 'replace'
            regex: '(.+)'
            replacement: '$1'
            source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_path']
            target_label: '__metrics_path__'

          # Allow services to override the scrape port with 'prometheus.io.port=1234'.
          - action: 'replace'
            regex: '(.+?)(\:\d+)?;(\d+)'
            replacement: '$1:$3'
            source_labels: ['__address__', '__meta_kubernetes_pod_annotation_prometheus_io_port']
            target_label: '__address__'

          ################################################################

          # Maps all Kubernetes pod labels to Prometheus labels with the prefix removed (e.g., __meta_kubernetes_pod_label_app becomes app).
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)

          # adds new label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace

          # adds new label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

          metric_relabel_configs:
            - action: drop
              regex: .*bucket
              source_labels:
                - __name__
            - action: keep
              regex: (.*)
              source_labels:
                - __name__

processors:
  # https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md
  memory_limiter:
    # check_interval is the time between measurements of memory usage for the
    # purposes of avoiding going over the limits. Defaults to zero, so no
    # checks will be performed. Values below 1 second are not recommended since
    # it can result in unnecessary CPU consumption.
    check_interval: 5s
    # limit_percentage (default = 0): Maximum amount of total memory targeted to be allocated by the process heap.
    # This configuration is supported on Linux systems with cgroups and it's intended to be used in dynamic platforms like docker.
    # This option is used to calculate memory_limit from the total available memory.
    # For instance setting of 75% with the total memory of 1GiB will result in the limit of 750 MiB.
    # The fixed memory setting (limit_mib) takes precedence over the percentage configuration.
    limit_percentage: 75
    # spike_limit_percentage (default = 0): Maximum spike expected between the measurements of memory usage.
    # The value must be less than limit_percentage.
    # This option is used to calculate spike_limit_mib from the total available memory.
    # For instance setting of 25% with the total memory of 1GiB will result in the spike limit of 250MiB.
    # This option is intended to be used only with limit_percentage.
    spike_limit_percentage: 25
  batch:
    send_batch_size: 4096
    send_batch_max_size: 4096
  k8sattributes:
    extract:
      metadata:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.replicaset.name
      - k8s.statefulset.name
      - k8s.daemonset.name
      - k8s.cronjob.name
      - k8s.job.name
      - k8s.node.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.cluster.uid
      - k8s.node.name
      - k8s.node.uid
    passthrough: false
    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection
  attributes/observe_common:
    actions:
      - key: k8s.cluster.name
        action: insert
        value: ${env:CLUSTER_NAME}
      - key: k8s.cluster.uid
        action: insert
        value: ${env:CLUSTER_UID}

  # attributes to append to objects
  attributes/debug_source_cluster_metrics:
    actions:
      - key: debug_source
        action: insert
        value: cluster_metrics
  attributes/debug_source_pod_metrics:
    actions:
      - key: debug_source
        action: insert
        value: pod_metrics

service:
  extensions: [health_check]
  pipelines:
      metrics:
        receivers: [k8s_cluster]
        processors: [memory_limiter, batch, k8sattributes, attributes/observe_common, attributes/debug_source_cluster_metrics]
        exporters: [prometheusremotewrite, debug/override]
      metrics/pod_metrics:
        receivers: [prometheus/pod_metrics]
        processors: [memory_limiter, batch, k8sattributes, attributes/observe_common, attributes/debug_source_pod_metrics]
        exporters: [prometheusremotewrite, debug/override]

  telemetry:
      metrics:
        level: normal
        address: ${env:MY_POD_IP}:8888
      logs:
        level: DEBUG
        encoding: console
```

Log output

No errors

Additional context

No response

@arthur-observe arthur-observe added bug Something isn't working needs triage New item requiring triage labels Oct 9, 2024
@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label Oct 9, 2024
github-actions bot (Contributor) commented Oct 9, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dashpole (Contributor) commented Oct 9, 2024

Can you fix the formatting of the issue above? We pass through the configuration you provide to prometheus server code without modification, so it would be strange for behavior to differ between the Prometheus server and the collector. Can you reproduce the issue with the prometheus server?

@dashpole dashpole removed the needs triage New item requiring triage label Oct 9, 2024
@dashpole dashpole self-assigned this Oct 9, 2024
arthur-observe (Author) commented Oct 9, 2024

> Can you fix the formatting of the issue above? We pass through the configuration you provide to prometheus server code without modification, so it would be strange for behavior to differ between the Prometheus server and the collector. Can you reproduce the issue with the prometheus server?

Sorry, I'm new to submitting issues here. I think I cleaned up the formatting; are there any other issues with the submission?

Trying this on the Astronomy Shop demo; I will post results.

dashpole (Contributor) commented Oct 9, 2024

There is a bunch of YAML above that isn't wrapped in a yaml markdown block, which makes it hard to read.

arthur-observe (Author) commented Oct 9, 2024

OK, I tried this in the Astronomy Shop demo, using Grafana to view the results, and it doesn't seem to work there either. I added metric_relabel_configs to the values file:

```yaml
  serverFiles:
    prometheus.yml:
      scrape_configs:
        - job_name: 'otel-collector'
          honor_labels: true
          kubernetes_sd_configs:
            - role: pod
              namespaces:
                own_namespace: true
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_opentelemetry_community_demo]
              action: keep
              regex: true
          metric_relabel_configs:
            - action: drop
              regex: http_server_duration_seconds_bucket
              source_labels:
                - __name__
```

[screenshot attached]

NathanNam commented

Any updates?

dashpole (Contributor) commented

I don't see anything obviously wrong with the config, but I suspect the regex isn't working properly. Can you try `(.*)bucket` instead of `.*bucket`? All of the examples I can find seem to use parentheses around wildcards.

The only other thing that I would try is removing the unnecessary `action: keep` block.
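A side note on that suggestion: since Prometheus fully anchors relabel regexes, wrapping the wildcard in a capture group should not change which values match (the group only affects what `$1` refers to in a replacement). A quick check of the two patterns, using Python's `re.fullmatch` as a stand-in for the anchored matcher:

```python
import re

# Compare dashpole's suggested pattern against the original one under
# full anchoring, which is how Prometheus applies relabel regexes.
names = [
    "http_server_duration_seconds_bucket",
    "http_server_duration_seconds_count",
    "bucket",
]
for name in names:
    plain = re.fullmatch(r".*bucket", name) is not None
    grouped = re.fullmatch(r"(.*)bucket", name) is not None
    print(name, plain, grouped)
    # The capture group only changes what $1 captures, not what matches.
    assert plain == grouped
```

So if the two patterns behave differently in the collector, that would itself point to a bug rather than a configuration mistake.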

github-actions bot commented

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 30, 2024