Error in scrape job definition passes silently #633

Abuelodelanada · 2024-07-17T19:30:51Z

Bug Description

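Prometheus is unable to unmarshal the params sent by the Prometheus scrape target charm, so the scrape job is dropped from the configuration.
Despite this error, the Prometheus charm remains in active status, so the failure goes unnoticed.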

To Reproduce

Deploy Prometheus: juju deploy prometheus-k8s prom --channel edge --trust
Deploy Prometheus scrape target: juju deploy prometheus-scrape-target-k8s scrape --channel edge
Configure the Prometheus scrape target (deployed above as "scrape"):
- juju config scrape targets=192.168.0.248:9116
- juju config scrape labels="job:cumulus"
- juju config scrape metrics_path="/snmp"
- juju config scrape params='{"auth": "snmp_v3", "module": "if_mib_if_name", "target": "192.168.100.200"}'
Relate Prometheus to Prometheus scrape target: juju relate prom scrape

Verify that the scrape job is missing from the Prometheus configuration (only the self-scrape job is present):

$ juju ssh --container prometheus prom/0 cat /etc/prometheus/prometheus.yml                                                            
global:
  evaluation_interval: 1m
  scrape_interval: 1m
  scrape_timeout: 10s
rule_files:
- /etc/prometheus/rules/juju_*.rules
scrape_configs:
- honor_timestamps: true
  job_name: prometheus
  metrics_path: /metrics
  relabel_configs:
  - regex: (.*)
    separator: _
    source_labels:
    - juju_model
    - juju_model_uuid
    - juju_application
    - juju_unit
    target_label: instance
  scheme: http
  scrape_interval: 5s
  scrape_timeout: 5s
  static_configs:
  - labels:
      host: localhost
      juju_application: prom
      juju_charm: prometheus-k8s
      juju_model: dmytro
      juju_model_uuid: 67755b6d-9410-46d9-8617-ee7c87d285c2
      juju_unit: prom/0
    targets:
    - prom-0.prom-endpoints.dmytro.svc.cluster.local:9090

Alternatively it is possible to use this bundle:

bundle: kubernetes
applications:
  prom:
    charm: prometheus-k8s
    channel: latest/edge
    revision: 210
    resources:
      prometheus-image: 149
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  scrape:
    charm: prometheus-scrape-target-k8s
    channel: latest/edge
    revision: 34
    scale: 1
    options:
      labels: job:cumulus
      params: '{"auth": "snmp_v3", "module": "if_mib_if_name", "target": "192.168.100.200"}'
      targets: 192.168.0.248:9116
    constraints: arch=amd64
relations:
- - prom:metrics-endpoint
  - scrape:metrics-endpoint

Environment

Model   Controller  Cloud/Region        Version  SLA          Timestamp
dmytro  microk8s    microk8s/localhost  3.5.2    unsupported  16:23:20-03:00

App     Version  Status  Scale  Charm                         Channel      Rev  Address        Exposed  Message
prom    2.52.0   active      1  prometheus-k8s                latest/edge  210  10.152.183.22  no       
scrape  n/a      active      1  prometheus-scrape-target-k8s  latest/edge   34  10.152.183.36  no       

Unit       Workload  Agent  Address     Ports  Message
prom/0*    active    idle   10.1.9.252         
scrape/0*  active    idle   10.1.9.217         

Integration provider     Requirer               Interface          Type     Message
prom:prometheus-peers    prom:prometheus-peers  prometheus_peers   peer     
scrape:metrics-endpoint  prom:metrics-endpoint  prometheus_scrape  regular

Relevant log output

unit-prom-0: 16:07:08.634 INFO unit.prom/0.juju-log metrics-endpoint:3: reqs=ResourceRequirements(claims=None, limits={}, requests={'cpu': '0.25', 'memory': '200Mi'}), templated=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'}), actual=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'})
unit-prom-0: 16:07:08.672 DEBUG unit.prom/0.juju-log metrics-endpoint:3: No alertmanagers available
unit-prom-0: 16:07:08.704 ERROR unit.prom/0.juju-log metrics-endpoint:3: Validating scrape jobs failed: b'time="2024-07-17T19:07:08Z" level=fatal msg="parsing YAML file /tmp/tmpe9yyw2pz: yaml: unmarshal errors:\\n  line 4: cannot unmarshal !!str `snmp_v3` into []string\\n  line 5: cannot unmarshal !!str `if_mib_...` into []string\\n  line 6: cannot unmarshal !!str `192.168...` into []string"\n'
unit-prom-0: 16:07:08.757 INFO unit.prom/0.juju-log metrics-endpoint:3: Pushed new configuration
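
The unmarshal errors above indicate that Prometheus expects every params value to be a list of strings ([]string), while the config passes bare strings. A minimal Python sketch of how the value could be normalized before it reaches Prometheus, assuming params is always a JSON object; normalize_params is a hypothetical helper and not part of either charm:

```python
import json


def normalize_params(raw: str) -> dict[str, list[str]]:
    """Hypothetical helper: coerce each params value into a list of strings."""
    params = json.loads(raw)
    if not isinstance(params, dict):
        raise ValueError("params must be a JSON object")
    normalized = {}
    for key, value in params.items():
        # Wrap bare scalars in a list; keep existing lists as they are.
        values = value if isinstance(value, list) else [value]
        normalized[str(key)] = [str(v) for v in values]
    return normalized


# The params value from the reproduction steps in this issue.
raw = '{"auth": "snmp_v3", "module": "if_mib_if_name", "target": "192.168.100.200"}'
print(json.dumps(normalize_params(raw)))
# prints {"auth": ["snmp_v3"], "module": ["if_mib_if_name"], "target": ["192.168.100.200"]}
```

Passing the printed form as the params option may serve as a workaround, although this has not been verified against the charm revisions listed above.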

Additional context

No response

lucabello · 2024-08-29T14:10:50Z

We should also do this for alert rules. Currently, if you relate to cos-config and make a typo in one alert rule, all of them will disappear from Prometheus, and everything will stay in active/idle.

We should either validate on cos-config and set that to blocked, or validate in Prometheus.
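
A rough sketch of the validate-then-block idea, applied to the scrape params from this issue rather than to alert rules. The function names and the (status, message) tuple are assumptions for illustration; a real charm would presumably set ops.BlockedStatus on the unit instead of returning a tuple:

```python
def validate_params(params: dict) -> list[str]:
    """Hypothetical check: every params value must be a list of strings."""
    errors = []
    for key, value in params.items():
        is_str_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
        if not is_str_list:
            errors.append(
                f"params[{key!r}] must be a list of strings, got {type(value).__name__}"
            )
    return errors


def status_for(errors: list[str]) -> tuple[str, str]:
    """Map validation errors to a (status, message) pair instead of staying active."""
    if errors:
        return ("blocked", f"invalid scrape job: {errors[0]}")
    return ("active", "")


print(status_for(validate_params({"auth": "snmp_v3"})))
print(status_for(validate_params({"auth": ["snmp_v3"]})))
```

The same pattern would apply whether the check runs on the providing charm or in Prometheus; what matters is that a failed validation changes the status instead of only being logged.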

Abuelodelanada added Status: Triage Type: Bug Priority: High labels Jul 17, 2024
