Error in scrape job definition passes silently #633

Open
Abuelodelanada opened this issue Jul 17, 2024 · 1 comment

@Abuelodelanada
Contributor

Bug Description

Prometheus cannot unmarshal the params sent by the Prometheus scrape target charm, so the scrape job is not added to the configuration.
The error is only logged: Prometheus remains in active status.

To Reproduce

  1. Deploy Prometheus: juju deploy prometheus-k8s prom --channel edge --trust
  2. Deploy Prometheus scrape target: juju deploy prometheus-scrape-target-k8s scrape --channel edge
  3. Configure the Prometheus scrape target (deployed above as the application scrape):
    • juju config scrape targets=192.168.0.248:9116
    • juju config scrape labels="job:cumulus"
    • juju config scrape metrics_path="/snmp"
    • juju config scrape params='{"auth": "snmp_v3", "module": "if_mib_if_name", "target": "192.168.100.200"}'
  4. Relate Prometheus to the Prometheus scrape target: juju relate prom scrape
  5. Verify that the scrape job is missing from the Prometheus configuration:
    $ juju ssh --container prometheus prom/0 cat /etc/prometheus/prometheus.yml
    global:
      evaluation_interval: 1m
      scrape_interval: 1m
      scrape_timeout: 10s
    rule_files:
    - /etc/prometheus/rules/juju_*.rules
    scrape_configs:
    - honor_timestamps: true
      job_name: prometheus
      metrics_path: /metrics
      relabel_configs:
      - regex: (.*)
        separator: _
        source_labels:
        - juju_model
        - juju_model_uuid
        - juju_application
        - juju_unit
        target_label: instance
      scheme: http
      scrape_interval: 5s
      scrape_timeout: 5s
      static_configs:
      - labels:
          host: localhost
          juju_application: prom
          juju_charm: prometheus-k8s
          juju_model: dmytro
          juju_model_uuid: 67755b6d-9410-46d9-8617-ee7c87d285c2
          juju_unit: prom/0
        targets:
        - prom-0.prom-endpoints.dmytro.svc.cluster.local:9090

Alternatively, the issue can be reproduced with this bundle:

bundle: kubernetes
applications:
  prom:
    charm: prometheus-k8s
    channel: latest/edge
    revision: 210
    resources:
      prometheus-image: 149
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  scrape:
    charm: prometheus-scrape-target-k8s
    channel: latest/edge
    revision: 34
    scale: 1
    options:
      labels: job:cumulus
      params: '{"auth": "snmp_v3", "module": "if_mib_if_name", "target": "192.168.100.200"}'
      targets: 192.168.0.248:9116
    constraints: arch=amd64
relations:
- - prom:metrics-endpoint
  - scrape:metrics-endpoint

Environment

Model   Controller  Cloud/Region        Version  SLA          Timestamp
dmytro  microk8s    microk8s/localhost  3.5.2    unsupported  16:23:20-03:00

App     Version  Status  Scale  Charm                         Channel      Rev  Address        Exposed  Message
prom    2.52.0   active      1  prometheus-k8s                latest/edge  210  10.152.183.22  no       
scrape  n/a      active      1  prometheus-scrape-target-k8s  latest/edge   34  10.152.183.36  no       

Unit       Workload  Agent  Address     Ports  Message
prom/0*    active    idle   10.1.9.252         
scrape/0*  active    idle   10.1.9.217         

Integration provider     Requirer               Interface          Type     Message
prom:prometheus-peers    prom:prometheus-peers  prometheus_peers   peer     
scrape:metrics-endpoint  prom:metrics-endpoint  prometheus_scrape  regular 

Relevant log output

unit-prom-0: 16:07:08.634 INFO unit.prom/0.juju-log metrics-endpoint:3: reqs=ResourceRequirements(claims=None, limits={}, requests={'cpu': '0.25', 'memory': '200Mi'}), templated=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'}), actual=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'})
unit-prom-0: 16:07:08.672 DEBUG unit.prom/0.juju-log metrics-endpoint:3: No alertmanagers available
unit-prom-0: 16:07:08.704 ERROR unit.prom/0.juju-log metrics-endpoint:3: Validating scrape jobs failed: b'time="2024-07-17T19:07:08Z" level=fatal msg="parsing YAML file /tmp/tmpe9yyw2pz: yaml: unmarshal errors:\\n  line 4: cannot unmarshal !!str `snmp_v3` into []string\\n  line 5: cannot unmarshal !!str `if_mib_...` into []string\\n  line 6: cannot unmarshal !!str `192.168...` into []string"\n'
unit-prom-0: 16:07:08.757 INFO unit.prom/0.juju-log metrics-endpoint:3: Pushed new configuration
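
The unmarshal errors in the log above show that each params value must be a list of strings ([]string), whereas the configured JSON supplies plain strings. A minimal sketch of a normalization step, assuming the charm parses the params option as JSON; normalize_params is a hypothetical helper for illustration, not a function from either charm:

```python
import json
from typing import Any, Dict, List


def normalize_params(params: Dict[str, Any]) -> Dict[str, List[str]]:
    """Hypothetical helper: make every params value a list of strings."""
    normalized: Dict[str, List[str]] = {}
    for key, value in params.items():
        if isinstance(value, str):
            # A bare string is what triggers "cannot unmarshal !!str into []string".
            normalized[key] = [value]
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            normalized[key] = list(value)
        else:
            raise ValueError(f"params[{key!r}] must be a string or a list of strings")
    return normalized


# The params value from the reproduction steps.
raw = '{"auth": "snmp_v3", "module": "if_mib_if_name", "target": "192.168.100.200"}'
fixed = normalize_params(json.loads(raw))
print(json.dumps(fixed))
# {"auth": ["snmp_v3"], "module": ["if_mib_if_name"], "target": ["192.168.100.200"]}
```

Whether to wrap bare strings automatically or reject them with a visible error is a design choice; either way, the failure would no longer be silent.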

Additional context

No response

@lucabello
Contributor

We should also surface validation failures for alert rules. Currently, if you relate to cos-config and make a typo in one alert rule, all of them disappear from Prometheus, and everything stays in active/idle.

We should either validate on cos-config and set that to blocked, or validate in Prometheus.
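
Either option amounts to turning validation results into a unit status instead of only logging them. A rough sketch of that mapping, assuming validation yields a list of results; ValidationResult and status_for are hypothetical names, and in a real charm the returned pair would presumably be translated into the ops library's BlockedStatus or ActiveStatus:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ValidationResult:
    """Hypothetical outcome of validating one scrape job or alert rule file."""
    valid: bool
    errors: List[str] = field(default_factory=list)


def status_for(results: List[ValidationResult]) -> Tuple[str, str]:
    """Return (status_name, message); any failed validation blocks the unit."""
    errors = [err for result in results if not result.valid for err in result.errors]
    if errors:
        return ("blocked", f"{len(errors)} validation error(s); see debug-log")
    return ("active", "")


print(status_for([ValidationResult(valid=True)]))
print(status_for([ValidationResult(valid=False, errors=["cannot unmarshal !!str into []string"])]))
```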
