
Filter not working #93

Closed
anderson4u2 opened this issue Feb 22, 2019 · 6 comments
anderson4u2 commented Feb 22, 2019

I'm setting the filter as follows:

        - '--filter=__name__="consumer_group_backlog_avg_10m"'

However, when I tail the sidecar's logs, it stops here and doesn't seem to forward any metrics:

level=info ts=2019-02-22T12:52:07.654292427Z caller=main.go:256 msg="Starting Stackdriver Prometheus sidecar" version="(version=0.4.0, branch=master, revision=e246041acf99c8487e1ac73552fb8625339c64a1)"
level=info ts=2019-02-22T12:52:07.654367128Z caller=main.go:257 build_context="(go=go1.11.4, user=kbuilder@kokoro-gcp-ubuntu-prod-217445279, date=20190221-15:24:24)"
level=info ts=2019-02-22T12:52:07.654414564Z caller=main.go:258 host_details="(Linux 4.14.65+ #1 SMP Thu Oct 25 10:42:50 PDT 2018 x86_64 prometheus-84b8bdf44-6kcw8 (none))"
level=info ts=2019-02-22T12:52:07.654645769Z caller=main.go:259 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-02-22T12:52:07.658270228Z caller=main.go:463 msg="Web server started"
level=info ts=2019-02-22T12:52:07.658797109Z caller=main.go:444 msg="Stackdriver client started"
level=info ts=2019-02-22T12:53:10.664382837Z caller=manager.go:150 component="Prometheus reader" msg="Starting Prometheus reader..."
level=info ts=2019-02-22T12:53:10.668043076Z caller=manager.go:211 component="Prometheus reader" msg="reached first record after start offset" start_offset=0 skipped_records=0

When I curl the Prometheus server that the sidecar should be reading metrics from, it does have the metric I'm trying to filter for:

root@myserver:/# curl prometheus:9090/api/v1/query?query=consumer_group_backlog_avg_10m | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   313  100   313    0     0  43745      0 --:--:-- --:--:-- --:--:-- 44714
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "consumer_group_backlog_avg_10m",
          "consumer_group": "media-extractor"
        },
        "value": [
          1550838129.789,
          "892269.2"
        ]
      },
      {
        "metric": {
          "__name__": "consumer_group_backlog_avg_10m",
          "consumer_group": "summarizer"
        },
        "value": [
          1550838129.789,
          "548159.4"
        ]
      }
    ]
  }
}

Does anything seem clearly wrong? What could be the issue? Thanks!


jkohen commented Feb 22, 2019

Thanks for the report. Nothing obvious jumps to mind. What is the full command line? Note that if you have multiple filters, they all have to pass.

Do you see any metrics in Stackdriver if you remove all filters (the sidecar should forward all metrics by default)? Knowing this would help us eliminate the filter as a problem.
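
For reference, since repeated --filter flags all have to pass, they behave as an AND. A sketch reusing the metric and label names from your report (the second matcher is only illustrative, not a recommendation):

        args:
        # Only series matching BOTH matchers are forwarded to Stackdriver.
        - '--filter=__name__="consumer_group_backlog_avg_10m"'
        - '--filter=consumer_group="media-extractor"'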

jkohen self-assigned this Feb 22, 2019

anderson4u2 commented Feb 22, 2019

Hi Javier, thanks for picking this up.

I'm passing the args through a k8s manifest. The full command is:

      - name: stackdriver-prometheus-sidecar
        image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.4.0
        imagePullPolicy: Always
        args:
        - --stackdriver.project-id={{ project-id }}
        - --prometheus.wal-directory=/prometheus/data/wal
        - --stackdriver.kubernetes.location={{ gcp_region }}
        - --stackdriver.kubernetes.cluster-name={{ kube_cluster }}
        - --stackdriver.use-gke-resource
        - '--filter=__name__="consumer_group_backlog_avg_10m"'
        ports:
        - name: sidecar
          containerPort: 9091
        volumeMounts:
        - name: tmp-data-dir
          mountPath: /prometheus/data

Yes, without a filter all metrics are exported, and other filters work. For example, the filter '--filter=consumer_group=~".+"' exports some Kafka metrics. Unfortunately it doesn't export the metric I'm interested in, which also has the consumer_group label populated.

The problem may lie in a small detail: the metric I'm trying to export (consumer_group_backlog_avg_10m) actually comes from a recording rule. The rule config is:

groups:
- name: consumer-groups
  rules:
  - record: consumer_group_backlog_avg_10m
    expr: avg_over_time(consumer_group_backlog_k8s[10m])
  - record: consumer_group_backlog_k8s
    expr: sum(kafka_consumer_group_total_lag) by (consumer_group)

The contents of the metric are in my initial post, which at least shows that it is available in Prometheus.
Thanks!


jkohen commented Feb 25, 2019

Anderson, thanks for the clarification. The issue is indeed caused by recording rules, as you suspected. I can see two options:

  • You can ingest the raw metric into Stackdriver and use Stackdriver's query-time aggregations. In this case, ingest kafka_consumer_group_total_lag and query with 'mean' aggregation, a '10m' window, and 'group by consumer_group label'. Would this work for your case? Raw metrics have the advantage that you can explore the data interactively in the Metrics Explorer and Stackdriver dashboards, filter and group by metadata, etc.
  • Add a static_metadata entry to the collector's config (docs); see the sketch after this list. As long as you preserve the job and instance labels (i.e. don't aggregate them away in your query), which is the case in this rule, it should work. If you go with this option, I would recommend ingesting consumer_group_backlog_k8s instead of consumer_group_backlog_avg_10m, so you can still change the aggregation (though not the group-by) at query time.
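
For the second option, a minimal sketch of a sidecar config file, assuming the sidecar's --config-file flag and the static_metadata format from the docs (the type, value_type, help text, and file path below are illustrative, not prescribed):

    # sidecar-config.yml, mounted into the sidecar container and passed via
    # --config-file=/etc/sidecar/sidecar-config.yml (path is an example).
    static_metadata:
      - metric: consumer_group_backlog_k8s
        type: gauge          # the rule is a sum(), so gauge semantics fit here
        value_type: double
        help: Total Kafka consumer group lag, summed by consumer_group (recording rule).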

I will add an entry to our website explaining this in a bit more detail. Thanks for bringing it up!


jkohen commented Feb 26, 2019

I'm going to go ahead and close this request. If there's anything else I can do to help you, let me know.

jkohen closed this as completed Feb 26, 2019
anderson4u2 (Author) commented:

Thanks a lot for your reply!
The first solution wouldn't work for me, because I'm planning to autoscale (HPA) on these metrics.
The second solution works like a charm! Thanks!
I saw you've already updated the Google docs as well, cool!


jkohen commented Feb 27, 2019

Glad it helped, thanks for the update!
