[nginx] Include track label in Prometheus metrics #2128

Closed
lambertjosh opened this issue Feb 21, 2018 · 9 comments · Fixed by #2608

@lambertjosh

The addition of Prometheus metrics support directly within the Ingress controller in 0.9.0 beta is great. One limitation we have found, however, is that only a limited set of upstream information is available.

For example, if a company runs two deployments, distinguished only by a label, to facilitate canary releases in k8s, there is unfortunately no way to differentiate between them in the metrics. This makes it hard to detect increasing error rates and latency from the canary pods.

It would be really helpful if we could include an additional label, or perhaps include it in the existing upstream label.
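
For illustration only (hypothetical values, using the track label from the issue title), the kind of series we are after would look something like:

nginx_upstream_response_msecs_avg{upstream="app-production-5000",server="10.0.0.12:5000",track="canary"}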

@aledbf
Member

aledbf commented Feb 21, 2018

It would be really helpful if we could include an additional label, or perhaps include it in the existing upstream label.

We have this information, but it would be too expensive (CPU and time), because the only way we can support this is by searching for the upstream IP and port in the local cache.
Not sure this makes sense.

@aledbf
Member

aledbf commented Feb 21, 2018

ping @pieterlange

@gianrubio
Contributor

gianrubio commented Mar 12, 2018

@lambertjosh it looks like you need to configure your Prometheus server to inject the pod label during the scrape. Below you can see how my metrics are tagged by Prometheus.

(screenshot: Prometheus metrics tagged with pod labels, 2018-03-12)

I'm currently using prometheus-operator to set up my server; it abstracts a lot of the configuration for you. I'm pasting the config it generated for Prometheus, hope it helps you.

- job_name: monitoring/ingress/0
  honor_labels: true
  kubernetes_sd_configs:
  - role: endpoints
  scrape_interval: 50s
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app
    regex: nginx-ingress-lb
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    regex: kube-system
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: http-metrics
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - source_labels:
    - __meta_kubernetes_service_label_ingress
    target_label: job
    regex: (.+)
    replacement: ${1}
  - target_label: endpoint
    replacement: http-metrics
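
For reference, with prometheus-operator a scrape config like the one above is normally generated from a ServiceMonitor object. A minimal sketch that would yield roughly this config (the metadata name and namespace are guessed from the job name monitoring/ingress/0; everything else just mirrors the relabel rules above):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress          # guessed from job_name monitoring/ingress/0
  namespace: monitoring  # guessed from job_name monitoring/ingress/0
spec:
  jobLabel: ingress      # promotes the service's "ingress" label to the job label
  selector:
    matchLabels:
      app: nginx-ingress-lb
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: http-metrics
    interval: 50s
    honorLabels: true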

@lambertjosh
Author

@gianrubio that would apply the labels of the ingress, right? In the case of canary deploys, you'd typically only have a single ingress pointing at a single service, with multiple deployments underneath it.

The trick is to be able to differentiate, in the upstream metrics, between the multiple deployments backing a single service.

@gianrubio
Contributor

Could you illustrate that using an existing metric and adding the desired labels?

@lambertjosh
Author

@gianrubio sorry for the delay in writing this up. Our use case is canary deployments, but the same need applies to anyone who is running two different deployments behind the same service.

In our case we have two deployments matching a single service, one with container.v1 and another with container.v2. This allows us to test new versions of software, gradually rolling it out across our pods until version 2 is all that remains.

We'd really like to be able to detect at the NGINX level whether we are seeing different behavior between the two versions. Unfortunately, the existing labels do not provide enough information to know which deployment the upstream is a member of.

k8s specs of the sample environment (a sketch of the corresponding manifests follows below):

  • Deployment A labels: app:production, tier:web, track:stable
  • Deployment B labels: app:production, tier:web, track:canary
  • Service selector: app:appname, tier:web
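
For concreteness, a minimal sketch of manifests for that setup (names, image tags, and the port are assumptions for illustration; the app label is written as production throughout so the Service selector actually matches the pod labels):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-stable                 # hypothetical name
spec:
  selector:
    matchLabels: {app: production, tier: web, track: stable}
  template:
    metadata:
      labels: {app: production, tier: web, track: stable}
    spec:
      containers:
      - name: web
        image: registry.example.com/app:v1   # "container.v1" above
        ports:
        - containerPort: 5000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-canary                 # hypothetical name
spec:
  selector:
    matchLabels: {app: production, tier: web, track: canary}
  template:
    metadata:
      labels: {app: production, tier: web, track: canary}
    spec:
      containers:
      - name: web
        image: registry.example.com/app:v2   # "container.v2" above
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: web                        # hypothetical name
spec:
  selector: {app: production, tier: web}     # matches both tracks
  ports:
  - port: 5000
    targetPort: 5000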

If I query for nginx_upstream_response_msecs_avg, I get the following:

nginx_upstream_response_msecs_avg{app="nginx-ingress",component="controller",ingress_class="nginx",instance="10.92.0.15:10254",job="kubernetes-pods",kubernetes_namespace="gitlab-managed-apps",kubernetes_pod_name="ingress-nginx-ingress-controller-596657f487-pndrg",namespace="",pod_template_hash="1522139043",release="ingress",server="10.92.0.30:5000",upstream="ruby-gke-5984625-production-auto-deploy-5000"}
nginx_upstream_response_msecs_avg{app="nginx-ingress",component="controller",ingress_class="nginx",instance="10.92.0.15:10254",job="kubernetes-pods",kubernetes_namespace="gitlab-managed-apps",kubernetes_pod_name="ingress-nginx-ingress-controller-596657f487-pndrg",namespace="",pod_template_hash="1522139043",release="ingress",server="10.92.0.41:5000",upstream="ruby-gke-5984625-production-auto-deploy-5000"}

There are two relevant labels for the upstream pod:

  • server which corresponds to the Pod IP/Port
  • upstream which is the service name and port.

Technically it is possible to match the Pod IP to other pod details like the track label, but maintaining that lookup table, especially historically, would be a lot of work.
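
A sketch of what that lookup could look like at query time instead (assuming kube-state-metrics is installed, so kube_pod_info carries a pod_ip label; everything here is an illustration, not something the ingress exposes today):

  label_replace(nginx_upstream_response_msecs_avg, "pod_ip", "$1", "server", "([^:]+):.*")
    * on (pod_ip) group_left (pod)
  kube_pod_info

That only recovers the pod name; getting at the actual track label would need a further join against kube_pod_labels (whose label_* labels depend on how kube-state-metrics is configured), which is exactly the kind of lookup-table maintenance described above.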

It would be amazing if either the pod name or pod labels could be attached to the emitted metrics.

@gianrubio
Contributor

We have this information, but it would be too expensive (CPU and time), because the only way we can support this is by searching for the upstream IP and port in the local cache.

@aledbf Indeed, this is the solution. Does the Store keep the data in memory, or does it query the k8s API for each lookup?

@aledbf
Member

aledbf commented May 25, 2018

Does the Store keep the data in memory, or does it query the k8s API for each lookup?

No, the Store contains a local copy of the required objects, but even with that the operation would be expensive.
Just so you know, we are going to refactor the prometheus metrics, removing the vts module as the source of the data and using lua in the log phase (after the request has been returned) to update the metrics. In that case we don't need the local store, because nginx has all the fields we need, and if that's not the case we can just add another variable in the template :)

@aledbf
Member

aledbf commented May 25, 2018

@gianrubio once I have a working PR with the described change, I will ask you to review the prometheus part.
