📖 document metrics scraping and enable metrics services via annotations #4247
Conversation
To scrape metrics from CAPI containers, Prometheus is typically configured to discover scrape targets via the well-known prometheus.io annotations. Signed-off-by: Constanti, Mario <mario.constanti@daimler.com>
Add more details on how a kubebuilder-bootstrapped application protects its metrics endpoint and how Prometheus must be configured to scrape these metrics. Signed-off-by: Constanti, Mario <mario.constanti@daimler.com>
Welcome @bavarianbidi!
Hi @bavarianbidi. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
```
@@ -1,6 +1,10 @@
apiVersion: v1
kind: Service
metadata:
  annotations:
```
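For context, a fully annotated version of such a metrics Service, following the commonly used prometheus.io annotation convention, might look like the sketch below. The service name, port name, and port number (the kubebuilder default 8443 served by kube-rbac-proxy) are assumptions for illustration, not taken from this diff:

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    # Commonly used convention understood by annotation-based Prometheus scrape configs
    prometheus.io/scrape: "true"
    prometheus.io/scheme: "https"  # assumption: metrics served via kube-rbac-proxy over TLS
    prometheus.io/port: "8443"     # assumption: kubebuilder's default auth-proxy port
  labels:
    control-plane: controller-manager
  name: controller-manager-metrics-service  # illustrative name
spec:
  ports:
    - name: https
      port: 8443
      targetPort: https
  selector:
    control-plane: controller-manager
```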
should these changes also be applied in each infra provider?
Yes. In capo the annotation is already set: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/master/config/rbac/auth_proxy_service.yaml#L4
Will go through the other infra providers and add the missing annotation.
Just created three PRs for providers under the kubernetes-sigs org, because the CLA is already signed there. Have to check the legal stuff for the remaining providers first. Hope this doesn't block this PR ;-)
Summary:
what about Azure?
The annotation was already there but was removed last year.
I created issue #1222 to get in contact with the Azure team.
Azure PR #1320 created and merged
cc @devigned - this might be interesting to you
Another way we have it configured when you bring up CAPZ in Tilt is to specify the scraping information via a `ServiceMonitor`:

```yaml
# Prometheus Monitor Service (Metrics)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: capz-controller-manager
  name: capz-controller-manager-metrics-monitor
spec:
  endpoints:
    - path: /metrics
      port: https
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
  selector:
    matchLabels:
      control-plane: capz-controller-manager
```

I haven't experimented with it, but I bet adding the annotations would allow folks to specify less in the `ServiceMonitor`.

+1 to prom annotations
Yes, but this requires the prometheus-operator to be in place. The annotation way is much more generic (imho)
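To make the annotation-based approach concrete: Prometheus has no built-in knowledge of the prometheus.io annotations; they only take effect through relabel_configs in the scrape configuration. A minimal sketch of such a job (job name and annotation set are illustrative, following the widely used convention from the Prometheus Kubernetes example configs) might look like:

```yaml
scrape_configs:
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints whose Service carries prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Optionally switch the scheme via prometheus.io/scheme (e.g. https for kube-rbac-proxy)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      # Optionally override the metrics path via prometheus.io/path
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Optionally override the scrape port via prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
```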
/milestone v0.4.0
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: capi-metrics-reader
rules:
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
```
We may as well add this to the infra components yaml and remove this step for the end user.
Probably makes sense to leave out the cluster role binding as we don't know the namespace Prometheus may be deployed into.
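For illustration of why the binding is left to the user: a matching ClusterRoleBinding has to name the ServiceAccount and namespace that Prometheus actually runs under, which CAPI cannot know in advance. A sketch assuming the kube-prometheus defaults (ServiceAccount prometheus-k8s in namespace monitoring):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: capi-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: capi-metrics-reader
subjects:
  # Assumption: Prometheus runs with this ServiceAccount/namespace (kube-prometheus default);
  # adjust to the actual Prometheus deployment.
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: monitoring
```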
That's the reason I documented it this way.
@randomvariable so did I get it right that you suggest removing this YAML here and adding it to the infra component YAMLs of all providers? Maybe I'm missing something, but in case we really want to add this to our deployments, it looks to me like we need this ClusterRole only once.
But I'm really not sure if we should add this to our YAMLs so that it's deployed everywhere. Imho Prometheus and RBAC setups can vary and (as far as I'm aware) there was no recurring demand for this in Slack. I assume nobody is really missing this in our YAMLs at the moment.
I have no strong opinion against adding the ClusterRole to our YAMLs, but if we do, we should do it right, and adding it to all infra providers seems redundant to me.
Hi. Here are some minor nits about documentation style.
Signed-off-by: Constanti, Mario <mario.constanti@daimler.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
a few nits. Apart from that we should clarify if we want to move the ClusterRole to our YAMLs. I think we should start with this documentation and with more data / user feedback we can always move it into the YAMLs later on.
@bavarianbidi fyi. The metrics port will change through: https://github.com/kubernetes-sigs/cluster-api/pull/4640/files
Yup, I somehow didn't even think about that. This makes this PR also a lot easier :)
/hold until #4640 is merged
@bavarianbidi: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I don't think there are any metrics to expose now.
How do you mean that? As far as I'm aware there should be metrics exposed via the metrics endpoint right now. Although, after the latest changes they are only exposed on localhost.
Folks, what's the status of this PR?
/hold cancel
Since the PR was initially opened the situation has changed a bit. Previously, we had kube-rbac-proxy, so we needed to:
In the meantime we dropped kube-rbac-proxy, but we are also binding the metrics port to localhost, so the metrics cannot be scraped at all (per default). I think one way to scrape the metrics now would be:
But as we didn't want to merge the variant with the metrics port bound to a non-localhost address, an alternative would be to document how to scrape the metrics based on our current manifests.
Afaik we're now in a situation where it doesn't make sense to adjust any of our release manifests; we should only document the steps a user has to do based on our manifests. @randomvariable regarding the metrics we have: we don't have CAPI-specific metrics, but we still have the ones from controller-runtime and Go (link). In my experience they can already be used for some basic monitoring and alerting.
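To sketch what "document the steps a user has to do" could look like: with the metrics endpoint bound to localhost inside the container, a user who wants in-cluster scraping would have to override the bind address themselves, for example via a kustomize strategic-merge patch on the controller Deployment. Everything below is an assumption for illustration; the flag name (--metrics-bind-addr), deployment name, namespace, and port follow the CAPI manifests of that time and may differ per release:

```yaml
# metrics-bind-addr-patch.yaml (sketch; names and flags are assumptions)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capi-controller-manager
  namespace: capi-system
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            # Note: a strategic-merge patch replaces the whole args list, so any
            # existing args from the upstream manifest must be repeated here too.
            - --metrics-bind-addr=:8080  # listen on all interfaces instead of localhost only
```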
Forgot cc :)
@sbueringer: GitHub didn't allow me to request PR reviews from the following users: bavarianbidi. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Why not document the …? WDYT @sbueringer / @vincepri
@CoMario Fine for me.
Folks, what's the status of this PR?
@vincepri I think we're waiting for a response from you on whether the proposed documentation in #4247 (comment) would be okay.
/area health
@bavarianbidi: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@bavarianbidi are you still pursuing this PR?
@enxebre as I will leave Daimler within the next couple of weeks, I will lose access to the … I'm fine if we close this PR and we create several other PRs as …
/close
Closing based on the above comment; if folks want to still pursue documentation later, please feel free to reopen different PRs.
@vincepri: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What this PR does / why we need it:
Oriented on the metrics documentation from kubebuilder, this PR adds the required annotations to the existing `*-metrics-service` objects and describes how to configure Prometheus with the required `ClusterRoles`/`ClusterRoleBindings` to get valid scrape targets.