📖 document metrics scraping and enable metrics services via annotations #4247
Conversation
To scrape metrics from CAPI containers, Prometheus is typically configured to discover scrape targets via the well-known prometheus.io annotations. Signed-off-by: Constanti, Mario <mario.constanti@daimler.com>
Add more details on how a kubebuilder-bootstrapped application protects its metrics endpoint and how Prometheus must be configured to scrape these metrics. Signed-off-by: Constanti, Mario <mario.constanti@daimler.com>
Welcome @bavarianbidi!
Hi @bavarianbidi. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
```
@@ -1,6 +1,10 @@
apiVersion: v1
kind: Service
metadata:
  annotations:
```
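For context, a fully annotated version of such a metrics Service, following the commonly used prometheus.io annotation convention, might look like the sketch below. The service name, port name, and port number (the kubebuilder default 8443 served by kube-rbac-proxy) are assumptions for illustration, not taken from this diff:

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    # Commonly used convention understood by annotation-based Prometheus scrape configs
    prometheus.io/scrape: "true"
    prometheus.io/scheme: "https"  # assumption: metrics served via kube-rbac-proxy over TLS
    prometheus.io/port: "8443"     # assumption: kubebuilder's default auth-proxy port
  labels:
    control-plane: controller-manager
  name: controller-manager-metrics-service  # illustrative name
spec:
  ports:
    - name: https
      port: 8443
      targetPort: https
  selector:
    control-plane: controller-manager
```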
should these changes also be applied in each infra provider?
Yes. In capo the annotation is already set: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/master/config/rbac/auth_proxy_service.yaml#L4
Will go through the other infra providers and add the missing annotation.
Just created three PRs for providers under the kubernetes-sigs org, because the CLA is already signed there. Have to check the legal stuff for the remaining providers first. Hope this doesn't block this PR ;-)
Summary:
what about Azure?
The annotation was already there but was removed last year.
I created issue #1222 to get in contact with the Azure team.
Azure PR #1320 created and merged
cc @devigned - this might be interesting to you
Another way we have it configured when you bring up CAPZ in Tilt is to specify the scraping information via a `ServiceMonitor`:

```yaml
# Prometheus Monitor Service (Metrics)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: capz-controller-manager
  name: capz-controller-manager-metrics-monitor
spec:
  endpoints:
    - path: /metrics
      port: https
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
  selector:
    matchLabels:
      control-plane: capz-controller-manager
```

I haven't experimented with it, but I bet adding the annotations would allow folks to specify less in the `ServiceMonitor`.

+1 to prom annotations
Yes, but this requires the prometheus-operator to be in place. The annotation way is much more generic (imho)
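To make the annotation-based approach concrete: Prometheus has no built-in knowledge of the prometheus.io annotations; they only take effect through relabel_configs in the scrape configuration. A minimal sketch of such a job (job name and annotation set are illustrative, following the widely used convention from the Prometheus Kubernetes example configs) might look like:

```yaml
scrape_configs:
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints whose Service carries prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Optionally switch the scheme via prometheus.io/scheme (e.g. https for kube-rbac-proxy)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      # Optionally override the metrics path via prometheus.io/path
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Optionally override the scrape port via prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
```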
/milestone v0.4.0
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: capi-metrics-reader
rules:
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
```
We may as well add this to the infra components yaml and remove this step for the end user.
Probably makes sense to leave out the cluster role binding as we don't know the namespace Prometheus may be deployed into.
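For illustration of why the binding is left to the user: a matching ClusterRoleBinding has to name the ServiceAccount and namespace that Prometheus actually runs under, which CAPI cannot know in advance. A sketch assuming the kube-prometheus defaults (ServiceAccount prometheus-k8s in namespace monitoring):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: capi-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: capi-metrics-reader
subjects:
  # Assumption: Prometheus runs with this ServiceAccount/namespace (kube-prometheus default);
  # adjust to the actual Prometheus deployment.
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: monitoring
```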
That's the reason I documented it this way.
@randomvariable so did I get it right that you suggest removing this YAML here and adding it to the infra component YAMLs of all providers? Maybe I'm missing something, but in case we really want to add this to our deployments, it looks to me like we need this ClusterRole only once.
But I'm really not sure if we should add this to our YAMLs so that it's deployed everywhere. Imho Prometheus and RBAC setups can vary and (as far as I'm aware) there was no recurring demand for this in Slack. I assume nobody is really missing this in our YAMLs at the moment.
I have no strong opinion against adding the ClusterRole to our YAMLs, but if we do, we should do it right, and adding it to all infra providers seems redundant to me.
Hi. Here are some minor nits about documentation style.
Signed-off-by: Constanti, Mario <mario.constanti@daimler.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
a few nits. Apart from that we should clarify if we want to move the ClusterRole to our YAMLs. I think we should start with this documentation and with more data / user feedback we can always move it into the YAMLs later on.
@bavarianbidi fyi. The metrics port will change through: https://github.com/kubernetes-sigs/cluster-api/pull/4640/files
Yup, I somehow didn't even think about that. This makes this PR also a lot easier :)
/hold until #4640 is merged
@bavarianbidi: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I don't think there are any metrics to expose now.
How do you mean that? As far as I'm aware there should be metrics exposed via the metrics endpoint right now. Although, after the latest changes they are only exposed on localhost.
Folks, what's the status of this PR?
/hold cancel
Since the PR was initially opened the situation has changed a bit. Previously, we had kube-rbac-proxy, so we needed to:
In the meantime we dropped kube-rbac-proxy, but we are also binding the metrics port to localhost, so the metrics cannot be scraped at all (per default). I think one way to scrape the metrics now would be:
But as we didn't want to merge the variant with the metrics port bound to a non-localhost address, an alternative would be to document how to scrape the metrics based on our current manifests.
Afaik we're now in a situation where it doesn't make sense to adjust any of our release manifests; we should only document the steps a user has to do based on our manifests. @randomvariable regarding the metrics we have: we don't have CAPI-specific metrics, but we still have the ones from controller-runtime and Go (link). In my experience they can already be used for some basic monitoring and alerting.
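To sketch what "document the steps a user has to do" could look like: with the metrics endpoint bound to localhost inside the container, a user who wants in-cluster scraping would have to override the bind address themselves, for example via a kustomize strategic-merge patch on the controller Deployment. Everything below is an assumption for illustration; the flag name (--metrics-bind-addr), deployment name, namespace, and port follow the CAPI manifests of that time and may differ per release:

```yaml
# metrics-bind-addr-patch.yaml (sketch; names and flags are assumptions)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capi-controller-manager
  namespace: capi-system
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            # Note: a strategic-merge patch replaces the whole args list, so any
            # existing args from the upstream manifest must be repeated here too.
            - --metrics-bind-addr=:8080  # listen on all interfaces instead of localhost only
```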
Forgot cc :)
@sbueringer: GitHub didn't allow me to request PR reviews from the following users: bavarianbidi. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Why not document the …? WDYT @sbueringer / @vincepri
@CoMario Fine for me.
Folks, what's the status of this PR?
@vincepri I think we're waiting for a response from you on whether the proposed documentation in #4247 (comment) would be okay.
/area health
@bavarianbidi: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@bavarianbidi are you still pursuing this PR?
@enxebre as I will leave Daimler within the next couple of weeks, I will lose access to the … I'm fine if we close this PR and we create several other PRs as …
/close
Closing based on the above comment; if folks want to still pursue documentation later, please feel free to reopen different PRs.
@vincepri: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What this PR does / why we need it:
Oriented on the metrics documentation from kubebuilder, this PR adds the required annotations to the existing `*-metrics-service` objects and describes how to configure Prometheus with the required `ClusterRoles`/`ClusterRoleBindings` to get valid scrape targets.