Enable Kubelet API bearer token authentication #191
Conversation
Prometheus itself is quite capable of scraping kubelet targets without turning on bearer token auth - it's just a config option. In fact, the Typhoon Prometheus examples configure kubelet scraping and do so proxied through the apiserver, which also works for scraping remote kubelets. Have you seen https://typhoon.psdn.io/addons/prometheus/ and the manifests in https://github.com/poseidon/typhoon/tree/master/addons/prometheus? Those manifests are maintained to provide a working set of targets, dashboards, and alerts (that pass) on Typhoon clusters across clouds/bare-metal.
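For reference, the proxied kubelet scrape described here looks roughly like the following scrape config (a sketch of the approach, not a copy of the maintained manifests; exact relabelings may differ):

```yaml
- job_name: 'kubelet'
  scheme: https
  kubernetes_sd_configs:
  - role: node
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Route the scrape through the apiserver proxy instead of
  # contacting each kubelet directly.
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
```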
@dghubble As you found on Slack, the reason I need this change is that I would like to keep using Prometheus Operator/kube-prometheus. I've standardized on this stack for my projects and would like to keep using it with Typhoon as well. I see you have a Prometheus add-on for Typhoon, which is great, but I want to hang on to the benefits brought by Prometheus Operator, such as its ServiceMonitor abstraction and the monitoring rules/Grafana dashboards installed by kube-prometheus. As you mentioned on Slack, Prometheus Operator relies on token auth. Is it problematic to enable this in Typhoon (considering you're hesitant about merging this PR)? However, I hear from Frederic Branczyk that the token request API will be used instead in the future - are you going to wait until that is ready?
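For context, the ServiceMonitor abstraction mentioned here lets scrape targets be declared as Kubernetes objects that the operator turns into Prometheus scrape configs. A minimal illustrative example (names here are hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  # Select Services labeled app: example-app and scrape their
  # "metrics" port every 30s.
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: metrics
    interval: 30s
```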
Addons show a fully functional Prometheus setup which scrapes and alerts on components, including scraping kubelets. Typhoon clusters shouldn't enable a separate, otherwise unneeded, authorization mode without sufficient justification. At a minimum, I expect an app like prometheus-operator to justify its requirement that clusters enable the new mode, document the reasons for it, and show clear plain-manifest examples of a token setup. Then we can assess whether those reasons are appropriate. Currently, I don't see a clear case and the token usage seems to be buried inside the operator. The onus lies on prometheus-operator.
Do you want me to obtain a rationale for its usage from a Prometheus Operator dev? I was also wondering whether you might solve this by allowing users to configure extra parameters for the Kubelet themselves.
Sure. I'd request that of any app workload making special demands about the underlying cluster.
Typhoon does allow advanced users to make customizations to hosts by providing custom Container Linux Configs, which are additively merged and validated: https://typhoon.psdn.io/advanced/customization/#container-linux. You can create a Kubelet dropin with extra flags if really needed.
Does that method allow me to add just extra flags, without redefining the whole Kubelet command line?
In this case, your dropin has to overwrite the ExecStart.
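Such a dropin could look roughly like this (a sketch only; the wrapper path, dropin name, and the full set of existing Kubelet flags are assumptions and would need to match the cluster's real unit):

```yaml
systemd:
  units:
  - name: kubelet.service
    dropins:
    - name: 40-token-auth.conf
      contents: |
        [Service]
        # An empty ExecStart= clears the original command; the full
        # command line must then be restated with the extra flags.
        ExecStart=
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
          --authentication-token-webhook \
          --authorization-mode=Webhook
```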
Another item: these modes for prometheus-operator haven't been enabled in bootkube
I'm gonna see if I can get a rationale for this requirement from a Prometheus Operator dev :)
Note that proxying through the API allows the Prometheus pod to perform any operation on the kubelet API, which is a security risk. The upstream Prometheus team agrees unanimously that this is a bad idea; unfortunately GKE doesn't allow token authentication today, which is why it's still in the example configuration. I've been working with the GKE team to get this enabled in order to stop this practice.

Also, probably not as relevant for small users, but cadvisor and kubelet metrics can be large scrapes in large clusters (I've seen up to ~50 MB per scrape), which can put unnecessary load on your apiserver if you are scraping a lot of kubelets. Note that Tectonic, OpenShift, GCE (clusters used in Kubernetes testing) and soon GKE clusters all use this option.

That said, using client certs is also a valid option, and probably even better if you can manage to generate correct certificates; it's just more complicated for people to rotate certs than service account tokens. I say it may be better because token authentication adds one request per scrape of a kubelet, in addition to an authorization request (which is required either way).
Thanks for weighing in @brancz. Let's establish some clarifying background. Pods have the abilities they are granted by their service account via associated bound Roles and ClusterRoles. We're defending against an attacker breaking into Prometheus and assuming those abilities. Specifically, we're discussing the ClusterRole
Given that's the case, we could explore changing to an alternate scrape. Notably, the measurable result needs to be removing the
Comparing Typhoon's Prometheus ClusterRole with Prometheus Operator's ClusterRole does show
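To make the comparison concrete, the relevant ClusterRole rules differ roughly as follows (reconstructed from the upstream manifests as I understand them at this time; verify against the current sources before relying on this):

```yaml
# Typhoon addon (proxied scrapes): needs proxy access through the
# apiserver, which permits arbitrary kubelet API operations.
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]

# kube-prometheus (direct scrapes with token auth): the narrower
# nodes/metrics subresource, so a compromised Prometheus can read
# metrics but cannot exec or read logs via the proxy.
- apiGroups: [""]
  resources: ["nodes/metrics"]
  verbs: ["get"]
```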
The issue of proxying adding load to the apiserver is definitely something to bear in mind, and a possible reason to switch. So far in my scaling tests, it hasn't been an issue yet. There are other limits that apply and, for now, multi-controller allows the ratio of controllers to workers to be kept reasonable.
@dghubble Have you come to any conclusion in this matter?
As is, this PR will break things. Presently, Typhoon Kubelets don't enable bearer token authentication. It would be a nice (though not exactly pressing) feature to add bearer token auth, since it would allow the Prometheus pod to drop the apiserver proxy rule from its ClusterRole.
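Enabling that feature would mean adding the standard upstream Kubelet webhook-auth flags (these are stock Kubelet options, not Typhoon-specific ones):

```
--authentication-token-webhook   # validate bearer tokens against the apiserver (TokenReview)
--authorization-mode=Webhook     # authorize kubelet API requests via SubjectAccessReview
```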
Found the root cause, described in #215. I'm going to start on the full implementation and close this out.
Thanks for getting the ball rolling with an initial shot.
See #216
Enable Kubelet API bearer token authentication on controllers and workers. I need to do this because otherwise Prometheus as installed by kube-prometheus isn't able to scrape Kubelet targets. I've verified that with my change, Kubelet scraping works.
Testing
I deployed a new cluster with these changes, deployed kube-prometheus and saw that Kubelet targets were successfully scraped.