
Enable Kubelet API bearer token authentication #191

Closed

Conversation

aknuds1
Contributor

@aknuds1 aknuds1 commented Apr 16, 2018

High-level description of the change.

Enable Kubelet API bearer token authentication on controllers and workers. I need to do this because otherwise Prometheus, as installed by kube-prometheus, isn't able to scrape Kubelet targets. I've verified that with my change, Kubelet scraping works.
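For context (not part of the actual diff, which isn't shown here), enabling bearer token authentication against the Kubelet API typically amounts to Kubelet flags along these lines; the client CA path below is an assumption:

```
--anonymous-auth=false                   # reject unauthenticated requests
--client-ca-file=/etc/kubernetes/ca.crt  # path is an assumption; keeps x509 client auth working
--authentication-token-webhook           # validate bearer tokens via the apiserver's TokenReview API
--authorization-mode=Webhook             # authorize Kubelet API requests via SubjectAccessReview
```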

Testing

I deployed a new cluster with these changes, deployed kube-prometheus and saw that Kubelet targets were successfully scraped.

@dghubble
Member

Prometheus itself is quite capable of scraping kubelet targets without turning on bearer token auth - it's just a config option. In fact, the Typhoon Prometheus examples configure kubelet scraping and do so proxied through the apiserver, which also works for scraping remote kubelets. Have you seen https://typhoon.psdn.io/addons/prometheus/ and the manifests in https://github.com/poseidon/typhoon/tree/master/addons/prometheus? Those manifests are maintained to provide a working set of targets, dashboards, and alerts (that pass) on Typhoon clusters across clouds/bare-metal.
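For readers unfamiliar with the proxied approach, it boils down to a scrape config roughly like this (a sketch of the pattern; job names and paths in the actual addon manifests may differ):

```yaml
- job_name: kubelet
  scheme: https
  kubernetes_sd_configs:
    - role: node
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  relabel_configs:
    # Send every node's scrape through the apiserver's node proxy endpoint
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics
```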

@aknuds1
Contributor Author

aknuds1 commented Apr 19, 2018

@dghubble As you found on Slack, the reason I need this change is that I would like to keep using Prometheus Operator/kube-prometheus. I've standardized on this stack for my projects and would like to keep using it with Typhoon as well. I see you have a Prometheus add-on for Typhoon, which is great, but I want to keep the benefits brought by Prometheus Operator, such as its ServiceMonitor abstraction and the monitoring rules/Grafana dashboards installed by kube-prometheus.
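For reference, the ServiceMonitor abstraction lets scrape targets be declared as Kubernetes objects rather than hand-written scrape configs; a minimal, hypothetical example (all names are placeholders):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app            # placeholder
  labels:
    prometheus: k8s            # placeholder; must match the Prometheus object's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-app         # selects the Service(s) to scrape
  endpoints:
    - port: web                # assumes the Service defines a port named "web"
      interval: 30s
```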

As you mention in Slack, Prometheus Operator relies on token auth. Is it problematic to enable this in Typhoon (considering you're hesitant about merging this PR)? I hear from Frederic Branczyk, however, that the token request API will be used instead in the future - are you going to wait until that is ready?

@dghubble
Member

Addons show a fully functional Prometheus setup which scrapes and alerts on components, including scraping kubelets.

Typhoon clusters shouldn't enable a separate, otherwise unneeded, authorization mode without sufficient justification. At a minimum, I expect an app like prometheus-operator to justify its requirement that clusters enable the new mode, document the reasons for it, and show clear plain-manifest examples of a token setup. Then we can assess whether those reasons are appropriate.

Currently, I don't see a clear case and the token usage seems to be buried inside the operator. Onus lies on prometheus-operator.

@aknuds1
Contributor Author

aknuds1 commented Apr 20, 2018

Do you want me to obtain a rationale for its usage from a Prometheus Operator dev?

I was also wondering if you might solve this by allowing users to configure extra parameters for Kubelet themselves.

@dghubble
Member

Sure. I'd request that of any app workload making special demands on the underlying cluster.

@dghubble
Member

Typhoon does allow advanced users to make customizations to hosts by providing custom Container Linux Configs, which are additively merged and validated: https://typhoon.psdn.io/advanced/customization/#container-linux. You can create a Kubelet dropin with extra flags if really needed.

@aknuds1
Contributor Author

aknuds1 commented Apr 20, 2018

Does that method allow me to add just extra flags and not redefine the whole (Kubelet) command line?

@dghubble
Member

In this case, your dropin has to overwrite the ExecStart.
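A rough sketch of what such a dropin could look like in a Container Linux Config (the wrapper path and flag set are assumptions, and the real ExecStart carries many more flags than shown):

```yaml
systemd:
  units:
    - name: kubelet.service
      dropins:
        - name: 40-token-auth.conf
          contents: |
            [Service]
            # systemd cannot append to ExecStart, so clear it and restate the
            # full command with the extra flags added.
            ExecStart=
            ExecStart=/usr/lib/coreos/kubelet-wrapper \
              --authentication-token-webhook \
              --authorization-mode=Webhook
            # ...plus every flag from the stock unit, restated verbatim.
```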

@dghubble
Member

Another item: these modes for prometheus-operator haven't been enabled in the bootkube hack examples either, which are another example of a vanilla Kubernetes setup.

@aknuds1
Contributor Author

aknuds1 commented Apr 20, 2018

I'm gonna see if I can get a rationale for this requirement from a Prometheus Operator dev :)

@brancz

brancz commented Apr 21, 2018

Note that proxying through the API allows the Prometheus pod to perform any operation on the kubelet API, which is a security risk. The upstream Prometheus team agrees unanimously that this is a bad idea; unfortunately, GKE doesn't allow token authentication today, which is why it's still in the example configuration. I've been working with the GKE team to get this enabled in order to stop this practice. It's probably not as relevant for small users, but cAdvisor and kubelet metrics can be large scrapes in large clusters (I've seen up to ~50 MB per scrape), which can put unnecessary load on your apiserver if you are scraping a lot of kubelets.

Note that Tectonic, OpenShift, GCE (clusters used in Kubernetes testing) and soon GKE clusters all use this option.

That said, using client certs is also a valid option and probably even better if you can manage to generate correct certificates; it's just more complicated for people to rotate certs than service account tokens. I say it may be better because token authentication adds one request per scrape of a kubelet, in addition to an authorization check (which is required either way).
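In Prometheus terms, the direct, token-authenticated scrape described above looks roughly like this (a sketch; it assumes kubelet serving certs aren't signed by the cluster CA, and it only works once kubelets accept bearer tokens, which is what this PR is about):

```yaml
- job_name: kubelet
  scheme: https
  kubernetes_sd_configs:
    - role: node                 # targets default to each node's kubelet address/port
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true   # assumption: kubelet serving certs are self-signed
```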

@dghubble
Member

dghubble commented Apr 21, 2018

Thanks for weighing in @brancz.

Let's establish some clarifying background. Pods have the abilities they are granted by their service account via associated bound Roles and ClusterRoles. We're defending against an attacker breaking into Prometheus and assuming those abilities. Specifically, we're discussing the ClusterRole allowing nodes/proxy and whether to switch to a direct-to-kubelet approach.

  • Is there an upstream issue about the nodes/proxy risks you mention? It's in widespread use.

Given that's the case, we could explore changing to an alternate scrape approach. Notably, the measurable result needs to be removing the nodes/proxy resource access. Adding a new kubelet authorization mode or changing the Prometheus config alone doesn't provide a security benefit.

  • Can you make the case upstream for bootkube clusters to allow kubelet token auth too? It'd give some confidence it's the right direction for all clusters in the whole community - since Tectonic, OpenShift, Prometheus Operator, and your GCE test all fall under the umbrella of Red Hat products.
  • Is the discussion with GKE in the open / linkable? I'd like to see their concerns as well.

Comparing Typhoon's Prometheus ClusterRole with Prometheus Operator's ClusterRole does show nodes/proxy isn't in use in the operator (though a ton of other resources are). Perhaps this isn't a fair comparison because the operator might have a separate ClusterRole for just the Prometheus it deploys - or maybe these are the right ones to compare against.
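For concreteness, the grant under discussion is a rule like the following, sketched from a typical Prometheus ClusterRole (the exact resource list in Typhoon's addon may differ):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy        # the grant in question: proxy arbitrary requests to kubelet APIs
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
```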

@dghubble
Member

The issue of proxying adding load to the apiserver is definitely one to bear in mind, and a possible reason to switch even if nodes/proxy turns out to be fine. More investigation is needed to know the load impact of proxying vs direct-to-kubelet (which still has to check with the apiserver). I agree, it's likely direct-to-kubelet will be lower load.

So far in my scaling tests, it hasn't been an issue yet. There are other limits that apply, and for now multi-controller setups allow the ratio of controllers to workers to be kept reasonable.

@aknuds1
Contributor Author

aknuds1 commented May 11, 2018

@dghubble Have you come to any conclusion in this matter?

@dghubble
Member

As is, this PR will break kubectl logs, exec, and port-forward on your Digital Ocean cluster. It's not on the road to merge; the solution is larger in scope and will ultimately enable Kubelet authn/authz across all platforms, not just Digital Ocean.

Presently, Typhoon Kubelets don't enable bearer token authentication. It would be a nice (though not exactly pressing) feature to add bearer token auth, since it would allow the Prometheus pod to drop nodes/proxy for the more granular nodes/metrics, further improving the Prometheus addon.
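Roughly, that would mean swapping the proxy grant in the addon's ClusterRole for the metrics subresource, along these lines (a sketch; the rest of the rule stays as-is):

```yaml
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics      # read-only kubelet metrics instead of full nodes/proxy access
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
```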

@dghubble
Member

Found the root cause, described in #215. I'm going to start on the full implementation and close this out.

@dghubble dghubble closed this May 14, 2018
@dghubble
Member

Thanks for getting the ball rolling with an initial shot.

@dghubble
Member

See #216
