Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚠️ Remove kube-rbac-proxy and expose metrics on localhost:8080 #4640

Merged

Conversation

micahhausler
Copy link
Member

What this PR does / why we need it:

Removes unneeded component in cluster-api.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #4325

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 19, 2021
@micahhausler micahhausler changed the title 🌱 Removed kube-rbac-proxy from cluster-api 🌱 Removed kube-rbac-proxy from cluster-api May 19, 2021
@sbueringer
Copy link
Member

sbueringer commented May 19, 2021

I'm a bit curious why kubebuilder adds kube-rbac-proxy for metrics per default. If it's easily possible to find out the reasoning behind adding it in the first place in kubebuilder from the kubebuilder repo / issues / PRs, that would be nice.

In my opinion, we're doing something seriously wrong regarding metrics, if we have to protect them with RBAC (but there might be more security-minded folks who would disagree). For the change itself lgtm from me overall.

Can you also adjust: https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/book/src/developer/providers/v1alpha3-to-v1alpha4.md#upgrade-kube-rbac-proxy-to-v080

Based on this comment:

// WatchPodMetrics captures metrics from all pods every 5s. It expects to find port 8080 open on the controller.
// Use replacements in an e2econfig to enable metrics scraping without kube-rbac-proxy, e.g:
// - new: --metrics-bind-addr=:8080
// old: --metrics-addr=127.0.0.1:8080
I think it's highly likely that we have to adjust --metrics-bind-addr so that metric scraping still works. We can then also cleanup the --metrics-bind-addr replacements in https://github.com/kubernetes-sigs/cluster-api/blob/master/test/e2e/config/docker.yaml.

The combination of the new bind address and the docker.yaml can then be used to prove that the metrics scraping still works. Fyi metrics are e.g. stored here in our test artifacts: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/4640/pull-cluster-api-e2e-main/1395079221084164096/artifacts/clusters/bootstrap/controllers/capd-controller-manager/capd-controller-manager-85444cb6c9-c565f/metrics.txt

I think another repo-wide grep for kube-rbac-proxy to make sure we got all occurances would make sense.

@micahhausler micahhausler force-pushed the remove-kube-rbac-proxy branch from fc5f975 to 8e40c02 Compare May 19, 2021 19:58
@micahhausler
Copy link
Member Author

I'm a bit curious why kubebuilder adds kube-rbac-proxy for metrics per default. If it's easily possible to find out the reasoning behind adding it in the first place in kubebuilder from the kubebuilder repo / issues / PRs, that would be nice.

In my opinion, we're doing something seriously wrong regarding metrics, if we have to protect them with RBAC (but there might be more security-minded folks who would disagree). For the change itself lgtm from me overall.

Yea, relying on shipping personal projects in kube-builder was probably not the best call. Looking at kubernetes-sigs/kubebuilder#513, it got added in a monster PR (+53,599 −80,563 LOC) without any question about whether it should have been included. That should probably be re-evaluated.

Can you also adjust: https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/book/src/developer/providers/v1alpha3-to-v1alpha4.md#upgrade-kube-rbac-proxy-to-v080

Done

I think it's highly likely that we have to adjust --metrics-bind-addr so that metric scraping still works. We can then also cleanup the --metrics-bind-addr replacements in https://github.com/kubernetes-sigs/cluster-api/blob/master/test/e2e/config/docker.yaml.

Done

I think another repo-wide grep for kube-rbac-proxy to make sure we got all occurances would make sense.

How does this look? I left the v1alpha2-to-v1alpha3 references alone

$ grep -E  'auth_proxy|kube-rbac-proxy' -r ./*
./docs/book/src/developer/providers/v1alpha2-to-v1alpha3.md:    - `manager_auth_proxy_patch.yaml` to `config/manager`
./docs/book/src/developer/providers/v1alpha2-to-v1alpha3.md:        - manager_auth_proxy_patch.yaml
./docs/book/src/developer/providers/v1alpha2-to-v1alpha3.md:        - manager_auth_proxy_patch.yaml
./docs/book/src/developer/providers/v1alpha3-to-v1alpha4.md:## kube-rbac-proxy has been removed
./docs/book/src/developer/providers/v1alpha3-to-v1alpha4.md:- Find and remove the `kube-rbac-proxy` sidecar

@sbueringer
Copy link
Member

sbueringer commented May 19, 2021

@micahhausler Thx, looks great.

/retest
(never saw those tests fail, but let's just retry to see if they were triggered by the changes in this PR)

P.S. I'll take a closer look tomorrow, but according to the grep results you provided it looks good to me.

@micahhausler micahhausler force-pushed the remove-kube-rbac-proxy branch from 8e40c02 to 1718b96 Compare May 19, 2021 21:03
@micahhausler
Copy link
Member Author

Looks like an unrelated flake

/test pull-cluster-api-test-main

@@ -20,14 +20,17 @@ spec:
- /manager
args:
- "--leader-elect"
- "--metrics-bind-addr=127.0.0.1:8080"
- "--metrics-bind-addr=:8080"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked and found out that this is the default value. So we don't have to set it. I honestly don't particularly care if we remove the argument altogether or explicitly set the default. Maybe others do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: to me it looks like the section title Required kustomize changes to have a single manager watching all namespaces and answer to webhook calls does not really accurately describe all things we change in the kustomize folder. So it might make sense to make the section title a bit more generic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the default I'm +1 to remove this flag if possible

@@ -115,12 +133,9 @@ Provider's `/config` folder has the same structure of `/config` folder in CAPI
```
- "--metrics-bind-addr=127.0.0.1:8080"
```
- Verify that fetaure flags required by your container are properly set
- Verify that feature flags required by your container are properly set
(as it was in `/config/webhook/manager_webhook_patch.yaml`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section should describe what other providers have to do to drop kube-rbac-proxy. I.e. it should be a more or less close description what we do on the current PR.

Thus we should probably mention that they have to remove (the files and the references to it):

  • auth_proxy_role.yaml
  • auth_proxy_role_binding.yaml
  • auth_proxy_service.yaml
  • manager_auth_proxy_patch.yaml

Or at least that's the level of granularity our existing migration doc has

@@ -41,6 +37,28 @@
- Rename `--metrics-addr` to `--metrics-bind-addr`
- Rename `--leader-election` to `--leader-elect`

## kube-rbac-proxy has been removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this down and merge it with

- Add the following items to the `args` list for the `manager` container list
    ```
    - "--metrics-bind-addr=127.0.0.1:8080"
    ```

Copy link
Member

@fabriziopandini fabriziopandini May 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 to move this down, but IMO we should not merge this changes with existing changes.
Let's assume that some provider has already done the first set of changes for v1alpha4 and this one adds on top of them

@sbueringer
Copy link
Member

sbueringer commented May 20, 2021

I don't know how we categorize changes like this usually, is this considered a breaking change as the metrics are now exported through a different port (which I guess would mean we should change the PR title)? (@vincepri )

@vincepri
Copy link
Member

It's a breaking change, yes

@vincepri
Copy link
Member

/retitle ⚠️ Remove kube-rbac-proxy and expose metrics directly on port :8080

@k8s-ci-robot k8s-ci-robot changed the title 🌱 Removed kube-rbac-proxy from cluster-api ⚠️ Remove kube-rbac-proxy and expose metrics directly on port :8080 May 24, 2021
Comment on lines -120 to -129
1. Edit the `/config/manager/manager_auth_proxy_patch.yaml` file:
- Remove the patch for the container with name `manager`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should change those instructions, because I assume some providers already has implemented those changes.

So I think that the changes in this PR should be considered starting from a repository after those change applies (this also should be enforced by the order of changes in this doc + eventually by a note)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some text though that links to the alternative of removing the proxy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can phrase it in a way that the kube-rbac proxy removal works for both cases? (something like grep for certain files and remove the files and all references of them)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking that we could add the instructions for the removal above this change and just say something like (skip this step if you've done step number X)

@CecileRobertMichon
Copy link
Contributor

/kind release-blocking

@k8s-ci-robot k8s-ci-robot added the kind/release-blocking Issues or PRs that need to be closed before the next CAPI release label May 26, 2021
@randomvariable
Copy link
Member

Follow up from today's meeting.

Conflicted in that this is removing potentially useful functionality, but I think it's best if we follow the main Kubernetes project in removing binding the metrics endpoint to ::/0 in favour of only exposing on localhost (which should cover both IPv4 & IPv6) and defaulting to that in the Go flags. This brings us closer in line with Kubernetes v1.20 where the insecure serving port was removed from API server and only /healthz and /ready were left as unauthenticated endpoints.

I'll open an issue to port SAR for metrics to controller-runtime.

@JoelSpeed
Copy link
Contributor

I'll open an issue to port SAR for metrics to controller-runtime.

@Danil-Grigorev and I had some conversations around this last year with the controller-runtime maintainers and Kube RBAC Proxy maintainers. We concluded that this kind of thing isn't really the responsibility of controller-runtime but instead, we should make it so that users can customise this by exposing the various HTTP servers as handlers, so that users can disable the server within CR and wrap the handler in their own middlewares if they wanted to (and we were planning to refactor rbac proxy to a middleware library). We may want to reopen the discussion but this kubernetes-sigs/controller-runtime#1007 is what I wrote up at the time of the disucssion

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as discussed during the meeting, we should also change the flag's default value to be localhost:8080

Thanks @micahhausler for bringing this up!

bootstrap/kubeadm/config/manager/manager.yaml Outdated Show resolved Hide resolved
config/manager/manager.yaml Outdated Show resolved Hide resolved
controlplane/kubeadm/config/manager/manager.yaml Outdated Show resolved Hide resolved
docs/book/src/developer/providers/v1alpha3-to-v1alpha4.md Outdated Show resolved Hide resolved
test/infrastructure/docker/config/manager/manager.yaml Outdated Show resolved Hide resolved
@vincepri vincepri changed the title ⚠️ Remove kube-rbac-proxy and expose metrics directly on port :8080 ⚠️ Remove kube-rbac-proxy and expose metrics on localhost:8080 May 27, 2021
@micahhausler micahhausler force-pushed the remove-kube-rbac-proxy branch from 1718b96 to 303483a Compare June 7, 2021 13:10
Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 7, 2021
@@ -94,6 +90,14 @@ For a `/config` folder reference, please use the testdata in the Kubebuilder pro

Provider's `/config` folder has the same structure of `/config` folder in CAPI controllers.

**Changes in the `/config/rbac` folder:**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO Those changes should not go under "## Required kustomize changes to have a single manager watching all namespaces and answer to webhook calls"; the instruction in this paragraph should not be modified given that some of the providers already applied those changes, and it will be difficult for them to figure out the delta.

Instead, there should be a separarated paragraph ## Required kustomize changes to remove Kubeadm-rbac-proxy

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 8, 2021
@vincepri
Copy link
Member

vincepri commented Jun 8, 2021

@fabriziopandini to follow-up with docs changes

@k8s-ci-robot k8s-ci-robot merged commit de99c22 into kubernetes-sigs:master Jun 8, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.4 milestone Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/release-blocking Issues or PRs that need to be closed before the next CAPI release lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is kube-rbac-proxy required?
8 participants