Add design details for Custom resource definition webhook validation. #1418

brendandburns · 2017-11-20T05:37:08Z

@kubernetes/sig-api-machinery-api-reviews @lavalamp

lavalamp · 2017-11-20T16:53:28Z

Yeah, that was my suggestion. Just let me repeat that I'd like to get some miles on the webhooks before adding an auto-configuration layer.

lavalamp · 2017-11-20T16:54:05Z

contributors/design-proposals/api-machinery/customresources-validation.md

@@ -381,6 +381,28 @@ Note:

 2. For migration of CRDs with no validation to CRDs with validation, we can create a controller that will validate and annotate invalid CRs once the spec changes, so that the custom controller can choose to delete them (this is also essentially the status condition of the CRD). This can be achieved, but it is not part of the proposal.

+### Admission Webhook
+
+Custom resource definitions use the normal REST endpoint implementation and only customizes the registry and the codecs consequentaly dynamic web-hook admission controllers can be used


Reword, hard to read-- this is missing a comma or something.

brendandburns · 2017-12-13T21:02:06Z

@lavalamp comment addressed, please re-check.

Thanks!

nikhita · 2017-12-13T21:37:36Z

contributors/design-proposals/api-machinery/customresources-validation.md

@@ -381,6 +381,29 @@ Note:

 2. For migration of CRDs with no validation to CRDs with validation, we can create a controller that will validate and annotate invalid CRs once the spec changes, so that the custom controller can choose to delete them (this is also essentially the status condition of the CRD). This can be achieved, but it is not part of the proposal.

+### Admission Webhook
+
+Custom resource definitions use the the same REST endpoint as built-in Kubernetes API objects. Consequentaly standard


s/Consequentaly/Consequently

sttts · 2017-12-14T09:29:00Z

contributors/design-proposals/api-machinery/customresources-validation.md

+create arbitrary admission controllers, it is necessary to have an alternate solution for
+adding web-hook validation to custom resource definitions.
+
+The solution is that the `CustomResourceDefinition` itself has an array of webhooks in the


technically two arrays: mutating, non-mutating

sttts · 2017-12-14T09:38:18Z

contributors/design-proposals/api-machinery/customresources-validation.md

+and in many cases the creator of a `CustomResourceDefinition` is third-party extension code
+(e.g. an instance of the _operator pattern_) which should not have the ability to
+create arbitrary admission controllers, it is necessary to have an alternate solution for
+adding web-hook validation to custom resource definitions.


This raises the question about how to create+distribute the necessary certificates. Currently, we don't have a mechanism to auto-create and inject:

- the webhook server key+certificate (trusted by the apiextensions-apiserver)
- the kube-apiserver proxy client CA (trusted and verified by the webhook).

We need both. I don't want to see every CRD controller using this implementing its own certificate generation code, and even worse creating something insecure.

In OpenShift we have a controller that's listening on a custom annotation on apiserver pods and then creates a secret on-demand. This gives a smooth and secure setup with zero config.

We need to solve the certificate problem. I had a brief chat with @deads2k a few weeks ago about extracting the service signing certificate controller from OpenShift and making it generally available. Whether we do that or find another solution, I believe we need to do something. Otherwise, I feel it's too much of a burden to ask users to figure out certificate generation, signing, etc. to be able to do things like this.

I think @ncdc is saying this, but I think this is a general problem with dynamic webhooks, not specific to this proposal. We need to make it much easier for people to build dynamic webhooks in general.

Added some text about this.

brendandburns · 2017-12-14T23:28:33Z

Comments addressed, please take another look.

sttts · 2017-12-15T08:07:31Z

contributors/design-proposals/api-machinery/customresources-validation.md

+This ensures that third-party extension code can not register admission controllers for
+arbitrary API objects.
+
+To achieve this, the `CustomResourceDefinition` has two arrays of `admissionregistration.Webhook` 


nit: not much new in this paragraph. just move the type admissionregistration.Webhook into the previous paragraph.

sttts · 2017-12-15T08:09:15Z

One nit, otherwise lgtm.

There are a number of details, like which fields of admissionregistration.Webhook can be set by the user and which cannot. But I think we don't need that here.

brendandburns · 2017-12-21T05:13:26Z

@sttts comments addressed, please re-check.

Thanks!

sttts · 2018-01-02T09:49:33Z

contributors/design-proposals/api-machinery/customresources-validation.md

+However, because the creation of admission controllers is a fairly high-privilege activity,
+and in many cases the creator of a `CustomResourceDefinition` is third-party extension code
+(e.g. an instance of the _operator pattern_) which should not have the ability to
+create arbitrary admission controllers, it is necessary to have an alternate solution for


Seeing operators in the wild that install CRDs and RBAC rules, I question that there is a real difference to "creation of admission controllers" today. Of course, I totally agree that there should be. But for that we need a similar in-CRD mechanism for RBAC rules.

That is clearly a job for a future design PR :)

sure, just thinking out loud here ;)

sttts · 2018-01-02T09:50:50Z

contributors/design-proposals/api-machinery/customresources-validation.md

+definition itself. The arrays will be of type `admissionregistration.Webhook`.
+One array will contain non-mutating admission controllers for things like
+validation, and one will have mutating admission controllers for the purposes of defaulting.
+The custom resource controller (a piece of trusted code), will register


custom resource definition controller
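To make the mechanism in the diff excerpts above concrete, here is a minimal Go sketch of the proposed shape: a CRD spec carrying one array of non-mutating (validating) webhooks and one of mutating (defaulting) webhooks, plus a helper a trusted custom resource definition controller might use when registering them. Everything here is a hypothetical local model, not the real apiextensions or admissionregistration Go types: the field names `ValidatingWebhooks` and `MutatingWebhooks`, the helper `scopedWebhooks`, and the choice of `CREATE`/`UPDATE` operations are assumptions. What it illustrates is the scoping property stated in the proposal text: whatever rules the CRD author writes, the registered rules match only the CRD's own group, version, and resource.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified stand-in for an admission rule.
// The real admissionregistration types carry more fields.
type Rule struct {
	APIGroups   []string
	APIVersions []string
	Resources   []string
	Operations  []string
}

// Webhook is a hypothetical stand-in for admissionregistration.Webhook;
// only the fields this sketch needs are modeled.
type Webhook struct {
	Name  string
	Rules []Rule // whatever the CRD author wrote; replaced by scopedWebhooks
}

// CRDSpec models the proposal: two webhook arrays embedded in the CRD.
type CRDSpec struct {
	Group              string
	Version            string
	Plural             string
	ValidatingWebhooks []Webhook // non-mutating, e.g. validation
	MutatingWebhooks   []Webhook // mutating, e.g. defaulting
}

// scopedWebhooks returns copies of hooks whose rules are replaced by a
// single rule matching only this CRD's own resource, so an operator that
// edits its CRD cannot point a webhook at arbitrary API objects.
func scopedWebhooks(spec CRDSpec, hooks []Webhook) []Webhook {
	own := Rule{
		APIGroups:   []string{spec.Group},
		APIVersions: []string{spec.Version},
		Resources:   []string{spec.Plural},
		Operations:  []string{"CREATE", "UPDATE"}, // assumption
	}
	out := make([]Webhook, 0, len(hooks))
	for _, h := range hooks {
		out = append(out, Webhook{Name: h.Name, Rules: []Rule{own}})
	}
	return out
}

func main() {
	spec := CRDSpec{
		Group: "example.com", Version: "v1", Plural: "widgets",
		ValidatingWebhooks: []Webhook{{
			Name: "validate.widgets.example.com",
			// An over-broad rule the author might write by mistake.
			Rules: []Rule{{
				APIGroups:   []string{"*"},
				APIVersions: []string{"*"},
				Resources:   []string{"*"},
				Operations:  []string{"*"},
			}},
		}},
	}
	for _, h := range scopedWebhooks(spec, spec.ValidatingWebhooks) {
		fmt.Println(h.Name, h.Rules[0].APIGroups, h.Rules[0].Resources)
	}
}
```

Overwriting the author's rules is only one possible design choice; rejecting a CRD whose webhook rules reach outside its own resource would give the same guarantee with a more explicit error.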

brendandburns · 2018-01-03T04:54:22Z

Comments addressed, please re-check.

Thanks!

sttts · 2018-01-03T09:25:20Z

@deads2k @liggitt ptal

deads2k · 2018-01-04T12:50:20Z

Thinking about the overall flow of creating a CRD, I see this:

1. Create a CRD and affect global discovery information, including shadowing existing resources and stealing shortnames. This is currently reserved to cluster admins by default.
2. Create an RBAC cluster role to allow users to use the new custom resource. This requires passing an access escalation check. Since the custom resource won't exist, that means it is reserved to cluster admins too.
3. Create an AdmissionRegistration. This currently requires cluster-admin as well.

Given that the first two items require cluster-admin access and the RBAC cluster role implies the power to grant access to any resources, I don't see an avenue for attack that this locks down. If I can create a useful and accessible CRD, I can already read every pod and secret in the system (or grant access to do so).

lavalamp · 2018-01-04T16:39:09Z

I think Brendan probably wants to make things easy, not lock them down.

deads2k · 2018-01-05T13:33:20Z

> I think Brendan probably wants to make things easy, not lock them down.

I'm not sure where this comment is going. In a cluster that cares about security, I think my point stands and if you grant permissions creating separate resources works fine. In a cluster that doesn't care about security, you don't have a restriction on those things and creating separate resources works fine. Are you suggesting there's a third case? Could you describe it?

lavalamp · 2018-01-05T17:13:59Z

Maybe I should let Brendan say what he wants, but I think it is something along the lines of "in a cluster that doesn't care about security I only have to submit the CRD to the system, no other configuration objects". This is about making the CRD developer and installer's life easy. (I think we would do well to adopt that goal in general, but I am personally ambivalent about whether this particular thing goes very far in service of this goal, esp. if a CRD author generally ends up running a controller etc. anyway.)

brendandburns · 2018-02-05T14:47:27Z

(sorry returning to this)

@deads2k regarding your point, there are two things:

1. Regardless of security, I think this design prevents accidents. I'd rather not have CRDs accidentally creating badly-written/broken admission controllers and breaking the entire cluster.
2. My hoped-for flow for a secure cluster would look like:
   1. Cluster-admin creates a stub CRD with names/short-names but no real details.
   2. CRD-Operator is created with permissions to update their own CRD, but no other permissions.
   3. CRD-Operator updates the CRD to add a web-hook for validation.

In this model, you can keep CRD creation with cluster-admin, but delegate the details of the CRD definition to the CRD operator.
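The delegation in this flow (a CRD-Operator allowed to update only its own CRD) can be expressed with RBAC's `resourceNames`, which restricts a rule to specific named objects. CRD objects live in the `apiextensions.k8s.io` API group under the `customresourcedefinitions` resource and are named `<plural>.<group>`. The Go sketch below is a deliberately simplified, hypothetical model of RBAC rule matching (no wildcards, no aggregation, no non-resource URLs), not the real authorizer; the operator rule and the `widgets.example.com` CRD name are illustrative.

```go
package main

import "fmt"

// PolicyRule is a simplified local model of an RBAC rule.
type PolicyRule struct {
	Verbs         []string
	APIGroups     []string
	Resources     []string
	ResourceNames []string // empty means "any object name", as in RBAC
}

// Request is a simplified authorization request.
type Request struct {
	Verb, APIGroup, Resource, Name string
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// allows reports whether the rule permits the request.
func (r PolicyRule) allows(req Request) bool {
	if !contains(r.Verbs, req.Verb) ||
		!contains(r.APIGroups, req.APIGroup) ||
		!contains(r.Resources, req.Resource) {
		return false
	}
	return len(r.ResourceNames) == 0 || contains(r.ResourceNames, req.Name)
}

func main() {
	// Hypothetical CRD-Operator rule: read and update only its own CRD.
	operator := PolicyRule{
		Verbs:         []string{"get", "update", "patch"},
		APIGroups:     []string{"apiextensions.k8s.io"},
		Resources:     []string{"customresourcedefinitions"},
		ResourceNames: []string{"widgets.example.com"},
	}
	own := Request{"update", "apiextensions.k8s.io", "customresourcedefinitions", "widgets.example.com"}
	other := Request{"update", "apiextensions.k8s.io", "customresourcedefinitions", "gadgets.example.com"}
	fmt.Println(operator.allows(own), operator.allows(other)) // true false
}
```

In real RBAC, `resourceNames` cannot restrict `create`, which fits the first step of leaving CRD creation to the cluster admin. As the next comment in the thread argues, a rule like this still lets the operator change every field of its own CRD, including the names and short names that affect discovery.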

I don't see anything in your concerns that should block merging this design proposal so that we can start building it, let me know if you feel differently and I'll address those concerns.

Thanks

deads2k · 2018-02-05T15:38:20Z

> I don't see anything in your concerns that should block merging this design proposal so that we can start building it, let me know if you feel differently and I'll address those concerns.

My comments and concerns address the core concept of this pull and I think that until we have agreement in concept, we should not merge this change.

> CRD-Operator is created with permissions to update their own CRD, but no other permissions.

The power to update the CustomResourceDefinition allows power over global discovery. Allowing the CRD-operator the power to change the CRD is akin to allowing a process to modify its own configuration.

The CRD-operator should not get to decide the shape of the CRD that it uses, and it should also not be able to control the validation associated with it. When the CRD is created, it should contain its validation rules. This means the responsibility lies with the CRD creator, who is also creating the roles. It looks like a cluster-admin to me.

lavalamp · 2018-02-05T19:32:04Z

Maybe we would do well to work on having some common goal / requirements around usability + security for CRDs. I don't think we can come to agreement on a specific proposal until we agree what we're even trying to accomplish.

brendandburns · 2018-02-06T02:09:16Z

@deads2k given that that is up to the cluster admin to decide via RBAC, I don't see the issue here.

If the cluster admin wants to force manual install for all CRDs, that's their call.

That said, I think there are lots of clusters where people want things to behave like plugins and just dynamically work. But forcing CRD developers to learn about admission controllers and to configure them correctly just adds complexity and work to the job of the CRD developer.

Could you describe the disagreement you see in the core concept?

For me the core concept(s) are:

- We should have webhook validation for CRDs
- We should have a controlled and automated method to make it easier for CRD developers/operators to activate those webhook validators without writing raw Admission Controllers

Nothing in this proposal is affecting the security model one way or the other, the cluster admin is free to set RBAC however they choose.

Please clarify, thanks!

deads2k · 2018-02-06T14:31:59Z

> We should have a controlled and automated method to make it easier for CRD developers/operators to activate those webhook validators without writing raw Admission Controllers

This piece. It isn't obvious to me what value this adds. You're duplicating an existing API in a different object to save setting three fields. In the use cases we already have, the actor creating a useful CRD already has full permissions on the cluster, so there isn't much value in duplicating the API.

You're still going to be writing the admission webhook code either way. You're just eliminating writing a rule from the resource manifest. Also, writing a library to make the admission plugin webhook is fairly easy. See https://github.com/openshift/generic-admission-server as an example in golang.

brendandburns · 2018-02-07T01:10:41Z

The most important feature of this isn't the simplification (though I think that's important too); it's preventing someone from making mistakes and accidentally registering an Admission Controller that has broader scope than just the particular CRD.

Given that your objection is based on not seeing the value (and I believe there is the value outlined above) can we agree to disagree and move forward with this proposal? I don't think it hurts anything, and I personally believe there is significant value in terms of ease of use and prevention of mistakes.

liggitt · 2018-02-07T01:20:03Z

> Given that your objection is based on not seeing the value (and I believe there is the value outlined above) can we agree to disagree and move forward with this proposal?

I'd prefer we not expand/nest APIs if we don't have to. I'd rather see API additions justified by enabling use cases that are not currently possible.

brendandburns · 2018-02-07T03:56:31Z

I don't think this is nesting, this is like the fact that PodTemplate is embedded in multiple different objects. It is re-use of a common object. We're not nesting a complete AdmissionController.

One of the recurring benefits of CRDs (and the reason why they are way more popular than aggregated API servers) is ease of use.

If you tell someone that if they want to validate their CRD they need to either learn Swagger or learn about admission controllers, you are erecting serious barriers to entry.

I think you need to focus on the usability for the average user, not the person who spends their time reading sig-api-machinery.

fejta-bot · 2018-05-08T04:08:07Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot · 2018-05-30T04:41:21Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: deads2k

Assign the PR to them by writing /assign @deads2k in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

contributors/design-proposals/api-machinery/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fejta-bot · 2018-06-29T05:31:43Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot · 2018-07-29T06:19:52Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-github-robot assigned lavalamp and deads2k Nov 20, 2017

lavalamp reviewed Nov 20, 2017

brendandburns force-pushed the master branch from 92e7bfd to 2a9287c December 13, 2017 21:01

k8s-ci-robot added the approved label Dec 13, 2017

nikhita reviewed Dec 13, 2017

sttts reviewed Dec 14, 2017

brendandburns force-pushed the master branch from 2a9287c to c00e99e December 14, 2017 23:28

sttts reviewed Dec 15, 2017

brendandburns force-pushed the master branch from c00e99e to a1ceb56 December 21, 2017 05:13

nikhita mentioned this pull request Dec 21, 2017

Validation for Custom Resource contents kubernetes/kubernetes#38117

Closed

sttts reviewed Jan 2, 2018

brendandburns force-pushed the master branch from a1ceb56 to 636ce07 January 3, 2018 04:53

nikhita mentioned this pull request Jan 23, 2018

Umbrella issue for CRDs moving to GA kubernetes/kubernetes#58682

Closed

54 tasks

k8s-github-robot added the kind/design label Feb 5, 2018

k8s-ci-robot added the lifecycle/stale label May 8, 2018

Add design details for Custom resource definition webhook validation.

57fecb2

brendandburns force-pushed the master branch from 636ce07 to d6015da May 30, 2018 04:40

k8s-ci-robot removed the approved label May 30, 2018

brendandburns force-pushed the master branch from d6015da to 57fecb2 May 30, 2018 04:41

k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Jun 29, 2018

k8s-ci-robot closed this Jul 29, 2018
