Add EP for synchronous webhook validation #685

imjasonh · 2021-03-19T22:01:06Z

Changes

Adds a proposal as provisional to gather feedback.

Submitter Checklist

[n/a] Includes tests if functionality changed/was added
[n/a] Includes docs if changes are user-facing
[y] Set a kind label on this PR
[y] Release notes block has been filled in, or marked NONE

See the contributor guide
for details on coding conventions, github and prow interactions, and the code review process.

Release Notes

NONE

docs/proposals/webhook-validation.md

gabemontero · 2021-03-23T13:52:20Z

docs/proposals/webhook-validation.md

+Do we want to require that operators run another job to synchronously validate client requests?
+
+Do we want to rely on Knative's webhook packages, or write our own code to manage certs ourselves?
+Is there anything already out there in controller-gen we should be using?


not a req for change, but rather an aside for those who might look at this, @coreydaley on our team has a non-shipwright assignment that happens to involve investigating the particulars here around webhook cert management outside the confines of knative

hopefully we can take what he discovers and apply it to the consideration here

UPDATE: more details are in #596 though some of them are in Red Hat only slack discussion threads that need to be summarized in something consumable by the shipwright community.

High level summary: there is not at the moment in the operator framework / controller-gen space an equivalent of the auto cert generation / mgmt features in the knavite webhook abstraction layers.

gabemontero

this captures all the elements of prior conversations that I recall @imjasonh

thanks

qu1queee

Asking some minor question around the EP, looks good to me in general

docs/proposals/webhook-validation.md

SaschaSchwarze0 · 2021-03-24T15:49:59Z

docs/proposals/webhook-validation.md

+
+## Drawbacks
+
+This requires us to build a new component in the Shipwright installation, and for the operator component to operate it.


That list is quite short. :-)

There is one caveat with the webhook: it validates only on API request (create or update is relevant in this context). But, our validations often include another resource which can also change.

We have some validations where this does not really matter, for example the Git repository validation because we have no trigger if this goes away anyway, or the non-existing Build on a BuildRun creation because the BuildRun is a one-shot thing.

But, we also validate the referenced secrets of a Build and would notice if a secret that is referenced by a Build gets deleted (assuming the secret has the necessary annotation) and subsequently mark the Build as not registered. With the webhook, this would not be possible anymore.

Good points, thanks for bringing them up.

I think it could be simpler (for our implementation, and our users' mental model) for Builds to be assumed to be valid until we find out they're not the next time they're run. I'd like to avoid the scenario where deleting/updating a Secret results in dozens or hundreds of Builds having to be reconciled to check if they're still valid. Especially if that validation includes reaching out to a registry to check that the new credentials are valid, potentially lots of times.

In Tekton, you can create a Task, create a Pipeline that uses the Task, then delete the Task, and the Pipeline doesn't proactively report as invalid. The next time you try to run that Pipeline, PipelineRun creation will fail saying the referenced Task isn't valid. This avoids unnecessary duplicative validation work while a user doesn't care, and puts the error message directly in the user's face the next time they try it.

I'm open to discussing this point more though.

Sounds good.

SaschaSchwarze0 · 2021-03-24T15:53:30Z

docs/proposals/webhook-validation.md

+Always accepting and storing resources, even invalid ones, can also lead to additional storage of known-invalid objects, which clutter the user experience and can lead to confusion, and in extreme cases, cluster resource exhaustion.
+
+Mixing reconciliation and validation code in the controller also complicates our codebase, leading to PRs like https://github.com/shipwright-io/build/pull/641 which add new validation-centric statuses that the user must watch for.
+As we add new validation scenarios, we currently add new `Reason` strings that clients must be aware of, which is cumbersome and won't scale indefinitely.


Imo this would not change with a webhook. We will have different validations and should continue to report the problem as specific as possible, for example using an error code that would be very much the same as the reason that we have today. A good client would have to provide the user different assistance depending on what the error code is.

A goal (perhaps unstated) of this proposal is to not have to define and maintain a table of all possible validation error scenarios and associated unique codes/reasons. That doesn't scale well much beyond where we're already at, and unless we invest in generating documentation, we will have them fall out of sync eventually. Incorrect or incomplete documentation can be worse than no documentation.

Instead, I'd like to invest in human-readable descriptive error messages that we can proactively put in front of users' faces. Those errors can also include codes (i.e., along the lines of HTTP 400 vs 404) to classify the kinds of errors, if we want.

I think it will change @SaschaSchwarze0 , that does not mean we will get rid of how we populate our Status.Conditions, we still need the core states of a resource that runs to completion, e.g. True, False, Unknown and all potential flavors of the Reason field. What we will do is reduce some of those flavors, e.g. BuildStrategyNotFound and allow the admission webhook to block the creation of a BuildRun if the strategy is not there.

@imjasonh I think the table of error scenarios is still needed till some extent, probably an smaller table of error scenarios but we still need to decide in the logic around when to mark something as False with a Failed Reason, or with a BuildRunTimeout one. I dont see this going away.

Also, I´m not sure if the error categorization discussion is out of the scope on this PR, maybe too much of an implementation detail, which we might want to take somewhere else. I do have some strong opinions around entering the business of parsing errors.

Good discussion. I was mainly focusing on those validations that we would perform before a resource gets created (like referenced secrets and buildstrategy when a Build creation happens). I agree that we will have HTTP status codes which can help us to categorize errors already. Though, I guess we will have many errors that translate into the same HTTP status codes. Not just because of that, I think that errors should be categorized using a computer-readable error code (like the current reasons we have for Builds). I am definitely not against human-readable descriptions. But, those have a tendency to be changed from time to time, and client interfaces - especially localized one's - typically cannot rely on the human-readable description. They need other means to identify an error to guide their users to make the necessary correction.

qu1queee

/lgtm

gabemontero · 2021-03-29T12:31:45Z

/approve

/hold

@imjasonh @SaschaSchwarze0 feel free to unhold if you think the latest discussion threads between you two are resolved. They seemed to be but I was not certain enough to assume that.

openshift-ci-robot · 2021-03-29T12:31:50Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabemontero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [gabemontero]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

SaschaSchwarze0 · 2021-03-30T13:15:36Z

@imjasonh @SaschaSchwarze0 feel free to unhold if you think the latest discussion threads between you two are resolved. They seemed to be but I was not certain enough to assume that.

I am generally fine with it. Will make sure that the errors that the validation errors that the webhook returns are good enough to be machine-readable when it gets implemented.

/unhold

openshift-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. release-note-none Label for when a PR does not need a release note labels Mar 19, 2021

openshift-ci-robot requested review from adambkaplan and qu1queee March 19, 2021 22:01

imjasonh force-pushed the webhook branch from fc2a426 to b550461 Compare March 21, 2021 13:42

zhangtbj reviewed Mar 22, 2021

View reviewed changes

docs/proposals/webhook-validation.md Outdated Show resolved Hide resolved

docs/proposals/webhook-validation.md Outdated Show resolved Hide resolved

docs/proposals/webhook-validation.md Show resolved Hide resolved

docs/proposals/webhook-validation.md Show resolved Hide resolved

Add EP for synchronous webhook validation

00d3966

imjasonh force-pushed the webhook branch from b550461 to 00d3966 Compare March 23, 2021 12:53

gabemontero reviewed Mar 23, 2021

View reviewed changes

gabemontero approved these changes Mar 23, 2021

View reviewed changes

qu1queee requested changes Mar 24, 2021

View reviewed changes

docs/proposals/webhook-validation.md Show resolved Hide resolved

docs/proposals/webhook-validation.md Show resolved Hide resolved

docs/proposals/webhook-validation.md Show resolved Hide resolved

docs/proposals/webhook-validation.md Show resolved Hide resolved

SaschaSchwarze0 reviewed Mar 24, 2021

View reviewed changes

qu1queee self-requested a review March 29, 2021 10:13

qu1queee approved these changes Mar 29, 2021

View reviewed changes

openshift-ci-robot assigned qu1queee Mar 29, 2021

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 29, 2021

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 29, 2021

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 29, 2021

openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2021

openshift-merge-robot merged commit d2520f8 into shipwright-io:master Mar 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add EP for synchronous webhook validation #685

Add EP for synchronous webhook validation #685

imjasonh commented Mar 19, 2021

gabemontero Mar 23, 2021

gabemontero Mar 29, 2021

gabemontero left a comment

qu1queee left a comment

SaschaSchwarze0 Mar 24, 2021

imjasonh Mar 24, 2021

SaschaSchwarze0 Mar 29, 2021

SaschaSchwarze0 Mar 24, 2021

imjasonh Mar 24, 2021

qu1queee Mar 25, 2021

SaschaSchwarze0 Mar 29, 2021

qu1queee left a comment

gabemontero commented Mar 29, 2021

openshift-ci-robot commented Mar 29, 2021

SaschaSchwarze0 commented Mar 30, 2021 •

edited

Loading


		## Drawbacks

		This requires us to build a new component in the Shipwright installation, and for the operator component to operate it.

Add EP for synchronous webhook validation #685

Add EP for synchronous webhook validation #685

Conversation

imjasonh commented Mar 19, 2021

Changes

Submitter Checklist

Release Notes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gabemontero left a comment

Choose a reason for hiding this comment

qu1queee left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

qu1queee left a comment

Choose a reason for hiding this comment

gabemontero commented Mar 29, 2021

openshift-ci-robot commented Mar 29, 2021

SaschaSchwarze0 commented Mar 30, 2021 • edited Loading

SaschaSchwarze0 commented Mar 30, 2021 •

edited

Loading