Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEP-0091]Trusted Resources #739

Merged
merged 1 commit into from
Aug 15, 2022

Conversation

Yongxuanzhang
Copy link
Member

@Yongxuanzhang Yongxuanzhang commented Jun 24, 2022

This commit proposes one solution to veriy the Tekton Resources via
admission webhook. It aims at starting the discussion on this work and
drives the Trusted Resources launched to Alpha.

This is a fork of #537 and work from @wlynch, and adapt it based on the poc in https://github.com/tektoncd/experimental/tree/main/pipeline/trusted-resources

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 24, 2022
@Yongxuanzhang
Copy link
Member Author

@wlynch @dibyom Looking forward to your comments!

Copy link
Member

@jerop jerop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Jun 27, 2022
@dibyom
Copy link
Member

dibyom commented Jun 27, 2022

/assign @wlynch
/assign @dibyom

@dibyom
Copy link
Member

dibyom commented Jun 27, 2022

/assign @afrittoli

@ywluogg
Copy link
Contributor

ywluogg commented Jun 30, 2022

@ywluogg

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 5, 2022
Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few minor comments.


### Verify the Resources

A new validating webhook should be implemented to do the verification. To receive the admission request new types should be implemented to include the Tekton TaskRun and PipelineRun as field, future Tekton Resources can also be extended this way.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, if we're adding this to Pipelines I wouldn't be opposed to adding this to the existing validation webhook. The separate webhook we were using in experimental was primarily to allow optional installation.

(that said, also fine to ship the second webhook to start if that simplifies things)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok. So should I include both options?

  1. Add into existing webhook
  2. Add a new webhook

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah packaging the new webhook into the pipeline webhook binary should be straightforward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok got it!

Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this section to add the verification into current admission webhook, and it should be gated by a new configmap.

}
```

The new types should implement `Validate()` and verify the Resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be careful not to expose new types to the users if we can avoid it - we should try to unexport these / put into an internal package if possible for the real impl.

Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean add all the stuff into existing tekton pipeline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we integrate into the pipeline then this shouldn't be a problem. I have changed the tep to propose to add to the pipeline


Validating webhook can use configmap to allow users to config when the Resource fails the verification
1) Directly fail the run;
2) Not fail the run but annotate the run with `tekton.dev/pass-verification: false`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be careful not to place too much trust into this option, since it can be mutated later by another user later on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, could this be added to the status so that the data can be signed by SPIFFE/SPIRE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be careful not to place too much trust into this option, since it can be mutated later by another user later on.

Oh yes, so for status did you mean this?https://github.com/tektoncd/pipeline/blob/6876520132f3bb22ff076e6a223b7114e936c611/pkg/apis/pipeline/v1beta1/taskrun_types.go#L120-L125

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the tep to propose adding a field under TaskRunStatus, and add the result into this new field

1) Directly fail the run;
2) Not fail the run but annotate the run with `tekton.dev/pass-verification: false`;
If it passes the verification, the run will be annotated with `tekton.dev/pass-verification: true`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3rd option - you might also be interested in returning a warning response for failure for non-enforced validation. https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#response

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it something we can do within knative's admission webhook?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure but I think this might be possible by setting the DiagnosticLevel when retuning an apis.FieldError from the webhook validation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supported in knative 3 months ago. knative/pkg#2498
Just check the code pipeline is not updated to include this change but operator does.
I can update this in the tep

teps/0091-trusted-resources.md Outdated Show resolved Hide resolved
@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 5, 2022
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 6, 2022
@pritidesai
Copy link
Member

API WG - @dibyom is still reviewing and can merge offline - proposed TEP with PoC in experimental repo.

teps/0091-trusted-resources.md Outdated Show resolved Hide resolved

### Performance (optional)

For remote Resources such as OCI bundle, it will take time to fetch the resources. Resources that are not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a rough estimate of how long the resource fetch takes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have right now. It will depend on how revolver work and also what remote source we are fetching. For now I can make some experiment, like for the OCI case, if I pull it from docker hub, how long it will take. To get a rough estimate of it!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested a simple task oci bundle fetching from docker hub and the time is very short, can be ignored.

Copy link
Member

@dibyom dibyom Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit worried about the remote artifact fetch particularly if this is done as a validation webhook and uses remote resolution - that would involve the webhook creating a ResolutionRequest CR, waiting for the Resolver to finish resolving, and making a validation decision.

(One mitigation would be to do this verification in the reconciler (or the resolver) instead of in the admission webhook)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and this also needs to give webhook the permission to fetch the tasks/pipelines. It's not so difficult to do it after resolution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work with a validation webhook and remote resources?
Would the validation webhook apply to the TaskRun / PipelineRun? That would mean that both the webhook and controller would have to fetch remote resources, unless the controller would have a way to use the resources fetched by webhook.

Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work with a validation webhook and remote resources? Would the validation webhook apply to the TaskRun / PipelineRun? That would mean that both the webhook and controller would have to fetch remote resources, unless the controller would have a way to use the resources fetched by webhook.

Yes you're right that both the webhook and controller would have to fetch remote resources if this feature is enabled and we want to have the verification in webhook.

Another option is that we can integrate the verification together with remote resolution, either in the reconciler after resolving or in the resolver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to balance when thinking about webhook vs reconcile validation - validation before admission is the safer strategy because the resource can be blocked before downstream informers are aware of them.

Other OCI based policy webhooks (OPA, Kyverno, sigstore/policy-controller) seem to do fine with network based lookups at admission time, though I get we also want to dedup fetching efforts between this and RemoteResolution.

If we go down the reconcile validation path, we should be careful about documenting when resources are considered trusted and how that information is exposed to users in the Run status (e.g. it might be worth having a dedicated termination status marking a Run as unverified?). Even then, there may still be some utility in a webhook based validation to make sure that the Run object itself hasn't been tampered with. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with the caveats around reconciler based validation but I do think there is (or will be) a difference in the remote fetch between this webhook and OPA/policy controller - with remote resolution for resource fetches, the webhook will have to create a custom resource and then watch for update to its status as opposed to a sync HTTP request/response

Some possible mitigations include the following:

1. Update the Resources to add the missing mutating values (e.g. Use `SetDefaults`) everytime we bump the version of Tekton Pipeline.
2. Avoid mutating Tekton Resources or move the mutating after verification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't another mitigation be moving the verification from the webhook to the reconciler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong.
If we move it to the reconciler, for local tasks/pipelines they still need to go through mutating webhook. And the content may be changed. The flow is:
mutating -> validation -> reconciler

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do some research this week and see if there are better ways!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. One option is that we apply the SetDefaults() each time on reconcile instead of at admission time (so the object stored in etcd is not mutated)

Another possibility might be having the CLI that is doing the signing make a dry-run call to the API server to get the resource defaults set and sign that https://kubernetes.io/docs/reference/using-api/api-concepts/#dry-run

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks!
The dryrun is really helpful, Billy just also mentioned this.
Right now it is similar that I add call setdefaults during signing, the benefit is that it will not depend a running cluster, but it cannot guarantee that the setdefault is the same as the cluster's tekton. So I think what you're suggesting is better!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for moving defaults to the reconciler - the less we mutate the config before verification the better.

Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for moving defaults to the reconciler - the less we mutate the config before verification the better.

I think people may not like this idea but I will keep this as an option. Because it is kind of against the original desgin of admission webhooks? That is all the request objects need to be mutated and then validate, and some validating logic assume that this object must be mutated.

We encountered some issues recently that resources not go through the mutating will fail in validating.
If we move mutating into reconciler then what stored in etcd is not mutated and fully validated.

If we can do verification->mutating->verification then it may be better?


### Verify the Resources

A new validating webhook should be implemented to do the verification. To receive the admission request new types should be implemented to include the Tekton TaskRun and PipelineRun as field, future Tekton Resources can also be extended this way.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah packaging the new webhook into the pipeline webhook binary should be straightforward.

1) Directly fail the run;
2) Not fail the run but annotate the run with `tekton.dev/pass-verification: false`;
If it passes the verification, the run will be annotated with `tekton.dev/pass-verification: true`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure but I think this might be possible by setting the DiagnosticLevel when retuning an apis.FieldError from the webhook validation

@dibyom
Copy link
Member

dibyom commented Jul 11, 2022

This sounds good to me @Yongxuanzhang -- have a couple of questions. I think once you squash the 2 commits, we can merge this.

@Yongxuanzhang
Copy link
Member Author

Hi @wlynch @dibyom, I went through the comments of this tep,
So from your comment, we can add this into the current pipeline's webhook.
I have 2 questions regarding direction.

  1. Do we need to worry about introducing more dependency into the pipeline? cosign and sigstore will be added into the go.mod, not sure if that is ok. But since SPIFFE/SPIRE also integrate into the pipeline, maybe that is fine?
  2. We have the command line code for singing the yaml files. Where is the best place to put them? Under pipeline/cmd?

@Yongxuanzhang Yongxuanzhang force-pushed the trusted-resources branch 3 times, most recently from ebf2a79 to bb0f0b2 Compare August 3, 2022 20:28
@Yongxuanzhang
Copy link
Member Author

Hi @wlynch @dibyom, I went through the comments of this tep, So from your comment, we can add this into the current pipeline's webhook. I have 2 questions regarding direction.

  1. Do we need to worry about introducing more dependency into the pipeline? cosign and sigstore will be added into the go.mod, not sure if that is ok. But since SPIFFE/SPIRE also integrate into the pipeline, maybe that is fine?
  2. We have the command line code for singing the yaml files. Where is the best place to put them? Under pipeline/cmd?
  1. The TEP is adjusted to add into Tektoncd/Pipeline
  2. The signing can go to cmd, there is another tep before discussing adding sign/verify into tkn. This can be also considered

@dibyom
Copy link
Member

dibyom commented Aug 5, 2022

Do we need to worry about introducing more dependency into the pipeline? cosign and sigstore will be added into the go.mod, not sure if that is ok. But since SPIFFE/SPIRE also integrate into the pipeline, maybe that is fine?

I think this is ok. If we really have to, it should be straightforward to split off into its own binary

We have the command line code for singing the yaml files. Where is the best place to put them? Under pipeline/cmd?

I think eventually this should make it into tkn

There are several ways to integrate the Trusted Tesource with Remote Resource Resolution


| Method | Pros | Cons |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you leaning towards one of these options at the moment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the implementation is based on callbacks in validation webhook of pipeline. I think the benefit of having it in webhook is that we can fail the run/build instantly when it fails the verification.

I discussed the integration with remote resolution in one of the working groups before, and Priti suggested me to wait until the remote resolution come to beta version.

Copy link
Member

@dibyom dibyom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this good to merge as proposed though we should decide if it makes sense to support remote resources verification via a webhook or not.

/approve

Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this.

I think there are still some details to be fleshed out, but it looks good for "proposed" status, this is definitely a capability we should have.

/approve

Comment on lines 129 to 130
This proposal will introduce a new admission webhook to Tekton to verify a
Tekton Resource and can indicate that the verification must occur.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sentence still up to date? An admission controller might not work for remote resources?

Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

  1. Right now it is not necessarily a new admission webhhook, after some experiments last week we can just add some new callbacks into Pipelines's validating webhook to do the verification. I will update the tep.
  2. Correct me if I'm wrong, Admission controller should be able to fetch remote resources just like what we do in reconciler, right now the implementation can fetch the oci bundles using k8schain?

Comment on lines 160 to 163
the builder images. These builder images may have been compromised and the
Task would still be considered verified.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are "builder images"? Are these the images used in the steps? If so, I would recommend to stick to the standard Tekton nomenclature to avoid confusion, and define any new nomenclature that needs to be introduced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it refers to the step images. So should I use step images?


**Author publishing a Resource.**
A Resource Author may want their Resource and its
outputs to be trusted so that provenance attestations can be confidently made. The author will use the private key to sign the Resource and signatures are stored in `Annotations` map of the Resource. Public keys are stored as secrets or via URI stored in the configmap. Similar mechanism is used in [Chains](https://github.com/tektoncd/chains).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to expand on "the" private key.
Where does this key come from, what role does an author need to have to have access to a private key and sign resources, do we trust all signed resources or can specify a set of trusted signing keys?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is a great point.
The current idea/assumption is that, the private keys should come from the users who want to use this feature. So it is quite flexible that some users can have their own private keys and publish the public keys. Tektoncd or pipeline doesn't provide private keys right now.

Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this makes sense, I can expand this section to clarify. Of if you have any other suggestions pls let me know. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly just a phrasing thing? the use of "the private" makes it sound like there's a preexisting key that users should use, when really IIUC this is just an arbitrary key pair under the users control?

Suggested change
outputs to be trusted so that provenance attestations can be confidently made. The author will use the private key to sign the Resource and signatures are stored in `Annotations` map of the Resource. Public keys are stored as secrets or via URI stored in the configmap. Similar mechanism is used in [Chains](https://github.com/tektoncd/chains).
outputs to be trusted so that provenance attestations can be confidently made. The author will use their own public/private key pair to sign the Resource and signatures are stored in `Annotations` map of the Resource. Public keys are stored as secrets or via URI stored in the configmap. Similar mechanism is used in [Chains](https://github.com/tektoncd/chains).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually reading on below with the configmap, might need some more clarification

outputs to be trusted so that provenance attestations can be confidently made. The author will use the private key to sign the Resource and signatures are stored in `Annotations` map of the Resource. Public keys are stored as secrets or via URI stored in the configmap. Similar mechanism is used in [Chains](https://github.com/tektoncd/chains).

**Verify the Resource via validating webhook.**
The user will need to deploy the validating webhook to the cluster, by default the webhook will reject a Resource if it fails the verification. The webhook will mark if this Resource fails or passes the verification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the deployment be on by default and enforcement controlled via configuration?
Any implication on the operator here?

Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the deployment be on by default and enforcement controlled via configuration?

This will be gated by alpha and a new configmap, and by default it should just skip the verification. This section is not updated and I will update right now!


### Performance (optional)

For remote Resources such as OCI bundle, it will take time to fetch the resources. Resources that are not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work with a validation webhook and remote resources?
Would the validation webhook apply to the TaskRun / PipelineRun? That would mean that both the webhook and controller would have to fetch remote resources, unless the controller would have a way to use the resources fetched by webhook.

Comment on lines +234 to +240
The verification should be done in the Tekton Pipeline's admission webhook. A new configmap should be added to gate the verification
code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all Tekton resources are installed on the cluster, so the admission webhook will not catch remote resources and OCI bundles.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what we want to include in this verification process, we need to fetch these resources


| Method | Pros | Cons |
| -------- | ----------- | ----------- |
| Call Resolution in Webhook | Easy decoupled implementation, less dependency introduced to Pipeline | Duplicate work for Resolution
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which webhook does this refer to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the current validating webhook. I just updated it.

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, dibyom, wlynch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Some possible mitigations include the following:

1. Update the Resources to add the missing mutating values (e.g. Use `SetDefaults`) everytime we bump the version of Tekton Pipeline.
2. Avoid mutating Tekton Resources or move the mutating after verification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for moving defaults to the reconciler - the less we mutate the config before verification the better.


**Author publishing a Resource.**
A Resource Author may want their Resource and its
outputs to be trusted so that provenance attestations can be confidently made. The author will use the private key to sign the Resource and signatures are stored in `Annotations` map of the Resource. Public keys are stored as secrets or via URI stored in the configmap. Similar mechanism is used in [Chains](https://github.com/tektoncd/chains).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly just a phrasing thing? the use of "the private" makes it sound like there's a preexisting key that users should use, when really IIUC this is just an arbitrary key pair under the users control?

Suggested change
outputs to be trusted so that provenance attestations can be confidently made. The author will use the private key to sign the Resource and signatures are stored in `Annotations` map of the Resource. Public keys are stored as secrets or via URI stored in the configmap. Similar mechanism is used in [Chains](https://github.com/tektoncd/chains).
outputs to be trusted so that provenance attestations can be confidently made. The author will use their own public/private key pair to sign the Resource and signatures are stored in `Annotations` map of the Resource. Public keys are stored as secrets or via URI stored in the configmap. Similar mechanism is used in [Chains](https://github.com/tektoncd/chains).


`skip-verification`. (Optional, default `true`), if true then directly return nil, if false then do the verfication

`cosign-pubkey-path`. (Optional, default `/etc/signing-secrets/cosign.pub`), it specifies the secret path to store the cosign pubkey
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be difficult to support for multiple keys. You'll probably want a policy configuration similar to https://github.com/sigstore/policy-controller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked this repo. So ClusterImagePolicy is actually a new CRD created? Maybe to support multiple keys we can also have a new keyref type in tekton?


### Performance (optional)

For remote Resources such as OCI bundle, it will take time to fetch the resources. Resources that are not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to balance when thinking about webhook vs reconcile validation - validation before admission is the safer strategy because the resource can be blocked before downstream informers are aware of them.

Other OCI based policy webhooks (OPA, Kyverno, sigstore/policy-controller) seem to do fine with network based lookups at admission time, though I get we also want to dedup fetching efforts between this and RemoteResolution.

If we go down the reconcile validation path, we should be careful about documenting when resources are considered trusted and how that information is exposed to users in the Run status (e.g. it might be worth having a dedicated termination status marking a Run as unverified?). Even then, there may still be some utility in a webhook based validation to make sure that the Run object itself hasn't been tampered with. 🤔

@Yongxuanzhang
Copy link
Member Author

@jagathprakash

@Yongxuanzhang
Copy link
Member Author

Yongxuanzhang commented Aug 10, 2022

Summarize some discussions here cuz it may be difficult to track new replies:
Where should we do the verification?

  1. Validating webhook: Fail fast of the resources but will need to do dup work of resources resolution.
  2. Reconciler: The tampered resources may be persistent in storage, but no dup work no extra cost.
  3. Mutating Webhook and Validating webhook: Mutating Webhook first verify, and mark the taskrun as fail or pass, later validating webhook can check and reject it. (Optional: The mutating webhook can also resign the resources via the private keys, so if there are case for multiple mutating webhook, they can be verified again. )

I personally prefer to do the verification in admission webhooks, so that we can make sure the resources who are not trusted be kept out of the gate.

@Yongxuanzhang
Copy link
Member Author

How to avoid the impact from mutating webhooks to the verification, this is only a concern for local resources, those who are applied in. cluster:

  1. Mitigation methods: call setdefaults when signing the yaml files, every time we bump the pipeline version or change the mutating webhooks, users may need to resign the files again.
  2. Move mutating webhook into reconciler: The flow will be Validating+Verification -> Reconcile (Mutating+Validation). There are some validation logic based on mutating webhook. So this means we need to decouple this dependency.
  3. Do verification in mutating webhook before all the mutations.

@dibyom
Copy link
Member

dibyom commented Aug 15, 2022

@Yongxuanzhang thanks for the summary. Very helpful. Do you want to add these open questions to the TEP and we merge this PR or should we keep discussing those options on this PR itself? (I think I have a preference for the first option)

This commit proposes one solution to veriy the Tekton Resources via
admission webhook. It aims at starting the discussion on this work and
drive the Trusted Resources launched to Alpha.
@Yongxuanzhang
Copy link
Member Author

Where should we do the verification?

  1. Validating webhook: Fail fast of the resources but will need to do dup work of resources resolution.
  2. Reconciler: The tampered resources may be persistent in storage, but no dup work no extra cost.
  3. Mutating Webhook and Validating webhook: Mutating Webhook first verify, and mark the taskrun as fail or pass, later validating webhook can check and reject it. (Optional: The mutating webhook can also resign the resources via the private keys, so if there are case for multiple mutating webhook, they can be verified again. )

Thank you! @dibyom 😄 I have included them and updated this PR. Maybe we can merge it first!

@dibyom
Copy link
Member

dibyom commented Aug 15, 2022

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 15, 2022
@tekton-robot tekton-robot merged commit a618e10 into tektoncd:main Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: UnAssigned
Development

Successfully merging this pull request may close these issues.

8 participants