Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TrustTask] Trusted taskspec verification #833

Merged
merged 1 commit into from
Mar 2, 2022

Conversation

Yongxuanzhang
Copy link
Member

@Yongxuanzhang Yongxuanzhang commented Feb 14, 2022

Changes

This commit is an initial step of work for Trust Task.(Forked from tektoncd/community#537)

Prior this commit we cannot verify if the taskrun is manipulated or not before applying to the cluster.

This commit leverages cosign to do verification on taskspec. Admission webhook and verification on taskrun.spec will be introduced in following PRs. With these work we can deploy a standalone webhook for verification without changing current Tekton Pipeline APIs.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Commit messages includes a project tag in the subject - e.g. [OCI], [hub], [results], [task-loops]

See the contribution guide
for more details.

@tekton-robot tekton-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 14, 2022
@Yongxuanzhang Yongxuanzhang changed the title Trusted resources verification [Proof of concept] Trusted resources verification Feb 14, 2022
@Yongxuanzhang Yongxuanzhang changed the title [Proof of concept] Trusted resources verification [Proof of Concept] Trusted resources verification Feb 14, 2022

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/tektoncd/experimental/blob/master/LICENSE)

This is an experimental project to provide a seperate webhook for remote resources verification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this? Is there a doc or issue I can read to catch up on the goal of this project? It sounds very interesting 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! This is proposed by @wlynch. The idea is to allow users to install/deploy a webhook in current tekton pipeline, where we can verify the remote resources (taskrun, pipelinerun, oci bundle etc.) with Cosign.
Maybe he can share the doc with you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super rough notes here: https://hackmd.io/93mfJPyDQKCyn0IKwjzgWQ#Design-Details - tl;dr it's basically a modification of Jason's original trusted task TEP, but using validation webhooks so we can prototype something before needing to make modifications to the Pipelines API.

@Yongxuanzhang Yongxuanzhang marked this pull request as ready for review February 22, 2022 14:22
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 22, 2022
Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial pass - probably more to dig into but waiting on some of the structural changes to be made to simplify some of the impl.

pipeline/trusted-resources/pkg/webhook/controller.go Outdated Show resolved Hide resolved
pipeline/trusted-resources/config/200-clusterrole.yaml Outdated Show resolved Hide resolved
pipeline/trusted-resources/config/200-serviceaccount.yaml Outdated Show resolved Hide resolved
pipeline/trusted-resources/pkg/webhook/validation_admit.go Outdated Show resolved Hide resolved
pipeline/trusted-resources/pkg/webhook/validation_admit.go Outdated Show resolved Hide resolved
serviceAccount = "tekton-pipelines-webhook"
)

func Test_verifyResources(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useful to have tests for both individual funcs (e.g. just the signature generation, etc.) as well as the overall verification. This way it's easier to figure out where exactly changes have been made if/when something breaks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure!

@Yongxuanzhang
Copy link
Member Author

/test tekton-experimental-unit-tests

Comment on lines 5 to 9
approvers:
- afrittoli
- bobcatfish
- ImJasonH
- jerop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy/paste from pipeline in pipeline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will update them!

Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to go over tests, but this should get you started. I'd recommend breaking this up into 2 PRs:

  1. Add TaskRun validation / signature checking libraries.
  2. Creating AdmissionWebhook.

pipeline/trusted-resources/OWNERS Outdated Show resolved Hide resolved
pipeline/trusted-resources/config/200-role.yaml Outdated Show resolved Hide resolved
pipeline/trusted-resources/config/200-clusterrole.yaml Outdated Show resolved Hide resolved
pipeline/trusted-resources/config/200-clusterrole.yaml Outdated Show resolved Hide resolved
Comment on lines 75 to 180

// Look up the webhook secret, and fetch the CA cert bundle.
secret, err := ac.secretlister.Secrets(system.Namespace()).Get(ac.secretName)
if err != nil {
logger.Errorw("Error fetching secret", zap.Error(err))
return err
}
caCert, ok := secret.Data[certresources.CACert]
if !ok {
return fmt.Errorf("secret %q is missing %q key", ac.secretName, certresources.CACert)
}

// Reconcile the webhook configuration.
return ac.reconcileValidatingWebhook(ctx, caCert)
}

func (ac *reconciler) reconcileValidatingWebhook(ctx context.Context, caCert []byte) error {
logger := logging.FromContext(ctx)

rules := make([]admissionregistrationv1.RuleWithOperations, 0, len(ac.handlers))
for gvk := range ac.handlers {
plural := strings.ToLower(flect.Pluralize(gvk.Kind))

rules = append(rules, admissionregistrationv1.RuleWithOperations{
Operations: []admissionregistrationv1.OperationType{
admissionregistrationv1.Create,
admissionregistrationv1.Update,
admissionregistrationv1.Delete,
},
Rule: admissionregistrationv1.Rule{
APIGroups: []string{gvk.Group},
APIVersions: []string{gvk.Version},
Resources: []string{plural, plural + "/status"},
},
})
}

// Sort the rules by Group, Version, Kind so that things are deterministically ordered.
sort.Slice(rules, func(i, j int) bool {
lhs, rhs := rules[i], rules[j]
if lhs.APIGroups[0] != rhs.APIGroups[0] {
return lhs.APIGroups[0] < rhs.APIGroups[0]
}
if lhs.APIVersions[0] != rhs.APIVersions[0] {
return lhs.APIVersions[0] < rhs.APIVersions[0]
}
return lhs.Resources[0] < rhs.Resources[0]
})

configuredWebhook, err := ac.vwhlister.Get(ac.key.Name)
if err != nil {
return fmt.Errorf("error retrieving webhook: %w", err)
}

current := configuredWebhook.DeepCopy()

// Set the owner to namespace.
ns, err := ac.client.CoreV1().Namespaces().Get(ctx, system.Namespace(), metav1.GetOptions{})
if err != nil {
return fmt.Errorf("failed to fetch namespace: %w", err)
}
nsRef := *metav1.NewControllerRef(ns, corev1.SchemeGroupVersion.WithKind("Namespace"))
current.OwnerReferences = []metav1.OwnerReference{nsRef}

for i, wh := range current.Webhooks {
if wh.Name != current.Name {
continue
}
cur := &current.Webhooks[i]
cur.Rules = rules

cur.NamespaceSelector = webhook.EnsureLabelSelectorExpressions(
cur.NamespaceSelector,
&metav1.LabelSelector{
MatchExpressions: []metav1.LabelSelectorRequirement{{
Key: "webhooks.knative.dev/exclude",
Operator: metav1.LabelSelectorOpDoesNotExist,
}},
})

cur.ClientConfig.CABundle = caCert
if cur.ClientConfig.Service == nil {
return fmt.Errorf("missing service reference for webhook: %s", wh.Name)
}
cur.ClientConfig.Service.Path = ptr.String(ac.Path())
}

if ok, err := kmp.SafeEqual(configuredWebhook, current); err != nil {
return fmt.Errorf("error diffing webhooks: %w", err)
} else if !ok {
logger.Info("Updating webhook")
vwhclient := ac.client.AdmissionregistrationV1().ValidatingWebhookConfigurations()
if _, err := vwhclient.Update(ctx, current, metav1.UpdateOptions{}); err != nil {
return fmt.Errorf("failed to update webhook: %w", err)
}
} else {
logger.Info("Webhook is valid")
}
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading through this and the other file, I think I get what you're trying to do here.

Instead of forking the resourcesemantics code, I think what we should try to do is lean on the handlers field and inject our own Validatable object.

e.g. I'm thinking something like...

func New(...) *controller.Impl {
	return validation.NewAdmissionController(ctx,
		"webhook.trustedtasks.tekton.dev",
		"/resource-validation",
		
		// List the types to validate, this from knative.dev/sample-controller
		map[schema.GroupVersionKind]resourcesemantics.GenericCRD{
			v1beta1.SchemeGroupVersion.WithKind("TaskRun"): &trustedTaskRun{},
		},

		nil,
		true,
	)
}

type trustedTaskRun = v1beta.TaskRun

func (*trustedTaskRun) Validate(ctx context.Context) {
	// validate here
	...
}

func (*trustedTaskRun) SetDefaults(ctx context.Context) {
	// Purposely do nothing
}

I think this should do what we want, but still let us rely on the knative controller code for cert management and webhook setup. We could also look into the callbacks field if handlers doesn't quite fit what we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we're still stuck we can always reach out to the Knative folks on slack. 😃

Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this looks interesting, but when we call the validate in webhook, how do we make sure it is calling this Validate not the one we have in v1beta.TaskRun?

and this code will get
invalid receiver type *"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1".TaskRun (type not defined in this package)

Do you mean something like this? (I tried to correct some syntax error but not sure if this is what we want)

type trustedTaskRun struct{
	v1beta1.TaskRun
}
func (tr *trustedTaskRun) Validate(ctx context.Context) (errs *apis.FieldError){
	// validate here
	fmt.Println("!!!! validate new")
	return nil
}

func (tr *trustedTaskRun) SetDefaults(ctx context.Context) {
	// do nothing
	fmt.Println("!!!! set defaults new")
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline/trusted-resources/pkg/webhook/validation_admit.go Outdated Show resolved Hide resolved
var verifier sigstoresignature.Verifier
if tr.ObjectMeta.Annotations[kmsKey] != "" {
// Fetch key from kms
verifier, err = kms.Get(ctx, tr.ObjectMeta.Annotations[kmsKey], crypto.SHA256)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this will always be SHA256?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 115 to 120
// Fetch public key from annotation
cosignPublicKeypath := filepath.Join(tr.ObjectMeta.Annotations[publicKey], "cosign.pub")
verifier, err = cosignsignature.LoadPublicKey(ctx, cosignPublicKeypath)
if err != nil {
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A local path on a remote resource is problematic, since this depends on the remote state to match what was done locally at signing.

This might make more sense if this was a reference to a secret?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored this a little bit

Comment on lines 288 to 520
tmpPrivFile, err := os.Create(filepath.Join(tmpDir, "cosign.key"))
if err != nil {
t.Fatalf("failed to create temp key file: %v", err)
}
defer tmpPrivFile.Close()
tmpPubFile, err := os.Create(filepath.Join(tmpDir, "cosign.pub"))
if err != nil {
t.Fatalf("failed to create temp pub file: %v", err)
}
defer tmpPubFile.Close()

// Generate a valid keypair.
keys, err := cosign.GenerateKeyPair(pf)
if err != nil {
t.Fatalf("failed to generate keypair: %v", err)
}

if _, err := tmpPrivFile.Write(keys.PrivateBytes); err != nil {
t.Fatalf("failed to write key file: %v", err)
}
if _, err := tmpPubFile.Write(keys.PublicBytes); err != nil {
t.Fatalf("failed to write pub file: %v", err)
}
return tmpPrivFile.Name(), tmpPubFile.Name()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have 2 versions of this - one that's completely in memory and one that writes out the file. For most of the verify funcs, we can actually just generate the keys in memory and pass back a SignerVerifier. It's only for testing the e2e flows when reading from a file that we need to write this out.

// Validate here
func (tr *TrustedTaskRun) Validate(ctx context.Context) (errs *apis.FieldError) {
k8sclient := kubeclient.Get(ctx)
//tektonClient := tkclient.Get(ctx)
Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why this doesn't work, the error msg is
"Unable to fetch github.com/tektoncd/pipeline/pkg/client/resource/clientset/versioned.Interface from context. This context is not the application context (which is typically given to constructors via sharedmain).")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client injection needs to be done somewhere - typically this would be done by the sharedmain func in the controller, but in the unit test you need to do this yourself.

Luckily, knative client-gen gives us an easy solution for this! If you use this package instead of the normal fake client, it should work. https://pkg.go.dev/github.com/tektoncd/pipeline@v0.33.1/pkg/client/injection/client/fake

@Yongxuanzhang
Copy link
Member Author

/assign @wlynch

pipeline/trusted-resources/pkg/trustedtask/doc.go Outdated Show resolved Hide resolved
// Validate here
func (tr *TrustedTaskRun) Validate(ctx context.Context) (errs *apis.FieldError) {
k8sclient := kubeclient.Get(ctx)
//tektonClient := tkclient.Get(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client injection needs to be done somewhere - typically this would be done by the sharedmain func in the controller, but in the unit test you need to do this yourself.

Luckily, knative client-gen gives us an easy solution for this! If you use this package instead of the normal fake client, it should work. https://pkg.go.dev/github.com/tektoncd/pipeline@v0.33.1/pkg/client/injection/client/fake

@wlynch
Copy link
Member

wlynch commented Feb 28, 2022

Also make sure to add a line here so that this can be included in presubmit tests -

projects="cel commit-status-tracker generators helm hub oci pipeline/cleanup pipelines-in-pipelines pipeline-to-taskrun tekdoc task-loops notifiers/github-app cloudevents"

Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Few more small comments.

pipeline/trusted-resources/OWNERS Outdated Show resolved Hide resolved
tektonClient versioned.Interface,
) (errs *apis.FieldError) {
logger := logging.FromContext(ctx)
logger.Info("Verifying Resources")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rule of thumb is - "would this be useful for a random person reading this in the logs with little/no context, or would this just be noise?"

Sometimes it means adding some more details / identifiers to be able to see what is going on with multiple requests.
Sometimes it's redundant and it's easiest to just remove them.

No wrong answer here, as long as we're being intentional about it! Use your best judgement.

bundle string,
verifier signature.Verifier,
signatureDecoding []byte,
k8sclient kubernetes.Interface,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thinking was we don't actually need the k8sclient - it's just a means to create the k8schain which eventually becomes an Option, so we can just plumb it all the way through.

It's not that big of a deal though. We can ignore this for now.

@Yongxuanzhang Yongxuanzhang force-pushed the trusted-resources branch 2 times, most recently from cec049f to 36487b6 Compare March 1, 2022 17:20
@Yongxuanzhang Yongxuanzhang changed the title [Proof of Concept] Trusted resources verification [TrustTask] Trusted resources verification Mar 1, 2022
@Yongxuanzhang Yongxuanzhang changed the title [TrustTask] Trusted resources verification [TrustTask] Trusted taskrun verification Mar 2, 2022
Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Please update the commit message / PR description to describe the changes we're making in this PR - see https://github.com/tektoncd/community/blob/main/standards.md#commits

Comment on lines 10 to 32
## Install

Install and configure [`ko`](https://github.com/google/ko).

Install tekton pipeline. To install from source, checkout to pipeline repo and execute:
```bash
ko apply -f config/
```

Then install the new admission webhook:
```bash
kubectl create namespace trusted-task

# cosign generate-key-pair k8s://tekton-pipelines/signing-secrets
cosign generate-key-pair
kubectl create secret generic signing-secrets \
--from-file=cosign.key=./cosign.key \
--from-literal=cosign.password='1234'\
--from-file=cosign.pub=./cosign.pub \
-n trusted-task

ko apply -f config/
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

README needs to be updated, since the config files don't actually exist yet.

if err != nil {
t.Fatalf("Unexpected err %v", err)
}
signed.Annotations["tekton.dev/signature"] = signTaskSpec(t, signer, &ts.Spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay tests! This makes it much easier to see what is going on.

I'm okay with this as a first pass, but we'll want to address this in another PR.

As is, this is setting the signature of the TaskRun as the signature of the underlying Task - this means that you can actually tamper the TaskRun and still pass validation if you do something like...

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: read-repo-run
spec:
  taskRef:
    name: foo
  params:
-    bar: asdf
+    bar: <bad modified thing>

Since the signature is only capturing the TaskRef Task and not the entire TaskRun spec. This is also true for OCI and TaskSpec variants.
This would make sense as an annotation for the Task itself though!

Copy link
Member Author

@Yongxuanzhang Yongxuanzhang Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, so we should not only verify the embedded tasks but also the taskrun.spec.

@Yongxuanzhang Yongxuanzhang changed the title [TrustTask] Trusted taskrun verification [TrustTask] Trusted taskspec verification Mar 2, 2022
Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there! One last thing - thanks for updating the PR message, but it looks like the commit message body is still empty. Can you add the commit description there as well? Thanks!

Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 2, 2022
This commit is an initial step of work for Trust Task.(Forked from tektoncd/community#537)

Prior this commit we cannot verify if the taskrun is manipulated or not before applying to the cluster.

This commit leverages cosign to do verification on taskspec. Admission webhook and verification on taskrun.spec will be introduced in following PRs. With these work we can deploy a standalone webhook for verification without changing current Tekton Pipeline APIs.
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Mar 2, 2022
@tekton-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wlynch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2022
Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 2, 2022
@tekton-robot tekton-robot merged commit d5280c6 into tektoncd:main Mar 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants