Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize cosign's dependency tree #1462

Open
imjasonh opened this issue Feb 15, 2022 · 22 comments
Open

Sanitize cosign's dependency tree #1462

imjasonh opened this issue Feb 15, 2022 · 22 comments
Labels
enhancement New feature or request

Comments

@imjasonh
Copy link
Member

Problem

Cosign's module dependencies are pretty heavy.

Its go.mod currently transitively depends on k8s.io/client-go, the AWS SDK, the Azure SDK, a GCP cloud storage API client, Docker internals, fsnotify, OpenTelemetry, a Kubernetes controller framework, a GitHub API client, cuelang, and many more.

This can make it difficult to consume for clients that don't want all that functionality, and may just want to sign some bytes using a key, push the resulting image to an OCI registry, and upload data to Rekor with certs via Fulcio.

For example, sget, which does relatively little compared to cosign -- it verifies signatures on an image and fetches a blob -- has a go.mod file that's 280 lines, producing a 72 MB binary 🏋️ .

Large dependency trees lead to larger built binaries (see above), larger attack surfaces, longer and more fragile builds, and (most interesting to me personally as a maintainer!) more toil for maintainers to update dependencies and sift through security alerts for those vulnerabilities.

Possible Solutions

  • Moving more dependencies into sigstore/sigstore (consider moving core sign & verify functions from cosign to sigstore sigstore#81), and taking that opportunity to reconsider if any dependencies can be replaced by lighter-weight alternatives
  • Taking better advantage of Go interfaces, especially for KMS services, and moving implementations of those interfaces into separate Go modules or separate repositories
  • Splitting the cosigned webhook into a separate Go module from the cosign CLI, possibly even moving it to a separate repo

The cosign CLI would likely not get any smaller as a result of this work -- it still naturally depends on all the KMSes, K8s, cloud SDKs -- but other clients that want to do cosign-like things (signing, and especially verifying) wouldn't need to.

cc @luhring @jonjohnsonjr

@imjasonh imjasonh added the enhancement New feature or request label Feb 15, 2022
@dlorenc
Copy link
Member

dlorenc commented Feb 15, 2022

  • Splitting the cosigned webhook into a separate Go module from the cosign CLI, possibly even moving it to a separate repo

I don't think this will really help, but I might be misunderstanding the exact issue. The k8s libraries shouldn't show up in a downstream binary or dependency tree unless they're actually used.

@imjasonh
Copy link
Member Author

I don't think this will really help, but I might be misunderstanding the exact issue. The k8s libraries shouldn't show up in a downstream binary or dependency tree unless they're actually used.

I mentioned this since the cosign CLI depends on k8s.io/client-go just to get/create/update secrets, when it could shell out to kubectl instead (introducing a different set of problems, understandibly). But that wouldn't help anyway, because the Go module it shares with the cosigned webhook necessarily and correctly depends on knative/pkg, k8s.io/client-go, etc.

Go might be smart enough to trim that dependency from the binary, but it still incurs maintenance cost for cosign CLI and SDK maintainers who shouldn't care about K8s' own large dependency graph, security alerts, dependabots, etc.

@luhring
Copy link
Contributor

luhring commented Feb 15, 2022

it still incurs maintenance cost for cosign CLI and SDK maintainers who shouldn't care about K8s' own large dependency graph, security alerts, dependabots, etc.

💯 We're working on integrating some Cosign functionality into Syft, and it looks like we need k8s.io/api via Cosign:

$ go mod why -m k8s.io/api
# k8s.io/api
github.com/anchore/syft/cmd
github.com/sigstore/cosign/cmd/cosign/cli/sign
github.com/sigstore/cosign/pkg/signature
github.com/sigstore/cosign/pkg/cosign/kubernetes
k8s.io/api/core/v1

(from Syft at ec1cceb)

It could be that there's a better way to structure the code to avoid these dependencies, but it's not clear to me how. I think if the "package graph" extends into a given module, that module is needed during build.

@dlorenc
Copy link
Member

dlorenc commented Feb 15, 2022

💯 We're working on integrating some Cosign functionality into Syft, and it looks like we need k8s.io/api via Cosign:

This is correct - today cosign does depend on k8s/api directly for the generate-key-pair command which can create a secret in k8s. Moving this to use a smaller library or shell out to kubectl would solve that problem.

It would not remove k8s from the go.mod file though, because the cosigned webhook still imports that stuff.

I think there are a few different things we're trying to optimize for here:

  • As a user of cosign, I want the binary to be small
  • As a library author that imports cosign code, I want fine-grained control over what packages get pulled into my dependency tree
  • As a cosign maintainer, I want a small go.mod file

I don't think moving the cosigned webhook somewhere else helps with any of these, it just shifts the problem around and means there's a separate repo and go.mod file maintainers need to keep up to date.

@imjasonh
Copy link
Member Author

I don't think moving the cosigned webhook somewhere else helps with any of these, it just shifts the problem around and means there's a separate repo and go.mod file maintainers need to keep up to date.

It helps me as library author and cosign maintainer, since it shifts the problem somewhere where I'm less (or not) involved. It does make dependency maintenance slightly more work for cosigned maintainers, but there are always going to be many fewer of those than there will be consumers of a cosign SDK.

I imagine the following "hierarchy of needs" for consumers of cosign logic:

  • sign some bytes, get some bytes (sigstore/sigstore)
  • sign some bytes, get a container image I can push (sigstore? cosign SDK?)
  • sign some bytes using a key in KMS, get a container image (cosign SDK? w/ pluggable KMS impls?)
  • verify some bytes using a key (sigstore/sigstore)
  • verify a container image's signatures using a key in KMS (cosign SDK?)
  • verify a container image's signatures using a key in KMS, in kubernetes (cosigned)

Today, if I want to do anything with container images, my go.mod picks up deps to serve all of the above.

@dlorenc
Copy link
Member

dlorenc commented Feb 15, 2022

Today, if I want to do anything with container images, my go.mod picks up deps to serve all of the above.

I don't think this part is necessarily true, but that might be what I'm missing. A downstream client should be able to import code from cosign without introducing all the dependencies to their own go.mod.

@imjasonh
Copy link
Member Author

For KMSes it might be as simple as moving KMS provider registration out of pkg/signature/kms, into each impl's package, so you can selectively depend on only those KMSes you actually want, and depend on none of them if you want none of them:

https://github.com/sigstore/sigstore/blob/v1.1.0/pkg/signature/kms/kms.go#L31-L44

@sambhav
Copy link
Contributor

sambhav commented Feb 15, 2022

Also related #651

@luhring
Copy link
Contributor

luhring commented Feb 16, 2022

Today, if I want to do anything with container images, my go.mod picks up deps to serve all of the above.

I don't think this part is necessarily true, but that might be what I'm missing. A downstream client should be able to import code from cosign without introducing all the dependencies to their own go.mod.

I think this is right, in that this doesn't have to be true, but I think we're seeing that it happens to be true today with the current code. (Maybe not "all" deps, but more deps than are actually needed by the downstream consumer.)

Taking better advantage of Go interfaces, especially for KMS services, and moving implementations of those interfaces into separate Go modules or separate repositories

This might be worth exploring first, since it's presumably less drastic of a change than introducing a new repo / Go module.

My understanding of Go's build approach is that it looks at the call graph to determine which packages are needed, and can then determine which modules are needed.

@imjasonh's example is really interesting. init() is tricky because it can effectively start its own call graph, which means potentially traversing more packages and modules, and potentially without the consumer having any practical use for that additional code.

Also, this might be overly pedantic, but what I'm looking to minimize (as a lib consumer) isn't go.mod entries in my project, but the number of modules in the "build list" in my project's binary (e.g. Syft).

@dlorenc
Copy link
Member

dlorenc commented Feb 16, 2022

Also, this might be overly pedantic, but what I'm looking to minimize (as a lib consumer) isn't go.mod entries in my project, but the number of modules in the "build list" in my project's binary (e.g. Syft).

Bingo! This is what we can control without needing to split things up across repos.

@vrothberg
Copy link

Shall we set up a dedicated meeting for this topic? I think we can start with getting the very low-level bits done (i.e., signing/verifying bytes + metadata) but I have no vision on how the code should be structured into smaller Go packages (to prune the heavier dependencies).

Cc: @mtrmac

@imjasonh
Copy link
Member Author

I 👍 ed this but I realize it doesn't notify anybody when you do that. 🤦‍♂️

So, 👍, we should meet some day next week (I'm out Mon+Tues) to map out specific packages/APIs that we'd want to refactor/move/etc.

@lukehinds
Copy link
Member

Just looping back on this issue as cosigned is now going to move into its own repo and we have the following issues which @lkatalin has expressed an interest in working on:

I can create a sigstore community calendar invite, how should we set a time? @vrothberg / @imjasonh et al?

@vrothberg
Copy link

Thank you for the ping. I'd join the meeting ✔️

@lukehinds
Copy link
Member

How would next Wednesday work for 11am EST?

@imjasonh
Copy link
Member Author

That works for me!

@vrothberg
Copy link

Works for me as well ✔️

@lukehinds
Copy link
Member

@imjasonh @vrothberg done, you will see this on the community calendar. access to calendar is here:

https://github.com/sigstore/community#community-meetings

I may not be able to make it, only just noticed I have a webinar / talk thing going on. Sometimes hangouts (even though its on the community calendar) inherits Red Hat corp google rights, but we have @vrothberg who should be able to open the meeting and admit others.

@jeff-mccoy
Copy link

Just finding this issue from twitter. Quick comment for https://github.com/defenseunicorns/zarf as we include both cosign and syft as sdks, the dependency tree is large—but I’d be more concerned with Cosign just falling back to kubectl for us as we can’t guarantee that binary is available when our tools runs.

@imjasonh
Copy link
Member Author

Just finding this issue from twitter. Quick comment for https://github.com/defenseunicorns/zarf as we include both cosign and syft as sdks, the dependency tree is large—but I’d be more concerned with Cosign just falling back to kubectl for us as we can’t guarantee that binary is available when our tools runs.

I assume that's responding to #1462 (comment):

I mentioned this since the cosign CLI depends on k8s.io/client-go just to get/create/update secrets, when it could shell out to kubectl instead (introducing a different set of problems, understandibly). But that wouldn't help anyway, because the Go module it shares with the cosigned webhook necessarily and correctly depends on knative/pkg, k8s.io/client-go, etc.

I agree I don't think shelling out to kubectl will be a good approach, for basically the reasons you describe.

It might be worth exploring replacing the k8s client used to get/create/update secrets with standard HTTP calls, but even that will probably be more headache than it's worth since we'll also want to find and parse kubeconfigs to manage auth to the cluster in question. I think we're stuck with our k8s dependency for good.

@fmoessbauer
Copy link

The current cosign binary is over 100MB in size, making it unsuitable for embedded use-cases (discussed in #2989 as well). Also from a security perspective it is not desired to have a huge dependency tree. While the sigstore-go implementation is a bit smaller (60M) and can only perform validation, it is no real solution to the problem:

  • Too many dependencies
  • Too big binary
  • Not scalable: each additional KMS backend will add further dependencies
  • Hard to get CVE-free, as all components likely need to be updated in a lockstep
  • All components are always included due to the statically linking nature of go (xz-utils attack vector, oh dear...)
  • Hard / impossible to be packaged for distros that require in-distro dependency trees (like Debian)

Instead, it would be much better to have a minimal CLI interface that dynamically loads exactly the components that are needed for a particular operation. This would significantly reduce the attack surface, as well as making it possible to only include the components that are needed for a particular operation.

AFAIK the only go solution to that problem is the plugin mechanism which has major limitations. Are there any plans to implement lightweight clients in C or RUST (but modularized, not statically linked)?

@haydentherapper
Copy link
Contributor

We aim to make sigstore-go smaller (sigstore/sigstore-go#23), though there's tradeoffs between reimplementation of dependencies, maintenance and library size that we have to consider. We'll also be adding signing soon to sigstore-go (sigstore/sigstore-go#136), without pulling in KMS dependencies.

There is an implementation of the sigstore signing and verification logic in Rust in sigstore/sigstore-rs, though it has not yet reached a 1.0 release. I'd love to see an example of calling into this library through the FFI - this is on Sigstore's community roadmap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

9 participants