
Image Tag Mutating Webhook Causes Existing Deployments to Go into a Creation Loop #1105

Closed
SUSTAPLE117 opened this issue Nov 22, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@SUSTAPLE117

SUSTAPLE117 commented Nov 22, 2023

Description
When we label the namespace that our apps (deployed with an image tag) run in, so that our image signatures are validated, we encounter an issue.

We are experiencing a cascade of ReplicaSet and pod creation in our Kubernetes cluster, which we believe is triggered by the mutating webhook designed to convert image tags to digests while validating container image signatures.

The webhook seems to be causing an unexpected behavior: when new pods are created (e.g., by the HPA), their image tags are replaced with digests, leading to a mismatch between the actual pod specifications and the expected specifications in the Deployment's template.

Consequently, the deployment controller attempts to rectify this perceived discrepancy by creating new ReplicaSets. However, something seems to be wrong in the way the change is made: we get a stream of repeated 'already exists' errors and a loop of ReplicaSet creation and deletion, overwhelming the cluster.
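
For illustration, this is the kind of rewrite the webhook performs (the registry, tag, and digest below are hypothetical placeholders, not taken from our cluster):

    # Deployment pod template, as authored:
    spec:
      containers:
        - name: app
          image: example.com/findings-processor:v1.2.3

    # Pod spec after the mutating webhook resolves the tag to a digest:
    spec:
      containers:
        - name: app
          image: example.com/findings-processor@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

Because the mutated ReplicaSet template no longer matches the Deployment's template, the controller keeps creating new ReplicaSets.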

Example API logs we see:

Found a hash collision for deployment "findings-processor" - bumping collisionCount (4573->4574) to resolve it

"Error syncing deployment" deployment="default/findings-processor" err="replicasets.apps "findings-processor-888599648" already exists"
I1120 16:33:25.380980 12 deployment_controller.go:490] "Error syncing deployment" deployment="default/findings-processor" err="replicasets.apps "findings-processor-888599648" already exists"

Here are the policy-controller logs: policy-controller_logs.json

Version
Kubernetes: 1.24 on EKS
policy-controller: 0.8.2

SUSTAPLE117 added the bug label Nov 22, 2023
@vaikas
Collaborator

vaikas commented Nov 24, 2023

Looking at the logs, it seems like the Deployments are not getting patched (there are no patch bytes, though the webhook does get called), or they have already been patched (and therefore there's no need to patch again). ReplicaSets / Pods are getting patched, though. I'm confused why the Deployment does not get patched when it is created (or why that is not visible in the logs). If the Deployment gets patched correctly, then the ReplicaSet (and therefore the pods) should get the digest and not the tag. Can you verify that the Deployment has been correctly patched?
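
One way to verify (a sketch; substitute your own namespace and Deployment name):

    # Print the container images in the Deployment's pod template;
    # a correctly patched Deployment should show image@sha256:... rather than image:tag.
    kubectl get deployment findings-processor -n default \
      -o jsonpath='{.spec.template.spec.containers[*].image}'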

@SUSTAPLE117
Author

@vaikas These are existing Deployments in the namespace we label with policy.sigstore.dev/include=true. The cascading creations happen almost immediately, because the deployments are constantly scaling up and down. Is there any guidance on how to roll this out to existing deployments? Would we have to manually patch all deployments to trigger the webhook?
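
For example, would something like a rollout restart be enough to let the webhook mutate each existing Deployment (a rough sketch, assuming the webhook fires on Deployment updates)?

    # Restarts every Deployment in the namespace; this touches the pod template,
    # producing an UPDATE that the mutating webhook can intercept.
    kubectl rollout restart deployment -n default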

@hectorj2f
Collaborator

@SUSTAPLE117 I'd recommend deploying the policy-controller and then labeling the target namespaces before deploying any resources in them. By the way, there is a feature to match only certain resource types or resources with certain labels: https://docs.sigstore.dev/policy-controller/overview/#policies-matching-specific-resource-types-and-labels.
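
For example (a sketch using the include label mentioned above; my-namespace is a placeholder):

    # Opt the namespace in to policy-controller admission before anything is deployed there.
    kubectl label namespace my-namespace policy.sigstore.dev/include=true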

@SUSTAPLE117
Author

@hectorj2f that makes a lot of sense. Thank you!

@SUSTAPLE117
Author

@hectorj2f Hi, I came back to this and used the match feature to target only the Deployments, as follows:

spec:
  authorities:
    ...
  match:
    - group: apps
      resource: deployments
      version: v1
  mode: warn

However, the MutatingWebhook doesn't respect the match and still patches ReplicaSets and Pods to translate the image tag to a digest. Is that the expected behavior? If so, it isn't easy to do a gradual rollout of the policy-controller in an existing environment, unless I'm missing something.
