Zarf image mutation happens twice in some scenarios #1988

Closed
mjnagel opened this issue Aug 25, 2023 · 3 comments · Fixed by #1989

mjnagel commented Aug 25, 2023

Environment

Device and OS: macOS (M1 Mac), deployment on Ubuntu 20
App version: 0.29.0
Kubernetes distro being used: RKE2 1.26

Steps to reproduce

  1. Create a deployment or job that already has a mutated image (a "127.0.0.1:31999/something:1.2.3-zarf-xxxxx" image rather than "docker.io/something:1.2.3"). This is probably difficult to do in practice 🙃
  2. Validate that the image is "double mutated" by the Zarf agent. The created pod ends up with a doubled "zarf-xxxx" tag (two separate hashes, derived from the different starting images), and the image also fails to pull due to the incorrect tag.

It might be easier to reproduce with this package, which is where I experienced the issue. More details are in the additional context section below.

Expected result

Zarf skips images it has already mutated and does not rewrite their tags again.

Actual Result

If a "parent resource" of a pod (deployment, job, etc) has an already mutated image specified then the pod will pass through zarf's mutation and end up with a doubled zarf-hash tag mutation.

Visual Proof (screenshots, videos, text, etc)

Logs from the Zarf agent showing a patch with two zarf-hash tags:

  DEBUG   2023-08-25T18:33:17Z  -  PATCH:  [{"op":"replace","path":"/spec/imagePullSecrets","value":[{"name":"private-registry"}]},{"op":"replace","path":"/spec/initContainers/0/image","value":"127.0.0.1:31999/ironbank/rook/ceph:v1.12.2-zarf-3113775139-zarf-1434334155"},{"op":"replace","path":"/spec/containers/0/image","value":"127.0.0.1:31999/ironbank/opensource/ceph/ceph:v17.2.6-zarf-2774537164"},{"op":"replace","path":"/metadata/labels/zarf-agent","value":"patched"}]

Severity/Priority

Low-medium. This is an unusual edge case, but it can be encountered in some scenarios with no workaround available.

Additional Context

This is probably a really wild edge case. The scenario where I ran into it was a Zarf package deploying the Rook operator. The operator spins up several jobs using the image it is itself running (which is already a mutated image in Zarf's registry). The resulting pod for each of those jobs then experiences the behavior described above.

This package is definitely a WIP, but if you want to reproduce the error you should be able to build and deploy it: https://github.com/defenseunicorns/uds-package-rook-ceph/blob/docs-values-updates/zarf.yaml

cmwylie19 self-assigned this Aug 25, 2023
Racer159 added a commit that referenced this issue Aug 28, 2023: … self-reference) (#1989)

## Description

Checks if image.Host has already been patched

## Related Issue

Fixes #1988

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide Steps](https://github.com/defenseunicorns/zarf/blob/main/CONTRIBUTING.md#developer-workflow) followed

---------

Signed-off-by: Case Wylie <cmwylie19@defenseunicorns.com>
Co-authored-by: Wayne Starr <Racer159@users.noreply.github.com>
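
For reference, a minimal sketch of the kind of guard the fix's description names ("Checks if image.Host has already been patched"): before rewriting, compare the image's registry host against the internal registry address and skip the mutation when they already match. The names and the simplified string-based host parsing are illustrative assumptions, not the actual Zarf implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// hostOf returns the registry-host portion of an image reference (everything
// before the first "/"); a simplified stand-in for real reference parsing.
func hostOf(image string) string {
	if i := strings.Index(image, "/"); i != -1 {
		return image[:i]
	}
	return ""
}

// shouldMutate reports whether the agent should rewrite the image. If the
// image already points at the internal registry, it is left untouched instead
// of being re-tagged a second time.
func shouldMutate(image, internalRegistry string) bool {
	return hostOf(image) != internalRegistry
}

func main() {
	internal := "127.0.0.1:31999" // placeholder for the agent's configured registry address
	fmt.Println(shouldMutate("docker.io/something:1.2.3", internal))                       // true  -> mutate
	fmt.Println(shouldMutate("127.0.0.1:31999/something:1.2.3-zarf-3113775139", internal)) // false -> skip
}
```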

pat-earl commented Oct 6, 2023

Hello. Sorry to comment on a closed issue, but I'm coming across the same potential bug.

Environment

Device and OS: VMware ESXi VM running RHEL 8.8
App Version: 0.30.0
Kubernetes Distro: RKE2 v1.26.9

Using the same package referenced above (accepting that it's a WIP), I'm running into an issue with a Ceph job getting the double zarf-hash tag. The log output looks similar to the one posted above; however, I'm using an external registry.

  DEBUG   2023-10-06T20:23:38Z  -  PATCH:  [{"op":"replace","path":"/spec/imagePullSecrets","value":[{"name":"private-registry"}]},{"op":"replace","path":"/spec/initContainers/0/image","value":"192.168.4.5:31999/ironbank/rook/ceph:v1.12.4-zarf-3113775139-zarf-32993765"},{"op":"replace","path":"/spec/containers/0/image","value":"192.168.4.5:31999/ironbank/opensource/ceph/ceph-csi:v3.9.0-zarf-3897253400"},{"op":"replace","path":"/metadata/labels/zarf-agent","value":"patched"}]


mjnagel commented Oct 6, 2023

@pat-earl it looks like the CI for rook/ceph on the zarf update branch here passed, so I suspect this is an issue specific to external registries? It might be worth opening a new issue for visibility (and referencing this one for context).

Curious whether this was working for you on a previous Zarf version or has just never worked?


pat-earl commented Oct 7, 2023

This is my first time working with Zarf. I was originally on v0.29.0 before I saw the prerequisite of v0.29.1, then rebuilt the cluster and upgraded everything Zarf-related to v0.30.0. I'm going to set up a separate RKE2 environment and try using an internal registry.

I'll open a new issue after some further testing. Thanks!
