serverSideDiff error when removing block from spec.containers[] in Helm chart #20792
Can you share the Helm charts, please? In particular, the container spec.
@andrii-korotkov-verkada Yes, of course. You can find the chart here. There are also corresponding Git tags for `0.0.1` and `0.0.2`.
When this happens, can you look at the current vs. desired manifest, please?
@andrii-korotkov-verkada The diff seems to be in a bit of a "confused" state from what I can see. See screenshots below.

I hope that helps. Let me know what additional information I can provide.
Can you copy-paste the whole manifests, please? Sorry for bugging you; I'm looking for anything out of line and need the full manifests to check.
@andrii-korotkov-verkada No problem at all! Here's the Live Manifest (managed fields unhidden):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: '3'
creationTimestamp: '2024-11-13T17:14:51Z'
generation: 3
labels:
app.kubernetes.io/instance: test-app
argocd.argoproj.io/instance: test-app
managedFields:
- apiVersion: apps/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:labels:
f:app.kubernetes.io/instance: {}
f:argocd.argoproj.io/instance: {}
f:spec:
f:minReadySeconds: {}
f:progressDeadlineSeconds: {}
f:replicas: {}
f:revisionHistoryLimit: {}
f:selector: {}
f:template:
f:metadata:
f:labels:
f:app.kubernetes.io/instance: {}
f:spec:
f:containers:
k:{"name":"http-echo"}:
.: {}
f:env:
k:{"name":"PORT"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"VERSION"}:
.: {}
f:name: {}
f:value: {}
f:image: {}
f:imagePullPolicy: {}
f:livenessProbe:
f:httpGet:
f:path: {}
f:port: {}
f:name: {}
f:ports:
k:{"containerPort":5678,"protocol":"TCP"}:
.: {}
f:containerPort: {}
f:name: {}
f:protocol: {}
f:readinessProbe:
f:httpGet:
f:path: {}
f:port: {}
f:resources:
f:requests:
f:cpu: {}
f:memory: {}
f:securityContext:
f:allowPrivilegeEscalation: {}
f:capabilities:
f:drop: {}
f:privileged: {}
f:startupProbe:
f:httpGet:
f:path: {}
f:port: {}
f:hostIPC: {}
f:hostNetwork: {}
f:hostPID: {}
f:securityContext:
f:fsGroup: {}
f:runAsGroup: {}
f:runAsNonRoot: {}
f:runAsUser: {}
f:seccompProfile:
f:type: {}
f:terminationGracePeriodSeconds: {}
manager: argocd-controller
operation: Apply
time: '2024-11-14T14:43:51Z'
- apiVersion: apps/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:deployment.kubernetes.io/revision: {}
f:status:
f:availableReplicas: {}
f:conditions:
.: {}
k:{"type":"Available"}:
.: {}
f:lastTransitionTime: {}
f:lastUpdateTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
k:{"type":"Progressing"}:
.: {}
f:lastTransitionTime: {}
f:lastUpdateTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
f:observedGeneration: {}
f:readyReplicas: {}
f:replicas: {}
f:updatedReplicas: {}
manager: kube-controller-manager
operation: Update
subresource: status
time: '2024-11-14T14:44:34Z'
name: test-app
namespace: sandbox-aaron
resourceVersion: '200923886'
uid: 7b7349f2-9000-4fe3-a443-9eb4e1a1a659
spec:
minReadySeconds: 10
progressDeadlineSeconds: 300
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: test-app
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/instance: test-app
spec:
containers:
- env:
- name: VERSION
value: 0.0.1
- name: PORT
value: '5678'
image: hashicorp/http-echo
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /
port: 5678
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: http-echo
ports:
- containerPort: 5678
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 5678
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
requests:
cpu: 100m
memory: 32Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
privileged: false
startupProbe:
failureThreshold: 3
httpGet:
path: /
port: 5678
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1000
runAsGroup: 1000
runAsNonRoot: true
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
terminationGracePeriodSeconds: 80
status:
availableReplicas: 2
conditions:
- lastTransitionTime: '2024-11-14T06:03:11Z'
lastUpdateTime: '2024-11-14T06:03:11Z'
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: 'True'
type: Available
- lastTransitionTime: '2024-11-13T17:14:51Z'
lastUpdateTime: '2024-11-14T14:44:34Z'
message: ReplicaSet "test-app-74dfc69c76" has successfully progressed.
reason: NewReplicaSetAvailable
status: 'True'
type: Progressing
observedGeneration: 3
readyReplicas: 2
replicas: 2
  updatedReplicas: 2
```

Desired Manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/instance: test-app
argocd.argoproj.io/instance: test-app
name: test-app
namespace: sandbox-aaron
spec:
minReadySeconds: 10
progressDeadlineSeconds: 300
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: test-app
template:
metadata:
labels:
app.kubernetes.io/instance: test-app
spec:
containers:
- env:
- name: VERSION
value: 0.0.2
- name: PORT
value: '5678'
image: hashicorp/http-echo
imagePullPolicy: IfNotPresent
livenessProbe:
httpGet:
path: /
port: 5678
name: http-echo
ports:
- containerPort: 5678
name: http
protocol: TCP
readinessProbe:
httpGet:
path: /
port: 5678
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
privileged: false
startupProbe:
httpGet:
path: /
port: 5678
hostIPC: false
hostNetwork: false
hostPID: false
securityContext:
fsGroup: 1000
runAsGroup: 1000
runAsNonRoot: true
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
      terminationGracePeriodSeconds: 80
```

Let me know if you'd like me to re-post with the managed fields hidden, or anything else. Thanks!
Hm, I don't see anything obviously wrong at the moment.
Indeed. Notably, setting `ServerSideDiff=false` works around it.

Are you able to reproduce on your side? I believe the instructions and charts I provided should be enough to do so, but please advise if I can provide anything else.
One more thing - do you have mutating webhooks set up in the cluster?
We have three MutatingWebhooks:
As I understand it, all are part of Amazon EKS and the AWS Load Balancer Controller.
Hi @andrii-korotkov-verkada. I have some additional information which should help you find the root cause of this. I use Argo Rollouts for most of my applications (thus I use `Rollout` objects rather than `Deployment`s). Out of curiosity, I decided to see if the same error would trigger when using a `Rollout`. Please see my latest chart versions:
See the code for `0.3.1` and `0.3.2` here. I had to add two very basic supporting objects to the chart. The chart artifacts are built and uploaded as before, so you can simply update the chart version. I hope this helps. Thanks - Aaron
I can see that various probes are being modified as well, like the readiness probe, i.e. the live manifest has extra fields compared to the target one. Do you know what adds those?
@andrii-korotkov-verkada The probe changes are coming from Kubernetes defaults: i.e. the default values for fields like `failureThreshold`, `periodSeconds`, `successThreshold`, `timeoutSeconds`, and `scheme` are filled in by the API server. The same goes for `terminationMessagePath` and `terminationMessagePolicy` on the container. A before/after sketch is below.
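To make that concrete, here is a rough before/after for the readiness probe, assuming the chart only sets `httpGet` (this matches the live/desired manifests pasted earlier; the extra fields are standard Kubernetes API-server defaults):

```yaml
# As authored in the chart (desired manifest):
readinessProbe:
  httpGet:
    path: /
    port: 5678
---
# As persisted by the API server (live manifest), with defaults filled in:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 5678
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
```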
I see. At this point, I don't know what the issue could be. I'd suggest trying to get a minimal repro, i.e. removing fields from the spec until it starts working normally.
@andrii-korotkov-verkada I'm not sure what else I can remove here - I feel the chart I provided (a single `Deployment`) is already about as minimal as it gets. Our real applications are significantly more complex, so I already trimmed down as much as I could, going for this "single Deployment echo-server" example.

One thing that I know fixes the issue is setting `ServerSideDiff=false` (see the sketch after this comment). But it's unclear to me why this fixes the issue. From a pure and simple perspective, why would removing a field from a container spec break the diff at all?

I note many similarities with this open issue: #17568. The reporter is seeing the same error, word-for-word. The error itself doesn't even seem to make sense (it's complaining that the container omits the key field "name", even though `name` is clearly set).

Thanks for your help!
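For anyone else hitting this, a sketch of how that workaround can be scoped to a single app via the compare-options annotation (assuming you want a per-app override of the controller-wide setting):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: test-app
  annotations:
    # Opt this Application out of server-side diff, overriding the
    # controller-wide controller.diff.server.side: true setting.
    argocd.argoproj.io/compare-options: ServerSideDiff=false
```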
I see. I'll take a shot at reading the code for removing webhook mutations and try to find bugs there. I'll get back with the results of my investigation.
Thanks @andrii-korotkov-verkada - hugely appreciated. As always, let me know if there's any additional information I can provide.
I'll try upgrading the dependency lib to see if that helps: argoproj/gitops-engine#637.
Fixes argoproj#20792

Use a new version of gitops engine with a fix. This needs to be cherry-picked in v2.11-2.13.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
Helps with argoproj/argo-cd#20792

Removed and modified sets may only contain the fields that changed, not including key fields like "name". This can cause merge to fail, since it expects those fields to be present if they are present in the predicted live. Fortunately, we can inspect the set and derive the key fields necessary. Then they can be added to the set and used during a merge.

Also, have a new test which fails before the fix, but passes now. Failure of the new test before the fix:

```
Error: Received unexpected error: error removing non config mutations for resource Deployment/nginx-deployment: error reverting webhook removed fields in predicted live resource: .spec.template.spec.containers: element 0: associative list with keys has an element that omits key field "name" (and doesn't have default value)
Test: TestServerSideDiff/will_test_removing_some_field_with_undoing_changes_done_by_webhook
```

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
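A rough YAML illustration of that failure mode (not actual library output): the element reconstructed from the change set carries only the changed field and omits the list's merge key, and the fix derives the key from the set's path and re-adds it:

```yaml
# Before the fix: element rebuilt from the removed-fields set.
# It omits the merge key "name", which triggers the
# "omits key field \"name\"" error during merge.
spec:
  template:
    spec:
      containers:
        - resources: {}
---
# After the fix: the key is derived from the set path
# k:{"name":"http-echo"} and re-added, so the merge can match
# the element against the predicted live resource.
spec:
  template:
    spec:
      containers:
        - name: http-echo
          resources: {}
```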
In theory, the structured-merge-diff library should be able to merge using key values from paths. I've opened kubernetes-sigs/structured-merge-diff#273 to follow up with them.
I've also posted in the respective channel on Kubernetes Slack and tagged the person who seems to be the maintainer based on commits.
Hi @andrii-korotkov-verkada. Thanks for the discussion on today's contributor call. Am I correct in my understanding that you are waiting on further discussion with the structured-merge-diff maintainers before proceeding?

Thanks again.
We are waiting for them, yes. There's a volunteer looking into this at the moment. I'm not sure a workaround would get merged unless they really push back hard.
@thecosmicfrog @andrii-korotkov-verkada The structured-merge-diff PR with the fix is now merged.
@thecosmicfrog The fix is in this PR: #20842 (auto-merge is enabled). Will you be able to validate the build in your environment once it's merged?
* fix: Fix server side diff with fields removal (#20792)

Fixes #20792

Use a new version of gitops engine with a fix. This needs to be cherry-picked in v2.11-2.13.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>

* Use newer versions of gitops-engine and structured-merge-diff

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
@leoluz Great news, thanks! I'm using the `latest` image, so I can pick this up once it's merged.
@thecosmicfrog, I usually use the `latest` image for validating fixes like this one.
Reopening and waiting for @thecosmicfrog to confirm if the issue is addressed in the current build.
Thanks @leoluz @andrii-korotkov-verkada - it will probably be tomorrow before I can check. Will report back ASAP.
Hi @andrii-korotkov-verkada @leoluz. Great news: I have tested three previously-failing applications and they are now working as expected! No more errors related to server-side diff, and the applications sync up correctly.

I just wanted to express my thanks to you for working on this. It's been a great experience and I hugely appreciate it.

Side question - do you know when this fix will land in an official release?

I'll go ahead and close the issue now. All the best! 🙂
You are welcome!
@thecosmicfrog This fix is going to be part of Argo CD 2.14.
@leoluz Fantastic, thanks!
Checklist:

- [x] I've pasted the output of `argocd version`.

Describe the bug
I am seeing an error in Argo CD when upgrading a Helm chart from one version to another. The only difference between the Helm chart versions is that the new version removes the `resources` block from `spec.template.spec.containers[0]` in the `Deployment` object. I have noticed that removing other blocks (e.g. `env`) results in the same issue (so it is not just a `resources` problem).

The specific error is:
Additional details:

- The sync option `ServerSideApply=true` is set.
- `ServerSideDiff` is enforced on the server side by setting `controller.diff.server.side: true` in the `argo-helm` chart values, as sketched below.
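For context, a rough sketch of how that value is set via the `argo-helm` chart (`configs.params` is the assumed path; the exact values layout may vary by chart version):

```yaml
# values.yaml for the argo-cd chart from argo-helm (sketch);
# configs.params feeds the argocd-cmd-params-cm ConfigMap.
configs:
  params:
    # Make the application controller compute diffs server-side for all apps.
    controller.diff.server.side: true
```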
To Reproduce

I have built a Helm chart to reproduce this, with two versions (`0.0.1` and `0.0.2`). Updating will not work, as you will see.

Prerequisites:
- Argo CD installed in the `kube-system` namespace.
- Server-side diff enabled (`controller.diff.server.side: true`).
- A namespace called `bug` (to host the k8s objects).

Create a file called `application.yaml` to represent our Argo CD `Application` object. Apply this to the cluster:

```shell
kubectl apply -n kube-system -f application.yaml
```

Pods should come up without issue and everything should be correctly synced - as expected for a simple `Deployment`.

Update `application.yaml` to bump the Helm chart to the new version. Apply this to the cluster:

```shell
kubectl apply -n kube-system -f application.yaml
```
Expected behavior
The new chart version should install without error, as it is such a straightforward change (`resources` block removed from the `containers[0]` block).

Actual behavior
Sync Status enters an Unknown state with the new chart version, and App Conditions displays 1 Error. That error is:
The only way to seemingly complete the update is to manually sync, which works without issue. We have Auto Sync enabled, so I'm not sure why that does not resolve the issue.
Screenshots
Argo CD UI after applying chart `0.0.2`:

Several (5-ish?) minutes later - with no intervention from me - the error "appears" to be resolved... but note the versions do not match between the two sync fields:

Then, clicking Refresh...

...results in the same 1 Error outcome as before...

...and the Pods present on the cluster are still from the "old" (`0.0.1`) chart version:

The only way to fix it is to manually Sync:

Which finally brings the app into sync at `0.0.2`:

Version
```
2024/11/13 14:23:55 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
argocd: v2.13.0+347f221
  BuildDate: 2024-11-04T15:31:13Z
  GitCommit: 347f221adba5599ef4d5f12ee572b2c17d01db4d
  GitTreeState: clean
  GoVersion: go1.23.2
  Compiler: gc
  Platform: darwin/arm64
WARN[0001] Failed to invoke grpc call. Use flag --grpc-web in grpc calls. To avoid this warning message, use flag --grpc-web.
argocd-server: v2.13.0+347f221
  BuildDate: 2024-11-04T12:09:06Z
  GitCommit: 347f221adba5599ef4d5f12ee572b2c17d01db4d
  GitTreeState: clean
  GoVersion: go1.23.1
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v5.4.3 2024-07-19T16:40:33Z
  Helm Version: v3.15.4+gfa9efb0
  Kubectl Version: v0.31.0
  Jsonnet Version: v0.20.0
```
Logs
Let me know if the above is enough information to reproduce the issue.
Thanks for your time - Aaron