Kustomize controller does not detect changes on a resource #3552
Can you please post the output of these commands here:

…
Sure

…
So if you commit the HR without …
Exactly, the customCardTemplate is removed. There are no errors on the kustomizations:
The logs of the kustomize-controller say that the HelmRelease is unchanged:

…
Hmm, but you're using a tag …
Yes. This release contains the patch where the value was removed. That was the first thing I checked.
Hello, we have been hit by the same problem on different resources across multiple clusters from different providers for a few weeks now. We are currently running … The weird thing is that even if we go … Now, comparing the managed fields on one resource that has the problem, only the two fields we are trying to remove are missing from the list.
I wasn't able to reproduce the issue on a kind cluster with Kubernetes 1.24.6 and Flux 0.35.0 from scratch, so I suspect that a sequence of changes put the cluster in a state where this happens. @schmidt-i, are you able to reproduce the issue even on a fresh cluster?
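For anyone comparing managed fields as described above, a quick way to dump them is the following; the resource name and namespace are illustrative placeholders, not taken from this thread:

```shell
# Dump the managedFields of the affected resource to see which
# field manager (e.g. kustomize-controller vs helm-controller)
# owns which paths. Substitute your own resource and namespace.
kubectl get helmrelease my-release -n my-namespace \
  --show-managed-fields -o yaml
```

Comparing this output between a cluster where removal works and one where it doesn't should show whether the controller's Apply entry covers the fields being removed.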
I've just hit what looks like an identical issue today - do let me know if I should create a separate issue if it sounds like a different problem!
My scenario is that I've got a PrometheusRule with a couple of groups in it, something along these lines:

---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: cloud-admission-ctl-alerts-short-span
namespace: cloud-admission-controller
labels:
target: alertmanager
spec:
groups:
- name: cloud-admission-controller
rules:
- alert: CloudAdmissionCtlDown
(...)
- name: cloud-admission-controller-probe
rules:
- alert: CloudAdmissionCtlProbeFailed
(...)
- alert: CloudAdmissionCtlProbeStale
(...)
- alert: CloudAdmissionCtlProbeHugelyStale
(...)

I am decommissioning the … with this Kustomization patch:

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
(...)
patches:
- target:
group: monitoring.coreos.com
version: v1
kind: PrometheusRule
name: cloud-admission-ctl-alerts-short-span
patch: |-
- op: remove
path: /spec/groups/0

This patch is replicated in four separate places: one cluster and three accounts (for every account there's an overlay which is included by the clusters in that account). The result of this being applied by Flux is quite surprising because: …

After a while of scratching my head and trying different things (mostly making changes to the PrometheusRule in one of the affected clusters), I tried removing the group I want removed via a manual … @makkes's last question might actually be relevant here, because the two clusters where the patch worked are pretty new (only built a couple of weeks ago) and have thus only had one version of Flux 2 deployed to them, with no subsequent upgrades, and have never had any Flux 1 components deployed to them. The clusters where I'm experiencing the issue have had (and still do have) Flux 1 deployed and have been through a few Flux 2 upgrades. FWIW, I'm deploying Flux 2 using the community Helm chart.

To make things even more interesting, just as I was writing this up I thought I should try making another edit to this PrometheusRule in the cluster where I'd done the manual edit before (by adding a fake group with a fake alert), and to my surprise the next reconciliation correctly removed the edit.

I'm in a position where I can actually leave things as they are for a few days, so please do let me know if there's any further debugging I could do to triage this issue further. Thanks!
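As a sanity check on the patch itself, independent of cluster state, the overlay can be rendered locally. This is a generic sketch assuming the kustomize CLI is available; the overlay path is a placeholder:

```shell
# Render the overlay locally and count occurrences of the group
# name that the JSON patch should have removed. The path is a
# placeholder; point it at the actual overlay directory.
kustomize build ./clusters/my-cluster > rendered.yaml
grep -c -- '- name: cloud-admission-controller$' rendered.yaml
```

If the patch took effect in the render, the count should drop compared to an unpatched build; this isolates whether the problem is in the patch itself or in how Flux applies the result to the cluster.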
I'm hitting a similar issue, running Flux … I've seen this on a few … I'm not sure if it is the source-controller that caches this key, or the helm/kustomize controller. I've tried to force reconciliation by …

We have 8 clusters and each cluster shares the same config; the behavior is random, as on some clusters the key is removed properly. I've also tried setting …

I'd like to know if there is any workaround that will force reinstalling the Helm release with clean values, without removing the resources themselves. An example of a removed key from the kube-prometheus-stack chart:
After removing …
We are on …
This is pretty serious. Is there a maintainer we can ping, or a path to escalation? Thanks!
We are also seeing this on …
We're also seeing this on …

Like jkotiuk mentioned, manually editing the …
I've also seen this issue on …

My git diff looks like this (it's part of a patch):

diff --git a/clusters/dev-sandbox-redux/flux-components/vault/values.yaml b/clusters/dev-sandbox-redux/flux-components/vault/values.yaml
index 52f453ba..ff8d1e4f 100644
--- a/clusters/dev-sandbox-redux/flux-components/vault/values.yaml
+++ b/clusters/dev-sandbox-redux/flux-components/vault/values.yaml
@@ -6,7 +6,7 @@ metadata:
spec:
chart:
spec:
- version: "v0.19.0"
+ version: "v0.20.1"
values:
global:
tlsDisable: false # Enable HTTPS (uses certificates from Cert-manager)
@@ -15,8 +15,6 @@ spec:
- name: dockerhub
server:
repository: "public.ecr.aws/hashicorp/vault"
- image:
- tag: "1.9.3"
extraArgs: "-config=/config/vault-config/config.hcl" # Get configuration from K8s secret (provisioned by Terraform)
extraVolumes:
- type: secret
@@ -76,7 +74,6 @@ spec:
injector:
agentImage:
repository: "public.ecr.aws/hashicorp/vault"
- tag: "1.9.2"
replicas: 2
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

When I run … I get this rendered HelmRelease:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
labels:
kustomize.toolkit.fluxcd.io/name: vault
kustomize.toolkit.fluxcd.io/namespace: flux-system
name: vault
namespace: hashicorp
spec:
chart:
spec:
chart: vault
sourceRef:
kind: HelmRepository
name: vault
namespace: flux-system
version: v0.20.1
install:
remediation:
retries: 3
interval: 1h0m0s
releaseName: vault
upgrade:
crds: CreateReplace
values:
global:
imagePullSecrets:
- name: quay
- name: dockerhub
tlsDisable: false
injector:
agentImage:
repository: public.ecr.aws/hashicorp/vault
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
replicas: 2
server:
authDelegator:
enabled: false
dataStorage:
enabled: false
extraArgs: -config=/config/vault-config/config.hcl
extraVolumes:
- name: vault-service-cluster-zone-tls
type: secret
- name: vault-config
path: /config
type: secret
ha:
config: |
storage "consul" {
path = "vault"
address = "HOST_IP:8500"
}
telemetry {
dogstatsd_addr = "HOST_IP:8125"
}
disruptionBudget.maxUnavailable: 2
enabled: true
replicas: 5
ingress:
activeService: false
annotations:
external-dns.alpha.kubernetes.io/cloudflare-proxied: "false"
kubernetes.io/ingress.class: nginx-internal
nginx.ingress.kubernetes.io/ssl-passthrough: "true"
enabled: true
hosts: "...redacted..."
repository: public.ecr.aws/hashicorp/vault
resources:
limits:
cpu: 1
memory: 1Gi
requests:
cpu: 1
memory: 1Gi
service:
annotations: {}
updateStrategyType: RollingUpdate

So it's definitely not there in the build. But when I apply it, it doesn't prune the previously explicitly set image tag values from the HelmRelease. They're also not detected by …
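For reference, the local render and the server-side dry-run diff discussed here can be produced with the Flux CLI. The Kustomization name is taken from the kustomize.toolkit.fluxcd.io/name label above and the path from the diff, so treat both as illustrative:

```shell
# Render what Flux would apply from the local checkout:
flux build kustomization vault --path ./clusters/dev-sandbox-redux

# Server-side dry-run diff of the cluster against the checkout:
flux diff kustomization vault --path ./clusters/dev-sandbox-redux
```

When the diff comes back empty even though a field was removed in Git, that is consistent with the field-ownership behavior described later in this thread.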
We end up having to manually edit the HelmRelease to remove these left-behind fields. A copy of my current HelmRelease (before any changes) is here:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
creationTimestamp: "2023-03-29T19:17:31Z"
finalizers:
- finalizers.fluxcd.io
generation: 5
labels:
kustomize.toolkit.fluxcd.io/name: vault
kustomize.toolkit.fluxcd.io/namespace: flux-system
name: vault
namespace: hashicorp
resourceVersion: "1585885902"
uid: b2397151-653c-46e9-8a70-fd7ee27e0977
spec:
chart:
spec:
chart: vault
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: vault
namespace: flux-system
version: v0.19.0
install:
remediation:
retries: 3
interval: 1h0m0s
releaseName: vault
upgrade:
crds: CreateReplace
values:
global:
imagePullSecrets:
- name: quay
- name: dockerhub
tlsDisable: false
injector:
agentImage:
repository: public.ecr.aws/hashicorp/vault
tag: 1.9.2
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
replicas: 2
server:
authDelegator:
enabled: false
dataStorage:
enabled: false
extraArgs: -config=/config/vault-config/config.hcl
extraVolumes:
- name: vault-service-cluster-zone-tls
type: secret
- name: vault-config
path: /config
type: secret
ha:
config: |
storage "consul" {
path = "vault"
address = "HOST_IP:8500"
}
telemetry {
dogstatsd_addr = "HOST_IP:8125"
}
disruptionBudget.maxUnavailable: 2
enabled: true
replicas: 5
image:
tag: 1.9.3
ingress:
activeService: false
annotations:
external-dns.alpha.kubernetes.io/cloudflare-proxied: "false"
kubernetes.io/ingress.class: nginx-internal
nginx.ingress.kubernetes.io/ssl-passthrough: "true"
enabled: true
hosts: "...redacted..."
repository: public.ecr.aws/hashicorp/vault
resources:
limits:
cpu: 1
memory: 1Gi
requests:
cpu: 1
memory: 1Gi
service:
annotations: {}
updateStrategyType: RollingUpdate
status:
conditions:
- lastTransitionTime: "2023-05-31T11:40:19Z"
message: Release reconciliation succeeded
reason: ReconciliationSucceeded
status: "True"
type: Ready
- lastTransitionTime: "2023-05-17T20:38:27Z"
message: Helm upgrade succeeded
reason: UpgradeSucceeded
status: "True"
type: Released
helmChart: flux-system/hashicorp-vault
lastAppliedRevision: 0.19.0
lastAttemptedRevision: 0.19.0
lastAttemptedValuesChecksum: 5635480e74c3e338b6741546341da74945b593ea
lastReleaseRevision: 14
observedGeneration: 5

I'm also happy to help debug this if that's of any use. Thanks for looking into it!
Is the patch diff part of a Flux Kustomization file? You could be hitting this issue, which is fixed after 2.0.1.

@ebachle, could you take a look at this and see if it sounds like the same issue you're having? The original report is from a very old version; unless we have an active reporter after 2.0.1, I think we should close it. If you can, read the description of the linked PR, confirm the location of the patch, and check briefly whether you think you should be using …
Hey @kingdonb, this is going to be a bit of a long answer, but here's what I eventually found out.

No guarantees on this, but I'm almost certain the version we installed the … What I ultimately have come to conclude is that the issue is in which fields Kubernetes thinks the …

This is a "before" snapshot of the stuff around …:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
creationTimestamp: "2023-03-29T19:17:31Z"
finalizers:
- finalizers.fluxcd.io
generation: 5
labels:
kustomize.toolkit.fluxcd.io/name: vault
kustomize.toolkit.fluxcd.io/namespace: flux-system
managedFields:
- apiVersion: helm.toolkit.fluxcd.io/v2beta1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:labels:
f:kustomize.toolkit.fluxcd.io/name: {}
f:kustomize.toolkit.fluxcd.io/namespace: {}
f:spec:
f:chart:
f:spec:
f:chart: {}
f:sourceRef:
f:kind: {}
f:name: {}
f:namespace: {}
f:version: {}
f:install:
f:remediation:
f:retries: {}
f:interval: {}
f:releaseName: {}
f:upgrade:
f:crds: {}
f:values: {}
manager: kustomize-controller
operation: Apply
time: "2023-05-17T20:38:24Z"
- apiVersion: helm.toolkit.fluxcd.io/v2beta1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.: {}
v:"finalizers.fluxcd.io": {}
manager: helm-controller
operation: Update
time: "2023-03-29T19:17:31Z"
- apiVersion: helm.toolkit.fluxcd.io/v2beta1
fieldsType: FieldsV1
fieldsV1:
f:status:
f:conditions: {}
f:helmChart: {}
f:lastAppliedRevision: {}
f:lastAttemptedRevision: {}
f:lastAttemptedValuesChecksum: {}
f:lastReleaseRevision: {}
f:observedGeneration: {}
manager: helm-controller
operation: Update
subresource: status
time: "2023-05-31T11:40:19Z"
name: vault
namespace: hashicorp
resourceVersion: "1585885902"
uid: b2397151-653c-46e9-8a70-fd7ee27e0977
spec:

When I apply a change that doesn't remove a field (really just a random change, additive or mutating) but forces the …, I get:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
creationTimestamp: "2023-03-29T19:17:31Z"
finalizers:
- finalizers.fluxcd.io
generation: 6
labels:
kustomize.toolkit.fluxcd.io/name: vault
kustomize.toolkit.fluxcd.io/namespace: flux-system
managedFields:
- apiVersion: helm.toolkit.fluxcd.io/v2beta1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:labels:
f:kustomize.toolkit.fluxcd.io/name: {}
f:kustomize.toolkit.fluxcd.io/namespace: {}
f:spec:
f:chart:
f:spec:
f:chart: {}
f:sourceRef:
f:kind: {}
f:name: {}
f:namespace: {}
f:version: {}
f:install:
f:remediation:
f:retries: {}
f:interval: {}
f:releaseName: {}
f:upgrade:
f:crds: {}
f:values:
f:global:
.: {}
f:imagePullSecrets: {}
f:tlsDisable: {}
f:injector:
.: {}
f:agentImage:
.: {}
f:repository: {}
f:annotations:
.: {}
f:cluster-autoscaler.kubernetes.io/safe-to-evict: {}
f:replicas: {}
f:server:
.: {}
f:authDelegator:
.: {}
f:enabled: {}
f:dataStorage:
.: {}
f:enabled: {}
f:extraArgs: {}
f:extraVolumes: {}
f:ha:
.: {}
f:config: {}
f:disruptionBudget.maxUnavailable: {}
f:enabled: {}
f:replicas: {}
f:ingress:
.: {}
f:activeService: {}
f:annotations:
.: {}
f:external-dns.alpha.kubernetes.io/cloudflare-proxied: {}
f:kubernetes.io/ingress.class: {}
f:nginx.ingress.kubernetes.io/ssl-passthrough: {}
f:enabled: {}
f:hosts: {}
f:repository: {}
f:resources:
.: {}
f:limits:
.: {}
f:cpu: {}
f:memory: {}
f:requests:
.: {}
f:cpu: {}
f:memory: {}
f:service:
.: {}
f:annotations:
.: {}
f:ad.datadoghq.com/service.check_names: {}
f:ad.datadoghq.com/service.init_configs: {}
f:ad.datadoghq.com/service.instances: {}
f:updateStrategyType: {}
manager: kustomize-controller
operation: Apply
time: "2023-07-19T15:31:24Z"
- apiVersion: helm.toolkit.fluxcd.io/v2beta1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.: {}
v:"finalizers.fluxcd.io": {}
manager: helm-controller
operation: Update
time: "2023-03-29T19:17:31Z"
- apiVersion: helm.toolkit.fluxcd.io/v2beta1
fieldsType: FieldsV1
fieldsV1:
f:status:
f:conditions: {}
f:helmChart: {}
f:lastAppliedRevision: {}
f:lastAttemptedRevision: {}
f:lastAttemptedValuesChecksum: {}
f:lastReleaseRevision: {}
f:observedGeneration: {}
manager: helm-controller
operation: Update
subresource: status
time: "2023-07-19T15:31:24Z"
name: vault
namespace: hashicorp
resourceVersion: "1749511002"
uid: b2397151-653c-46e9-8a70-fd7ee27e0977
spec:

Namely, the …
After that point I'm able to modify the value of my image field in a separate commit and all works as expected. I've reviewed the changelog of the … The other possibility is that maybe it's something that changed between our upgrade from 1.22 to 1.23 in this time? But I'd be hard pressed to find that one either.

I'd be curious if there are any ideas, but I did regardless want to share my findings in case anyone else finds themselves in this pickle. I'm also not sure if there's a change that could be made to force this addition of new managed fields before applying the change. But that also feels like a rather risky change in general, especially as releases after the GA may not have this issue.

Deep details

Some details on the exact change I made... I updated the chart version in one PR:

diff --git a/clusters/arryn-staging-redux/flux-components/vault/values.yaml b/clusters/arryn-staging-redux/flux-components/vault/values.yaml
index 7b375891..bca9bea2 100644
--- a/clusters/arryn-staging-redux/flux-components/vault/values.yaml
+++ b/clusters/arryn-staging-redux/flux-components/vault/values.yaml
@@ -6,7 +6,7 @@ metadata:
spec:
chart:
spec:
- version: "v0.19.0"
+ version: "v0.20.1"
values:
global:
tlsDisable: false # Enable HTTPS (uses certificates from Cert-manager)

This resulted in this diff from …
After that point the fields are managed as expected. Then I made a separate PR to remove the image field I no longer want to differ from the default values. And this is once the …

diff --git a/clusters/arryn-staging-redux/flux-components/vault/values.yaml b/clusters/arryn-staging-redux/flux-components/vault/values.yaml
index 4fcd2b0a..ff8d1e4f 100644
--- a/clusters/arryn-staging-redux/flux-components/vault/values.yaml
+++ b/clusters/arryn-staging-redux/flux-components/vault/values.yaml
@@ -15,8 +15,6 @@ spec:
- name: dockerhub
server:
image:
repository: "public.ecr.aws/hashicorp/vault"
- tag: "1.9.3"
extraArgs: "-config=/config/vault-config/config.hcl" # Get configuration from K8s secret (provisioned by Terraform)
extraVolumes:
- type: secret
@@ -76,7 +74,6 @@ spec:
injector:
agentImage:
repository: "public.ecr.aws/hashicorp/vault"
- tag: "1.9.2"
replicas: 2
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

At that point my …
And all things seem managed as expected, including the removal/update of the field.
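Building on the managed-fields analysis above, one possible workaround is to re-apply the rendered manifest server-side under the controller's own field manager, which rebuilds its managedFields entry with per-key ownership under f:values. This is an assumption based on how Kubernetes server-side apply ownership works, not a documented Flux procedure, and the file name is a placeholder:

```shell
# Re-apply as the kustomize-controller field manager so that its
# managedFields Apply entry is rebuilt with expanded per-key
# ownership instead of an atomic f:values: {}.
# helmrelease-vault.yaml is a placeholder for the rendered HR.
kubectl apply --server-side \
  --field-manager=kustomize-controller \
  --force-conflicts \
  -f helmrelease-vault.yaml
```

If the ownership theory above is right, a subsequent field removal in Git should then be pruned by the controller's next apply, without needing an unrelated "random change" to force it.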
I appreciate you sharing your findings! I just wanted to make sure I read the conclusion correctly: you found that the upgrade did resolve the issue, though it sounds like you may have still had to force a change somehow to see the updated result in the end. There were definitely updates in kustomize-controller that affected how server-side apply reconciles sub-structures in later versions; I'm not sure of the exact versions that included these changes. So long as you're able to work with the current state in GA, and since it sounds like you have (had) a repro of the issue on a version matching the report, if I understood all that correctly, then I believe based on your update we can close this issue. Thanks again for reporting back @schmidt-i. Have I got that right?
Hi, was this issue resolved in 2.0.1? I'm also facing this bug in Flux 0.28.5.
We haven't gotten any more feedback on this issue for a couple of months now. 0.28.5 is more than 2 years old so if you would like to help here, you could upgrade to the latest Flux version and see if the issue goes away. |
I'm still seeing this issue on v2.2.3 |
Describe the bug
Changes to a HelmRelease manifest in a Git repo are not applied by the kustomize controller, nor are they detected by flux diff.
Steps to reproduce
Expected behavior
Changes are applied by the kustomize controller and the helm release is reconciled.
Screenshots and recordings
No response
OS / Distro
Linux
Flux version
v0.35.0
Flux check
► checking prerequisites
✔ Kubernetes 1.24.6 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.25.0
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.26.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.22.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.29.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.27.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.30.0
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta1
✔ buckets.source.toolkit.fluxcd.io/v1beta1
✔ gitrepositories.source.toolkit.fluxcd.io/v1beta1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta1
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta1
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta1
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta1
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1beta2
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta1
✔ receivers.notification.toolkit.fluxcd.io/v1beta1
✔ all checks passed
Git provider
GitHub (Enterprise)
Container Registry provider
No response
Additional context
The change in the HelmRelease is the removal of a multiline YAML value from the values section.
Even flux diff shows no difference between the current configuration and the applied configuration.
Currently configured resource in the cluster:
Configuration in the GitRepo:
As you can see, the value "customCardTemplate" is no longer present. However, the kustomize controller does not identify any change here.
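A manual way to compare the two states described above is to dump both and diff them. This assumes cluster access and a local checkout; the Kustomization name, paths, and resource names are placeholders, and some noise from status and cluster-managed metadata fields is expected:

```shell
# Desired state, as Flux renders it from the Git checkout:
flux build kustomization flux-system --path ./clusters/my-cluster > desired.yaml

# Live state, as currently configured in the cluster:
kubectl get helmrelease my-release -n my-namespace -o yaml > live.yaml

# Any values key present in live.yaml but absent from desired.yaml
# (such as customCardTemplate here) is a candidate for this bug.
diff desired.yaml live.yaml
```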
Code of Conduct