
Kustomization status "failed to parse digest: invalid checksum digest format" #3861

Closed
MartinEmrich opened this issue May 9, 2023 · 7 comments · Fixed by fluxcd/source-controller#1088

@MartinEmrich
Describe the bug

A Kustomization referring to a GitRepository containing only two CustomResourceDefinition objects sometimes gets stuck in the status "failed to parse digest: invalid checksum digest format".

Googling this error message, it appears mostly in the context of (damaged?) container image layers from OCI repositories. But around this Kustomization there is nothing even close to a PodSpecTemplate, and no containers.

Steps to reproduce

  1. Apply the following GitRepository and Kustomization:
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
spec:
  ref:
    tag: "v2.4.7"
  timeout: 20s
  interval: "2h"
  url: https://github.com/kubernetes-sigs/aws-load-balancer-controller.git
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: aws-load-balancer-controller-crd
  namespace: kube-system
spec:
  interval: 2h
  path: "helm/aws-load-balancer-controller/crds"
  prune: true
  sourceRef:
    kind: GitRepository
    name: aws-load-balancer-controller
  2. Wait.
  3. At some point, this error appears:
...
  status:
    conditions:
    - lastTransitionTime: "2023-05-09T14:39:47Z"
      message: Fetching manifests for revision v2.4.7@sha1:2ba14d1e232c1f2aa02063a3edd2ef855ba468d9
        with a timeout of 1h59m30s
      observedGeneration: 1
      reason: ProgressingWithRetry
      status: "True"
      type: Reconciling
    - lastTransitionTime: "2023-05-09T14:39:47Z"
      message: 'failed to parse digest: invalid checksum digest format'
      observedGeneration: 1
      reason: ArtifactFailed
      status: "False"
      type: Ready
    inventory:
      entries:
      - id: _ingressclassparams.elbv2.k8s.aws_apiextensions.k8s.io_CustomResourceDefinition
        v: v1
      - id: _targetgroupbindings.elbv2.k8s.aws_apiextensions.k8s.io_CustomResourceDefinition
        v: v1
    lastAppliedRevision: v2.4.7@sha1:2ba14d1e232c1f2aa02063a3edd2ef855ba468d9
    lastAttemptedRevision: v2.4.7@sha1:2ba14d1e232c1f2aa02063a3edd2ef855ba468d9
    lastHandledReconcileAt: "2023-04-19T11:44:20.5516562+02:00"
    observedGeneration: 1
...
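(For reference, one way to watch for this failure — a minimal sketch using standard Flux CLI and kubectl invocations, with the object name and namespace taken from the manifests above:)

flux get kustomizations -n kube-system --watch
kubectl -n kube-system get kustomization aws-load-balancer-controller-crd \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'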

Expected behavior

The Kustomization should stay in status "Applied revision xxxxxxx".

If this error is "real", I would expect an error message leading me to a possible cause/solution.

Screenshots and recordings

No response

OS / Distro

AWS EKS 1.24

Flux version

N/A

Flux check

► checking prerequisites
✔ Kubernetes 1.24.12-eks-ec5523e >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.32.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.17.1
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.13.2
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.0.0-rc.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.0.0-rc.1
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.0.0-rc.1
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta2
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta1
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta1
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta2
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed

Git provider

No response

Container Registry provider

No response

Additional context

Server-side versions:

flux: v2.0.0-rc.1
helm-controller: v0.32.1
image-automation-controller: v0.17.1
image-reflector-controller: v0.13.2
kustomize-controller: v1.0.0-rc.1
notification-controller: v1.0.0-rc.1
source-controller: v1.0.0-rc.1

Code of Conduct

  • I agree to follow this project's Code of Conduct
@stefanprodan
Member

Please update your GitRepository and Kustomization manifests to v1; the checksum format has changed in RC.1.
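(For illustration, a minimal sketch of that change, assuming only the apiVersion lines need updating — the v1 APIs for both kinds are confirmed by the flux check output above, and all other fields stay as in the original manifests:)

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
# ...spec unchanged...
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
# ...spec unchanged...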

@MartinEmrich
Author

@stefanprodan thanks! I changed both to plain /v1, and ran a reconcile first on the GitRepository (successful), and then on the Kustomization, which still gives the same error.

Might I have to manually clean up some store where these "old style" checksums are kept?
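(For reference, the reconcile steps described above would look roughly like this with the Flux CLI, using the names from the manifests earlier in the issue:)

flux reconcile source git aws-load-balancer-controller -n kube-system
flux reconcile kustomization aws-load-balancer-controller-crd -n kube-system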

@stefanprodan
Member

If you delete the source-controller pod, the checksums should be recalculated; please see if this fixes it.
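(A minimal sketch of that step, assuming the default flux-system namespace and the standard app=source-controller label used by the Flux controller deployments:)

kubectl -n flux-system delete pod -l app=source-controller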

@hiddeco
Member

hiddeco commented May 9, 2023

Can you share the Status of the GitRepository object?

@MartinEmrich
Author

MartinEmrich commented May 9, 2023

The status of the GitRepository (before restarting source-controller):

status:
  artifact:
    lastUpdateTime: "2023-04-18T22:13:33Z"
    path: gitrepository/kube-system/aws-load-balancer-controller/2ba14d1e232c1f2aa02063a3edd2ef855ba468d9.tar.gz
    revision: v2.4.7@sha1:2ba14d1e232c1f2aa02063a3edd2ef855ba468d9
    url: http://source-controller.flux-system.svc.cluster.local./gitrepository/kube-system/aws-load-balancer-controller/2ba14d1e232c1f2aa02063a3edd2ef855ba468d9.tar.gz
  conditions:
  - lastTransitionTime: "2023-04-18T22:13:33Z"
    message: stored artifact for revision 'v2.4.7@sha1:2ba14d1e232c1f2aa02063a3edd2ef855ba468d9'
    observedGeneration: 4
    reason: Succeeded
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-04-18T22:13:33Z"
    message: stored artifact for revision 'v2.4.7@sha1:2ba14d1e232c1f2aa02063a3edd2ef855ba468d9'
    observedGeneration: 4
    reason: Succeeded
    status: "True"
    type: ArtifactInStorage
  lastHandledReconcileAt: "2023-05-09T17:16:00.740904774+02:00"
  observedGeneration: 4

After restarting source-controller and reconciling GitRepository:

status:
  artifact:
    digest: sha256:7f758796b007a234e31157ac66f02c0ce052904f585a144cc49b1abdc1163cf5
    lastUpdateTime: "2023-05-09T15:34:35Z"
    path: gitrepository/kube-system/aws-load-balancer-controller/2ba14d1e232c1f2aa02063a3edd2ef855ba468d9.tar.gz
    revision: v2.4.7@sha1:2ba14d1e232c1f2aa02063a3edd2ef855ba468d9
    size: 608378
    url: http://source-controller.flux-system.svc.cluster.local./gitrepository/kube-system/aws-load-balancer-controller/2ba14d1e232c1f2aa02063a3edd2ef855ba468d9.tar.gz
  conditions:
  - lastTransitionTime: "2023-05-09T15:34:35Z"
    message: stored artifact for revision 'v2.4.7@sha1:2ba14d1e232c1f2aa02063a3edd2ef855ba468d9'
    observedGeneration: 4
    reason: Succeeded
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-05-09T15:34:35Z"
    message: stored artifact for revision 'v2.4.7@sha1:2ba14d1e232c1f2aa02063a3edd2ef855ba468d9'
    observedGeneration: 4
    reason: Succeeded
    status: "True"
    type: ArtifactInStorage
  lastHandledReconcileAt: "2023-05-09T17:35:03.438065288+02:00"
  observedGeneration: 4

I notice that after restarting the source-controller, a digest field appeared under status.artifact in the GitRepository object.

And now, reconciling the Kustomization works, too. Thanks! I'll watch out over the next few days to see if it reappears, but I guess that should have fixed it.
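(For reference, a quick way to confirm the new field is present — standard kubectl jsonpath usage, with the object name and namespace from above:)

kubectl -n kube-system get gitrepository aws-load-balancer-controller \
  -o jsonpath='{.status.artifact.digest}'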

@MartinEmrich
Author

Upgraded to v1 everywhere, no more issues so far.

@hiddeco
Member

hiddeco commented May 10, 2023

In the upcoming RC.3 release, this should be resolved automatically due to the change from the PR referenced above.

Thank you for the report @MartinEmrich. 🙇
