oci source for prometheus-community charts doesn't work from flux #3313

Closed
1 task done
sereinity opened this issue Nov 14, 2022 · 9 comments

@sereinity

Describe the bug

Since #3303 was merged, Flux can no longer reconcile charts coming from prometheus-community.

flux -n monitoring get sources helm prometheus-community
NAME                    REVISION        SUSPENDED       READY   MESSAGE
prometheus-community                    False           True    Helm repository is ready

Here is the status of the charts from this registry:

flux -n monitoring get sources chart
NAME                                            REVISION        SUSPENDED       READY   MESSAGE
elastic-prometheus-elasticsearch-exporter       4.14.0          False           False   chart pull error: failed to download chart for remote reference: failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
monitoring-kube-prometheus-stack                41.7.3          False           False   chart verification error: failed to verify oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack:41.7.4: GET https://ghcr.io/v2/prometheus-community/charts/kube-prometheus-stack/manifests/41.7.4: MANIFEST_UNKNOWN: manifest unknown
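
The failing request in the status above follows the registry's /v2/<repository>/manifests/<tag> pattern. As a minimal sketch (the helper names `oci_manifest_url` and `chart_version_published` are hypothetical, and the chart reference is assumed to carry a tag rather than a digest), the URL can be derived from the chart reference, and Helm 3.8 or later can be asked whether a given version is actually published:

```shell
# Hypothetical helper: map an oci:// chart reference (with a tag) to the
# registry manifest URL, following the pattern seen in the error message.
oci_manifest_url() {
  ref="${1#oci://}"     # strip the scheme
  tag="${ref##*:}"      # text after the last colon
  path="${ref%:*}"      # host plus repository path
  host="${path%%/*}"    # first path segment
  repo="${path#*/}"     # everything after the host
  printf 'https://%s/v2/%s/manifests/%s\n' "$host" "$repo" "$tag"
}

# Hypothetical helper: succeeds only if Helm can read the chart metadata
# for that version from the OCI registry.
# $1 = oci:// chart URL without a tag, $2 = chart version
chart_version_published() {
  helm show chart "$1" --version "$2" >/dev/null 2>&1
}

# Example usage (requires network access and Helm):
#   oci_manifest_url oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack:41.7.4
#   chart_version_published oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack 41.7.4 \
#     && echo published || echo missing
```

Helm is used for the actual check because it handles the registry's token exchange itself; a plain unauthenticated request to the manifest URL may be rejected even for public packages.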

I tried to suspend, reconcile, and resume those resources (and the associated HelmReleases) without any success.

I haven't yet tried from a fresh installation of Flux; I will probably do so within the day or week, as I need to provision a fresh cluster.

Steps to reproduce

  1. Start before November 9th
  2. Use https://github.com/fluxcd/flux2 (branch main) as a GitRepository object
  3. Add a Kustomization that loads ./manifests/monitoring/kube-prometheus-stack (it includes a patch, notably on the chart version: 41.7.3)
  4. Go past November 9th (the chart repository URI changed)
  5. Try to upgrade kube-prometheus-stack to 41.7.4

Expected behavior

The new chart is pulled, verified and applied

Screenshots and recordings

No response

OS / Distro

Arch Linux

Flux version

v0.36.0

Flux check

► checking prerequisites
✔ Kubernetes 1.24.6-gke.1500 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.26.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.30.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.28.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.31.0
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta1
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1beta2
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1beta2
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta1
✔ receivers.notification.toolkit.fluxcd.io/v1beta1
✔ all checks passed

Git provider

No response

Container Registry provider

No response

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@sereinity
Author

I confirm the issue from a fresh cluster.

@stefanprodan
Member

@sereinity
Author

Indeed, they only publish new releases to the OCI registry, and it looks like the process failed on the commit for 41.7.4.

As I also use the HelmRepository for the other charts that don't have a new release yet (the exporters), I decided, for now, to revert to the official repository (and disable verification), but I confirm that reverting to 41.7.3 would also fix the deployment.

Note: I say "official repository" because it's the one still documented in their pages and README.
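
The fallback described above, going back to the HTTP chart repository, could be sketched with the Flux CLI as follows. The repository URL is the one documented in the prometheus-community README, and the `monitoring` namespace comes from the commands earlier in this issue; the function name and the one-hour interval are illustrative assumptions. `--export` prints the manifest instead of applying it.

```shell
# Hypothetical helper: print a HelmRepository manifest that points at the
# HTTP chart repository instead of the OCI registry.
export_http_helm_repository() {
  flux create source helm prometheus-community \
    --namespace=monitoring \
    --url=https://prometheus-community.github.io/helm-charts \
    --interval=1h \
    --export
}

# Example usage (requires the flux CLI):
#   export_http_helm_repository > helmrepository.yaml
```

Assuming verification was enabled through the chart's `verify` field, that field would need to be removed separately; the command above only covers the repository source.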

Extra notes:

@kingdonb
Member

kingdonb commented Nov 14, 2022

It looks like some changes went in since the failed release that have mitigated the 403 error (maybe?)

At least there has been a test since the failed build in the prometheus-community repo with a different chart tag, and it passed (3 days ago), so I suspect the next tag would succeed. I have gone through the issue reports you mentioned @sereinity and that's the conclusion I came to...

Maybe OT or maybe not... Is there a way to publish a new GHA job that then runs for all tags, including old ones that haven't run the job yet? (I'm trying to think of ways to publish all back versions of charts, once we are sure we have the job right.)

I think we'd want to see all of the charts going forward reliably published as OCI, with a method to publish those that were skipped due to failures or that came before the update... is this a blocker before we document this and make it the preferred distribution?

I'd probably settle for a way to publish the missing chart in OCI, hopefully without removing or recreating any tags, but ideally we'd be publishing old versions backwards from the date when OCI first became available – this might be a challenge.

The benefit would be so people can switch their repos at any time without worrying about whether they are on a late enough version of a given chart, or if they need to upgrade Prom first. Ideally it's all of the chart versions, scripted back to a given threshold date, or just literally all of them, as far back as will work (with a version of Helm that had OCI support, I guess?)

Maybe worth adding a GHA job with a workflow dispatch trigger that only publishes the OCI chart for a given tag on demand, so we can patch up issues like this one as they come along (hopefully not very often). I don't know how much value there is in going back and republishing all the old charts, but I wouldn't rule it out, as many people will probably see that as a blocker to adoption (until/unless the old HTTP-based chart repo is axed and no longer supported for new chart versions).

@scottrigby
Member

Yes, this was a temporary error in the new CI for https://github.com/prometheus-community/helm-charts. It should be fixed now, but yes, it makes sense that a package version was missing. I'll manually push chart version 41.7.4 today. Note that I still need to push the previous chart history to OCI; it currently holds only new chart versions (minus the ones missing due to the recent CI errors, which are fixed now).

@stefanprodan
Member

@scottrigby the exporters seem to have no OCI charts; can you push all charts from that repo to GHCR please?

@scottrigby
Member

@stefanprodan ok yes will do 👍

@scottrigby
Member

Current versions of all Prometheus community charts are pushed to GHCR as signed, public packages ✅

Still need to push package history. There are thousands of past versions, and when pushing locally there is an OIDC prompt. So I think I may put my automation into a GitHub action to run on a cron schedule for backfilling in OCI.
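
One possible shape for that backfill automation, as a rough sketch only: list the versions that exist in the chart history but not yet in OCI, then push each packaged chart. The function names, the input files (one version per line), and the directory of pre-packaged .tgz archives are all assumptions; signing and the OIDC prompt mentioned above are not covered.

```shell
# Hypothetical helper: print the versions listed in file $1 that are absent
# from file $2 (both files hold one version per line).
missing_versions() {
  # -v invert, -x whole-line match, -F fixed strings, -f patterns from file.
  # grep exits 1 when nothing is missing; treat that as success.
  grep -vxF -f "$2" "$1" || [ "$?" -eq 1 ]
}

# Hypothetical helper: push every missing version of one chart.
# $1 = chart name, $2 = directory holding <chart>-<version>.tgz packages,
# $3 = file of all released versions, $4 = file of versions already in OCI.
backfill_chart() {
  missing_versions "$3" "$4" | while read -r version; do
    # stdin is redirected so the push cannot consume the version list
    helm push "$2/$1-$version.tgz" oci://ghcr.io/prometheus-community/charts </dev/null
  done
}

# Example usage (requires Helm 3.8+ and push access to the registry):
#   backfill_chart kube-prometheus-stack ./packages all-versions.txt published-versions.txt
```

Keeping the version comparison separate from the push loop makes it possible to review the list of missing versions before anything is uploaded.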

@sereinity
Author

Thank you @scottrigby, I'm really happy that we came to this solution.

I'm closing the issue as it is no longer related to this repository, and the main issue is gone.


4 participants