This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Tracing a bad deployment/HelmRelease #2028

Closed
usamaahmadkhan opened this issue May 9, 2019 · 5 comments

@usamaahmadkhan

I am deploying a HelmRelease through Flux, and often Flux applies the manifest but the Helm operator fails due to bad configuration. How can I trace a bad deployment?

e.g. I updated a chart version from 1.0 to 1.1.

Scenario 1:
chart 1.1 does not exist; Flux applies the manifest successfully, but the Helm operator errors, saying the chart does not exist.

Scenario 2:
chart 1.1 gets deployed, but there was a configuration issue. Now chart 1.0 has been removed, yet chart 1.1 also cannot be deployed due to the config issue, increasing downtime.

Is there a way to track/trace these two scenarios without having to look at the logs manually, while ensuring 100 percent uptime?

Can there be an alert or some other method so that I know exactly when a Helm release fails, and can either roll back or fix it to avoid downtime?

@squaremo
Member

squaremo commented May 9, 2019

  • .status.conditions in the HelmRelease will usually suggest what is going on with a particular release, especially if it's a problem fetching the chart
  • there's work somewhere on adding Prometheus metrics to helm-operator, so they can be used for alerts (sorry, I can't find it right now :-S)
  • meanwhile, we have some ongoing work to make the helm-operator deal with failed releases better: there's "Log and return early if release is not upgradable" #2008, which improved one particular case, and "Provide optional rollback support for HelmReleases" #2006, which may be better still
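As a minimal sketch of reading those conditions without digging through the operator pod's logs (the release name `myapp` and namespace `default` are placeholders, and `jq` is assumed to be installed):

```shell
# Print type/status/message for each condition reported on the HelmRelease.
# Requires a cluster with the HelmRelease CRD and kubectl access.
kubectl -n default get helmrelease myapp -o json \
  | jq -r '.status.conditions[] | "\(.type)\t\(.status)\t\(.message)"'
```

A condition with `status: False` (for example on the release or chart-fetch step) usually carries a message pointing at the failure, which is enough to decide whether to roll back or fix forward.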

@squaremo
Member

squaremo commented May 9, 2019

Does that answer your question @usamaahmadkhan ? If not, what would you like to see (i.e., what should be the next step here)?

@2opremio
Contributor

2opremio commented May 9, 2019

Related: #1340

@kahootali

kahootali commented May 9, 2019

@squaremo Even if it rolls back, there isn't any way for a developer to know that their latest change was not published, or why it failed, except by looking into the logs of the operator pod. I believe there should be a UI for this. I have created an issue for this

@stefanprodan
Member

Flux v2, based on the GitOps Toolkit, has support for health assessment of deployments https://toolkit.fluxcd.io/components/kustomize/kustomization/#health-assessment
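For reference, health assessment in Flux v2 is configured on the Kustomization object. A minimal sketch (the names, path, and source are placeholders, and the apiVersion may differ between Flux versions):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: ./deploy
  prune: true
  sourceRef:
    kind: GitRepository
    name: myapp
  timeout: 2m
  # The reconciliation is marked failed if this workload does not become
  # ready within the timeout, which makes bad deployments visible and alertable.
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: myapp
      namespace: default
```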
