This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Tracing a bad deployment/HelmRelease #2028

Closed
usamaahmadkhan opened this issue May 9, 2019 · 5 comments

@usamaahmadkhan

I am deploying a HelmRelease through Flux, and often Flux applies the manifest but the Helm operator fails due to bad configuration. How can I trace a bad deployment?

e.g. I updated a chart version from 1.0 to 1.1.

Scenario 1:
chart 1.1 does not exist; Flux applies the manifest successfully, but the Helm operator errors, saying the chart does not exist.

Scenario 2:
chart 1.1 gets deployed, but there was a configuration issue. Now chart 1.0 has been removed, yet chart 1.1 also cannot be deployed due to the config issue, increasing downtime.

Is there a way to track/trace these two scenarios without having to look at the logs manually, while ensuring 100 percent uptime?

Can there be an alert or some other method so that I know exactly when a Helm release fails, and can either roll back or fix it to avoid downtime?

@squaremo
Member

squaremo commented May 9, 2019

  • .status.conditions in the HelmRelease will usually suggest what is going on with a particular release, especially if it's a problem fetching the chart
  • there's work somewhere on adding Prometheus metrics to helm-operator, so they can be used for alerts (sorry, I can't find it right now :-S)
  • meanwhile, we have some ongoing work to make the helm-operator deal with failed releases better: there's "Log and return early if release is not upgradable" #2008, which improved one particular case, and "Provide optional rollback support for HelmReleases" #2006, which may be better still
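As a minimal sketch of reading those conditions without digging through the operator pod's logs (the release name `myapp` and namespace `default` are placeholders, and `jq` is assumed to be installed):

```shell
# Print type/status/message for each condition reported on the HelmRelease.
# Requires a cluster with the HelmRelease CRD and kubectl access.
kubectl -n default get helmrelease myapp -o json \
  | jq -r '.status.conditions[] | "\(.type)\t\(.status)\t\(.message)"'
```

A condition with `status: False` (for example on the release or chart-fetch step) usually carries a message pointing at the failure, which is enough to decide whether to roll back or fix forward.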

@squaremo
Member

squaremo commented May 9, 2019

Does that answer your question @usamaahmadkhan ? If not, what would you like to see (i.e., what should be the next step here)?

@2opremio
Contributor

2opremio commented May 9, 2019

Related: #1340

@kahootali

kahootali commented May 9, 2019

@squaremo Even if it rolls back, there isn't any way for a developer to know that their latest change was not published, or why it failed, except by looking into the logs of the operator pod. I believe there should be a UI for this. I have created an issue for this

@stefanprodan
Member

Flux v2, based on the GitOps Toolkit, has support for health assessment of deployments https://toolkit.fluxcd.io/components/kustomize/kustomization/#health-assessment
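For reference, health assessment in Flux v2 is configured on the Kustomization object. A minimal sketch (the names, path, and source are placeholders, and the apiVersion may differ between Flux versions):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: ./deploy
  prune: true
  sourceRef:
    kind: GitRepository
    name: myapp
  timeout: 2m
  # The reconciliation is marked failed if this workload does not become
  # ready within the timeout, which makes bad deployments visible and alertable.
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: myapp
      namespace: default
```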
