
helm-operator stops working without any error #2147

Closed
runningman84 opened this issue Jun 11, 2019 · 3 comments

@runningman84

Describe the bug
For an unknown reason the helm operator stopped working. I can still change helm releases in GitHub; they are fetched by the flux daemon and applied as HelmRelease objects, but the helm operator does not act on them. The same setup used to run fine, and other clusters with the same setup still work fine.

Expected behavior
All helm releases should be deployed. The traefik deployment is missing in my case.
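
To confirm the gap, Tiller's view can be cross-checked against the custom resources (a quick check; the full listings follow under Logs):

$ helm ls traefik                             # no output: Tiller has no traefik release
$ kubectl get helmrelease traefik -n traefik  # ...but the HelmRelease resource exists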

Logs

$ helm ls
NAME                                    REVISION        UPDATED                         STATUS          CHART                                   APP VERSION     NAMESPACE             
cluster-autoscaler                      2               Wed May 29 13:09:46 2019        DEPLOYED        cluster-autoscaler-0.12.1               1.13.1          kube-system           
external-auth                           1               Fri May 31 12:07:28 2019        DEPLOYED        external-auth-server-0.1.0              1.0             kube-system           
external-dns                            1               Tue May 21 14:33:30 2019        DEPLOYED        external-dns-1.7.3                      0.5.12          kube-system           
flux                                    2               Tue Jun 11 15:03:33 2019        DEPLOYED        flux-0.9.5                              1.12.3          flux                  
k8s-spot-termination-handler            2               Wed May 29 13:36:45 2019        DEPLOYED        k8s-spot-termination-handler-1.1.0      1.13.0-1        kube-system           
kube2iam                                2               Wed May 29 13:09:46 2019        DEPLOYED        kube2iam-0.10.0                         0.10.4          kube-system           
kubernetes-dashboard                    1               Tue May 21 14:33:36 2019        DEPLOYED        kubernetes-dashboard-1.4.0              1.10.1          kube-system           
metrics-server                          1               Tue May 21 14:33:39 2019        DEPLOYED        metrics-server-2.6.0                    0.3.2           kube-system           
prometheus                              1               Tue May 28 13:15:52 2019        DEPLOYED        prometheus-operator-5.10.5              0.29.0          monitoring            

$ kubectl get pods -n flux
NAME                                 READY   STATUS    RESTARTS   AGE
flux-84b6b6d7fd-fkbj6                1/1     Running   1          36m
flux-helm-operator-df5746688-7lrjm   1/1     Running   1          53m
flux-memcached-6f8c446979-998r6      1/1     Running   0          37m
fluxcloud-864b944646-q6j28           1/1     Running   0          5d4h

$ kubectl get helmrelease --all-namespaces
NAMESPACE     NAME                           AGE
kube-system   cluster-autoscaler             21d
kube-system   external-auth                  11d
kube-system   external-dns                   21d
kube-system   k8s-spot-termination-handler   21d
kube-system   kube2iam                       21d
kube-system   kubernetes-dashboard           21d
kube-system   metrics-server                 21d
monitoring    prometheus-operator            14d
traefik       etcd-operator-traefik          5d
traefik       traefik                        50m

$ kubectl logs flux-helm-operator-df5746688-7lrjm -n flux
W0611 14:51:22.080678       8 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
ts=2019-06-11T14:51:22.106974537Z caller=helm.go:88 component=helm info="connected to Tiller" version="sem_ver:\"v2.12.2\" git_commit:\"7d2b0c73d734f6586ed222a567c5d103fed435be\" git_tree_state:\"clean\" " host=tiller-deploy.kube-system:44134 options="{Host: Port: Namespace:kube-system TLSVerify:false TLSEnable:false TLSKey:/etc/fluxd/helm/tls.key TLSCert:/etc/fluxd/helm/tls.crt TLSCACert: TLSHostname:}"
ts=2019-06-11T14:51:22.107087726Z caller=chartsync.go:152 component=chartsync info="starting git chart sync loop"
ts=2019-06-11T14:51:22.107315937Z caller=operator.go:95 component=operator info="setting up event handlers"
ts=2019-06-11T14:51:22.107351687Z caller=operator.go:115 component=operator info="event handlers set up"
ts=2019-06-11T14:51:22.107384227Z caller=operator.go:128 component=operator info="starting operator"
ts=2019-06-11T14:51:22.107412087Z caller=operator.go:130 component=operator info="waiting for informer caches to sync"
ts=2019-06-11T14:51:22.107637908Z caller=server.go:41 component=daemonhttp info="starting HTTP server on :3030"
ts=2019-06-11T14:51:22.539276028Z caller=checkpoint.go:24 component=checkpoint msg="up to date" latest=0.9.1

$ kail -n flux --since 5m
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:37:23.119420749Z caller=images.go:18 component=sync-loop msg="polling images"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:39:22.424954519Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:arvatoaws/flux-repo-vnr-dev.git branch=master HEAD=21ed2ca405b4644aa3d004b18de55916ca5f59f1
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:02.454619822Z caller=sync.go:476 component=cluster method=Sync cmd=apply args= count=26
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:02.760928728Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=306.230205ms err=null output="limitrange/mem-limit-range unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:02.956399309Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=195.396811ms err=null output="namespace/logging configured"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:03.112207105Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=155.732946ms err=null output="namespace/monitoring configured"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:03.262934787Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=150.657431ms err=null output="namespace/traefik configured"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:03.408942427Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=145.94606ms err=null output="service/fluxcloud unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:03.559561188Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=150.55773ms err=null output="secret/basic-auth unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:03.712094956Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=152.469907ms err=null output="limitrange/mem-limit-range unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:03.866221697Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=154.06782ms err=null output="limitrange/mem-limit-range unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:04.04815807Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=181.873353ms err=null output="limitrange/mem-limit-range unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:04.199906532Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=151.682861ms err=null output="limitrange/mem-limit-range unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:04.376437308Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=176.456207ms err=null output="limitrange/mem-limit-range unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:04.527354261Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=150.846673ms err=null output="limitrange/mem-limit-range unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:04.680857937Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=153.442025ms err=null output="deployment.extensions/fluxcloud unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:04.86042708Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=179.509863ms err=null output="helmrelease.flux.weave.works/cluster-autoscaler unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:05.00863459Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=148.14896ms err=null output="helmrelease.flux.weave.works/etcd-operator-traefik unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:05.189480774Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=180.775983ms err=null output="helmrelease.flux.weave.works/external-auth unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:05.36720594Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=177.661746ms err=null output="helmrelease.flux.weave.works/external-dns unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:05.512309763Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=145.010902ms err=null output="helmrelease.flux.weave.works/k8s-spot-termination-handler unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:05.665125663Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=152.75011ms err=null output="helmrelease.flux.weave.works/kube2iam unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:05.821228992Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=155.989547ms err=null output="helmrelease.flux.weave.works/kubernetes-dashboard unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:05.987318187Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=166.022245ms err=null output="helmrelease.flux.weave.works/metrics-server unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:06.138357622Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=150.971884ms err=null output="helmrelease.flux.weave.works/prometheus-operator unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:06.284161291Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=145.743899ms err=null output="storageclass.storage.k8s.io/sc1 configured"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:06.439472493Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=155.248351ms err=null output="storageclass.storage.k8s.io/st1 configured"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:06.588939583Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=149.39102ms err=null output="helmrelease.flux.weave.works/traefik unchanged"
flux/flux-84b6b6d7fd-fkbj6[flux]: ts=2019-06-11T15:41:06.774554739Z caller=sync.go:542 component=cluster method=Sync cmd="kubectl apply -f -" took=185.553695ms err=null output="etcdcluster.etcd.database.coreos.com/traefik-etcd unchanged"

Additional context

  • Flux version: 1.12.3
  • Helm Operator version: 0.9.1
  • Kubernetes version: 1.12
  • Git provider: github
  • Container registry provider: ecr/dockerhub
@runningman84 runningman84 added blocked-needs-validation Issue is waiting to be validated before we can proceed bug labels Jun 11, 2019
@hiddeco
Member

hiddeco commented Jun 11, 2019

@runningman84 this is a nasty timing bug that has been fixed in master (#2103); as soon as the two outstanding PRs have been reviewed and merged, a new release will be put out.

Until then you have two options:

  1. kick the pod
  2. switch to a prerelease image; the latest ones include the fix, e.g. docker.io/weaveworks/helm-operator-prerelease:master-0ef41643 (example commands for both options below).
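
For example (assuming the deployment is named flux-helm-operator, matching the pod name above; the wildcard in kubectl set image avoids having to know the container name):

# Option 1: delete the pod; the Deployment recreates it and the operator resyncs
$ kubectl delete pod -n flux flux-helm-operator-df5746688-7lrjm

# Option 2: point the Deployment at the prerelease image
$ kubectl set image -n flux deployment/flux-helm-operator \
    '*=docker.io/weaveworks/helm-operator-prerelease:master-0ef41643'

Note that the manual image change will be overwritten by the next helm upgrade of the flux release unless the chart values are updated to match.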

@stefanprodan stefanprodan removed the blocked-needs-validation Issue is waiting to be validated before we can proceed label Jun 11, 2019
@runningman84
Author

Thanks for the quick fix, it seems to work...

@hiddeco
Member

hiddeco commented Jun 12, 2019

Closing this, as it has already been fixed in master. I linked the PR with the fix to this issue so everything is neatly wired up.

Thanks for the report @runningman84; although the issue was already fixed, the quality of the report was good 🥇

@hiddeco hiddeco closed this as completed Jun 12, 2019