This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Can't release new version of app, not committing it to git #2324

Closed
yellowmegaman opened this issue Aug 5, 2019 · 18 comments
Labels
bug, integrations/kustomize

Comments

@yellowmegaman

Describe the bug
Can't perform a fluxctl release of a new image tag; the change never gets committed to git.

To Reproduce
Steps to reproduce the behavior:
0. GKE + Flux

  1. Deploy any app via Flux/git
  2. Build a new version of your app
  3. List the images of your app via fluxctl
  4. fluxctl release the new tag

Expected behavior
The workload is updated to the new tag, and the change is committed to git by fluxd.

Logs

$ fluxctl list-images --workload backend:statefulset/backend
E0805 18:46:19.076371    6706 portforward.go:385] error copying from local connection to remote stream: read tcp4 127.0.0.1:39745->127.0.0.1:38966: read: connection reset by peer
WORKLOAD                     CONTAINER                    IMAGE                                                                      CREATED
backend:statefulset/backend  backend                      gcr.io/titanium-messenger-001/mmts-backend                                 
                                                          |   k8s-master-7a7a4419fbca5e3c9dd307443b7fac0de1f5a6d3                    05 Aug 19 09:13 UTC
                                                          |   k8s-WEL-220-futures-to-tasks-76a18b9b071b5b80d2f7d4f45565ac53cd6c9422  05 Aug 19 07:16 UTC
                                                          |   k8s-WEL-220-futures-to-tasks-3a5d445a8c57b22832316def36e5ec611a261248  05 Aug 19 07:13 UTC
                                                          |   k8s-WEL-220-futures-to-tasks-22eb692819799d733c80efb575f89278c6a511bc  05 Aug 19 07:12 UTC
                                                          |   k8s-WEL-220-futures-to-tasks-0e1d936ae8dae40638475dd0c5155a204d64345a  05 Aug 19 07:03 UTC
                                                          |   k8s-latest                                                             02 Aug 19 11:54 UTC
                                                          |   k8s-master-40f8de44031ab06b2aec060cf603e38a65a44803                    02 Aug 19 11:54 UTC
                                                          '-> k8s-master-1d85e954134a0b576c8e3715a69d40a7cd362573                    02 Aug 19 11:27 UTC
                                                              k8s-fix-program-durations-26e19d3426245d17864f2a9487913f0796c876ea     01 Aug 19 10:42 UTC
                                                              k8s-master-41e13458a943122e70ccdeb28c510271bc30a1fd                    31 Jul 19 12:36 UTC
                             check-postgres-availability  sorintlab/stolon                                                           
                                                          |   v0.14.0-pg11                                                           31 Jul 19 09:42 UTC
                                                          |   v0.14.0-pg10                                                           31 Jul 19 09:42 UTC
                                                          |   v0.14.0-pg9.6                                                          31 Jul 19 09:42 UTC
                                                          |   v0.14.0-pg9.5                                                          31 Jul 19 09:42 UTC
                                                          |   v0.14.0-pg9.4                                                          31 Jul 19 09:42 UTC
                                                          |   master-pg9.4                                                           31 Jul 19 09:36 UTC
                                                          |   master-pg9.5                                                           31 Jul 19 09:36 UTC
                                                          |   master-pg9.6                                                           31 Jul 19 09:36 UTC
                                                          '-> master-pg10                                                            31 Jul 19 09:36 UTC
                                                              master-pg11                                                            31 Jul 19 09:35 UTC
$ fluxctl release --workload=backend:statefulset/backend --update-image=gcr.io/titanium-messenger-001/mmts-backend:k8s-master-7a7a4419fbca5e3c9dd307443b7fac0de1f5a6d3
Submitting release ...
Error: verifying changes: failed to verify changes: the image for container "backend" in resource "backend:statefulset/backend" should be "gcr.io/titanium-messenger-001/mmts-backend:k8s-master-7a7a4419fbca5e3c9dd307443b7fac0de1f5a6d3", but is "gcr.io/titanium-messenger-001/mmts-backend:k8s-master-1d85e954134a0b576c8e3715a69d40a7cd362573"
Run 'fluxctl release --help' for usage.
$ k logs -f deploy/flux
ts=2019-08-05T15:46:32.513445458Z caller=loop.go:119 component=sync-loop jobID=7a7325ed-234d-87a5-7b3e-82ab5781c465 state=in-progress
ts=2019-08-05T15:46:35.070573653Z caller=releaser.go:59 component=sync-loop jobID=7a7325ed-234d-87a5-7b3e-82ab5781c465 type=release updates=1
ts=2019-08-05T15:46:36.968192639Z caller=loop.go:129 component=sync-loop jobID=7a7325ed-234d-87a5-7b3e-82ab5781c465 state=done success=false err="verifying changes: failed to verify changes: the image for container \"backend\" in resource \"backend:statefulset/backend\" should be \"gcr.io/titanium-messenger-001/mmts-backend:k8s-master-7a7a4419fbca5e3c9dd307443b7fac0de1f5a6d3\", but is \"gcr.io/titanium-messenger-001/mmts-backend:k8s-master-1d85e954134a0b576c8e3715a69d40a7cd362573\""
ts=2019-08-05T15:47:07.863810739Z caller=loop.go:111 component=sync-loop event=refreshed url=git@github.com:oktossm/gitops.git branch=master HEAD=17ec311f1391b321f1e4af855ee2a86861e928d5

Additional context

  • Flux version: 1.13.1
  • Fluxctl version: 1.13.1
  • Kubernetes version: 1.13.7-gke.8
  • Git provider: github.com
  • Container registry provider: gcr.io

Thanks a bunch in advance!

@yellowmegaman added the blocked-needs-validation and bug labels Aug 5, 2019
@squaremo
Member

This kind of problem is occasionally due to the YAML update program not coping with a particular file. Do you mind posting (or gisting) the YAML file in question?

@yellowmegaman
Author

Sure! But I'm going to submit another one with the same issue:

$ fluxctl list-images --workload stolon:deployment/stolon-proxy
WORKLOAD                        CONTAINER     IMAGE              CREATED
stolon:deployment/stolon-proxy  stolon-proxy  sorintlab/stolon   
                                              |   v0.14.0-pg11   31 Jul 19 09:42 UTC
                                              |   v0.14.0-pg10   31 Jul 19 09:42 UTC
                                              |   v0.14.0-pg9.6  31 Jul 19 09:42 UTC
                                              |   v0.14.0-pg9.5  31 Jul 19 09:42 UTC
                                              |   v0.14.0-pg9.4  31 Jul 19 09:42 UTC
                                              |   master-pg9.4   31 Jul 19 09:36 UTC
                                              |   master-pg9.5   31 Jul 19 09:36 UTC
                                              |   master-pg9.6   31 Jul 19 09:36 UTC
                                              '-> master-pg10    31 Jul 19 09:36 UTC
                                                  master-pg11    31 Jul 19 09:35 UTC
$
$ fluxctl release --workload=stolon:deployment/stolon-proxy --update-image=sorintlab/stolon:v0.14.0-pg10
Submitting release ...
Error: verifying changes: failed to verify changes: the image for container "stolon-proxy" in resource "stolon:deployment/stolon-proxy" should be "sorintlab/stolon:v0.14.0-pg10", but is "sorintlab/stolon:master-pg10"
Run 'fluxctl release --help' for usage.
$

stolon-proxy-deployment.yaml:

---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: stolon-proxy
  namespace: stolon
spec:
  replicas: 2
  template:
    metadata:
      labels:
        component: stolon-proxy
        stolon-cluster: kube-stolon
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      containers:
        - name: stolon-proxy
          image: sorintlab/stolon:master-pg10
          command:
            - "/bin/bash"
            - "-ec"
            - |
              exec gosu stolon stolon-proxy
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: STPROXY_CLUSTER_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['stolon-cluster']
            - name: STPROXY_STORE_BACKEND
              value: "kubernetes"
            - name: STPROXY_KUBE_RESOURCE_KIND
              value: "configmap"
            - name: STPROXY_LISTEN_ADDRESS
              value: "0.0.0.0"
            - name: STPROXY_METRICS_LISTEN_ADDRESS
              value: "0.0.0.0:8080"
          ports:
            - name: postgres
              containerPort: 5432
            - name: proxy
              containerPort: 8080
          readinessProbe:
            tcpSocket:
              port: postgres
            initialDelaySeconds: 10
            timeoutSeconds: 5
          livenessProbe:
            tcpSocket:
              port: postgres
            initialDelaySeconds: 10
            timeoutSeconds: 5
          resources:
            limits:
              cpu: "100m"
              memory: 128Mi
            requests:
              cpu: "100m"
              memory: 128Mi

BTW, we do have --manifest-generation=true and --sync-garbage-collection enabled.

@yellowmegaman
Author

Could it be that the error is at a higher level? I don't think all my YAMLs are broken, since they are checked with yamllint + kubeval.

$ fluxctl list-images --workload kurento:deployment/kurento
WORKLOAD                    CONTAINER  IMAGE                         CREATED
kurento:deployment/kurento  kurento    kurento/kurento-media-server  
                                       |   6                         19 Jul 19 19:15 UTC
                                       |   6.11                      19 Jul 19 19:15 UTC
                                       |   6.11.0                    19 Jul 19 19:15 UTC
                                       |   6.11.0-20190719195344     19 Jul 19 19:15 UTC
                                       |   latest                    19 Jul 19 19:15 UTC
                                       '-> 6.10                      04 Apr 19 13:15 UTC
                                           6.10.0                    04 Apr 19 13:15 UTC
                                           6.10.0-20190404150939     04 Apr 19 13:15 UTC
                                           6.9.0                     19 Dec 18 13:09 UTC
                                           6.9.0-20181219            19 Dec 18 13:09 UTC
$ fluxctl release --workload=kurento:deployment/kurento --update-image=kurento/kurento-media-server:6.11.0
Submitting release ...
Error: verifying changes: failed to verify changes: the image for container "kurento" in resource "kurento:deployment/kurento" should be "kurento/kurento-media-server:6.11.0", but is "kurento/kurento-media-server:6.10"
Run 'fluxctl release --help' for usage.
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kurento
  namespace: kurento
  labels:
    app: kurento
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kurento
  template:
    metadata:
      labels:
        app: kurento
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - kurento
              topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 1800
      containers:
        - name: kurento
          image: kurento/kurento-media-server:6.10
          args:
            - "--modules-config-path=/etc/kurento-modules/..data"
          imagePullPolicy: Always
          ports:
            - containerPort: 8888
              name: intra-node
          resources:
            limits:
              cpu: "100m"
              memory: 256Mi
            requests:
              cpu: "100m"
              memory: 256Mi
          readinessProbe:
            exec:
              command:
                - pgrep
                - kurento
            initialDelaySeconds: 30
            timeoutSeconds: 5
          livenessProbe:
            tcpSocket:
              port: intra-node
            initialDelaySeconds: 30
            timeoutSeconds: 5
          volumeMounts:
            - name: kurento-conf-json
              mountPath: "/etc/kurento"
            - name: kurento-modules
              mountPath: "/etc/kurento-modules"
      volumes:
        - name: kurento-conf-json
          configMap:
            name: kurento-conf-json
        - name: kurento-modules
          configMap:
            name: kurento-modules

@yellowmegaman
Author

Created a new cluster on GKE with Flux.
Deployed only kurento; the full deployment YAML is here: https://gist.github.com/yellowmegaman/5847e9f79ce783cccb5b50d77fda9b4e

Tried to update it, with the same result.
Full flux args:

image = "docker.io/weaveworks/flux:1.13.1"
args  = ["--memcached-service=", "--git-timeout=100s", "--ssh-keygen-dir=/var/fluxd/keygen", "--git-url=git@github.com:oktossm/gitops.git", "--git-branch=${var.cluster_name}", "--listen-metrics=:3031", "--git-poll-interval=3m0s", "--sync-interval=3m0s", "--sync-garbage-collection", "--git-path=defaultcluster", "--git-ci-skip-message=[SKIP CI]", "--git-label=flux${var.cluster_name}", "--manifest-generation=true"]

@stefanprodan
Member

If you place the deployment spec in its own file without any separators, does it work?

@yellowmegaman
Author

yellowmegaman commented Aug 16, 2019

@stefanprodan OK, here's what I did:

  • removed the flux deployment from the cluster
  • left the kurento namespace in the namespaces folder
  • put all kurento resources together in workloads/kurento/kurento-Combined.yaml
  • committed everything to git
  • removed all kurento resources from the cluster and waited until the kurento namespace was completely terminated
  • deployed flux back to the cluster
  • got the same problems

If I remove the --- separators, the kubeval check won't pass, and kubectl won't accept it either.

error: error validating "workloads/kurento/kurento-Combined.yaml": error validating data: ValidationError(Service): unknown field "data" in io.k8s.api.core.v1.Service; if you choose to ignore these errors, turn validation off with --validate=false

@yellowmegaman
Author

@stefanprodan Just understood what you actually meant. Previously, when I first reported this, all resources were in their own separate files in the workloads/kurento folder.
But I had to use separators because I use --manifest-generation=true, and according to the docs, flux reads the generators' output as one merged stream, so the documents have to be separated.
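
For illustration, here is a hypothetical two-document stream of the kind the generators emit. Without the leading --- on each document, the concatenation parses as a single document, which is consistent with the ValidationError above (a ConfigMap's data key ending up inside a Service):

---
apiVersion: v1
kind: Namespace
metadata:
  name: kurento
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kurento-conf-json
  namespace: kurento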

@yellowmegaman
Author

Here is my .flux.yaml

$ cat .flux.yaml 
---
version: 1
commandUpdated:
  generators:
    - command: >-
        cat namespaces/* | sed 's/envplaceholder/'$ENVNAME'/g'
    - command: >-
        cd workloads && cat */* | sed 's/envplaceholder/'$ENVNAME'/g' | sed 's/lbipplaceholder/'$LBIP'/g' | sed 's/domainplaceholder/'$ENVDOMAIN'/g'

It is used to achieve basic templating with environment variables supplied to the Flux container.

@hiddeco
Member

hiddeco commented Aug 16, 2019

Was able to reproduce the issue with the plain deployment manifest from #2324 (comment). It is not yet precisely clear what goes wrong, except that it happens during the calculation of updates.

But I did succeed in releasing a lean deployment. @yellowmegaman are the workloads you are trying to release in a healthy state?

@hiddeco added the ☠ high user impact label and removed the blocked-needs-validation label Aug 16, 2019
@yellowmegaman
Author

@hiddeco sorry, I didn't see the comment update. Yeah, they are 100% healthy.

@tobbbles

tobbbles commented Oct 7, 2019

I'm now encountering this issue too; the relevant info is posted below. Unfortunately the image is in a private repo, tagged by commit SHA, so I cannot share it.

Flux version: 1.14.2
--sync-garbage-collection is on.


Command

fluxctl release --force --workload=redoc:deployment/api-docs -i eu.gcr.io/project/folder/api-spec:1c2884e26f1f5184796b67116665c6f6b8cc1671 --k8s-fwd-ns flux

Logs

ts=2019-10-07T14:11:27.031542995Z caller=images.go:111 component=sync-loop workload=redoc:deployment/api-docs container=redoc repo=eu.gcr.io/project/folder/api-spec pattern=glob:* current=eu.gcr.io/project/folder/api-spec:3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251 info="added update to automation run" new=eu.gcr.io/project/folder/api-spec:1c2884e26f1f5184796b67116665c6f6b8cc1671 reason="latest 1c2884e26f1f5184796b67116665c6f6b8cc1671 (2019-10-07 11:39:58.311378805 +0000 UTC) > current 3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251 (2019-10-07 09:46:36.468467058 +0000 UTC)"
...
ts=2019-10-07T14:15:51.986109515Z caller=loop.go:144 component=sync-loop jobID=be2b89b3-bea6-6d32-bb77-5d6321dda3d3 state=done success=false err="verifying changes: failed to verify changes: the image for container "redoc" in resource "redoc:deployment/api-docs" should be "eu.gcr.io/project/folder/api-spec:1c2884e26f1f5184796b67116665c6f6b8cc1671", but is "eu.gcr.io/project/folder/api-spec:3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251""

deployment.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-docs
  labels:
    app: api-docs
  annotations:
    fluxcd.io/automated: "true"
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: api-docs
  template:
    metadata:
      labels:
        app: api-docs
    spec:
      containers:
        - name: redoc
          image: eu.gcr.io/project/folder/api-spec:3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 10m
              memory: 64Mi

.flux.yaml

version: 1
commandUpdated:
  generators:
    - command: kustomize build .
  patchFile: patch.yaml

List images

$ fluxctl list-images --workload=redoc:deployment/api-docs --k8s-fwd-ns flux
WORKLOAD                                     CONTAINER     IMAGE                                         CREATED
legacy-api:deployment/api-docs  redoc         eu.gcr.io/project/folder/api-spec
                                                           |   1c2884e26f1f5184796b67116665c6f6b8cc1671  07 Oct 19 11:39 UTC
                                                           |   3751eed958e151b091da864ee808ccf7a22bb773  07 Oct 19 11:30 UTC
                                                           '-> 3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251  07 Oct 19 09:46 UTC
                                                               2820bbd012208ebc0e6c089e073ba9215890ca4b  27 Sep 19 10:21 UTC

List workloads

$ fluxctl list-workloads -a  --k8s-fwd-ns flux  | grep redoc
redoc:deployment/api-docs       redoc                       eu.gcr.io/project/folder/api-spec:3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251  ready    automated

@sunnoy

sunnoy commented Oct 24, 2019

I have hit this too. Has it been resolved?

@tobbbles

I see the error is sourced from

if beforeContainers[i].Image != afterContainers[i].Image {
    return verificationError("the image for container %q in resource %q should be %q, but is %q", beforeContainers[i].Name, id, beforeContainers[i].Image.String(), afterContainers[i].Image.String())
}

But surely when releasing, the image is expected to change?

@hiddeco
Member

hiddeco commented Oct 25, 2019

@tobbbles kubeyaml, the Python tool we use to selectively patch the YAML, doesn't work well when there is no namespace present in the resource Flux is trying to write changes to. I expect things to work once you resolve this (and we should adjust kubeyaml so that it works properly without a namespace present).
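
For example, a minimal sketch of that fix: give the resource an explicit namespace in its metadata (redoc here, going by the workload ID redoc:deployment/api-docs) so kubeyaml can locate it. Metadata excerpt only:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-docs
  namespace: redoc  # explicit namespace, so kubeyaml can match the resource
  labels:
    app: api-docs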

@hiddeco added the integrations/kustomize label Oct 29, 2019
@squaremo
Member

@tobbbles In your case, the release is failing (at least) because you have commandUpdated in .flux.yaml, rather than patchUpdated, so the patch file is ignored.

This issue is labelled integrations/kustomize because the fear was that kustomize configurations will often have manifests missing a namespace, with the namespace added later by a kustomization. This is certainly a problem if you use commandUpdated, because it will either try to operate on the base files (which have no namespace) or simply not attempt any updates to the files, hence the complaint about no changes. The real problem is that commandUpdated isn't suited to working with kustomize configurations, unless you build your own patches in an update command.
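
Concretely, a sketch of the patchUpdated equivalent of the .flux.yaml posted above; the generator stays the same, but Flux then maintains patch.yaml itself instead of relying on update commands:

version: 1
patchUpdated:
  generators:
    - command: kustomize build .
  patchFile: patch.yaml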

For the purpose of making it work with kustomize better, I can offer a couple of things:

  • .flux.yaml files could be checked against a schema, so that problems like the mismatched commandUpdated/patchFile are exposed
  • if there are no update commands given, fluxd should refuse update operations (if it doesn't already)

@squaremo
Member

the release is failing (at least) because you have commandUpdated in .flux.yaml

@yellowmegaman You appear to have a similar problem: you haven't supplied any commands to update the files, so when fluxd tries to do a release, nothing gets changed. The second point above will make this more obvious: if there are no update commands given, fluxd should refuse update operations.
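
For illustration, a sketch of a commandUpdated config with update commands, based on the .flux.yaml posted earlier in the thread. The update-image.sh and update-policy.sh helpers are hypothetical, and the FLUX_* variables are the ones the manifest-generation docs describe for updater commands:

version: 1
commandUpdated:
  generators:
    - command: >-
        cat namespaces/* | sed 's/envplaceholder/'$ENVNAME'/g'
  updaters:
    - containerImage:
        # run per image update; Flux sets FLUX_WORKLOAD, FLUX_CONTAINER, FLUX_IMG, FLUX_TAG
        command: >-
          ./update-image.sh "$FLUX_WORKLOAD" "$FLUX_CONTAINER" "$FLUX_IMG:$FLUX_TAG"
      policy:
        # run per annotation update; Flux sets FLUX_WORKLOAD, FLUX_POLICY, FLUX_POLICY_VALUE
        command: >-
          ./update-policy.sh "$FLUX_WORKLOAD" "$FLUX_POLICY" "$FLUX_POLICY_VALUE"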

@tobbbles

tobbbles commented Nov 1, 2019

@squaremo Thank you; that was absolutely the issue, and using patchUpdated resolved all of the release issues I was experiencing.

I think being able to validate .flux.yaml (potentially from fluxctl) would be very valuable, especially for new users such as myself.

@kingdonb
Member

Since .flux.yaml is not a part of future Flux versions, there is a migration discussion on the Flux v2 repo.

Further .flux.yaml experience improvements are not likely to land in Flux v1.
