
GPII-3421 - Allow couchdb updates #179

Merged

Conversation

stepanstipl
Contributor

This fixes CouchDB updates by removing the Version part from the StatefulSet label, as Kubernetes doesn't allow updates to any StatefulSet spec fields other than 'replicas', 'template', and 'updateStrategy' (https://issues.gpii.net/browse/GPII-3421).

It also adds force_update for CouchDB; this will cause the StatefulSet to be recreated, but without it we wouldn't be able to update at all, since the chart is already deployed with the Version in the label.

While this causes CouchDB downtime, the underlying volumes, and therefore the data, are preserved. In the future we should be careful about updates that modify anything but the fields mentioned above.
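For illustration, a minimal sketch of the kind of label change involved (the "after" value is an assumption for illustration; the actual hunk is quoted later in this conversation):

    # Before: the chart version is baked into the label value, so every
    # chart upgrade changes it; where this label feeds immutable StatefulSet
    # spec fields (e.g. the selector), Kubernetes rejects the update.
    chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}

    # After: the version suffix is dropped, so the label stays stable
    # across chart upgrades.
    chart: {{ .Chart.Name }}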

From the tests on my cluster:
PVC and associated PVs before the update:

$ kubectl get pvc -n gpii
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
database-storage-couchdb-couchdb-0   Bound     pvc-a65ac690-c6f0-11e8-ae25-42010a800276   10Gi       RWO            standard       9d
database-storage-couchdb-couchdb-1   Bound     pvc-a65fa6d6-c6f0-11e8-ae25-42010a800276   10Gi       RWO            standard       9d

after the update:

$ kubectl get statefulset -n gpii
NAME              DESIRED   CURRENT   AGE
couchdb-couchdb   2         2         4m

$ kubectl get pvc -n gpii
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
database-storage-couchdb-couchdb-0   Bound     pvc-a65ac690-c6f0-11e8-ae25-42010a800276   10Gi       RWO            standard       9d
database-storage-couchdb-couchdb-1   Bound     pvc-a65fa6d6-c6f0-11e8-ae25-42010a800276   10Gi       RWO            standard       9d

-> the volumes were not affected by re-creating the StatefulSet (this is how it should work :)).

Contributor

@mrtyler left a comment


CouchDB downtime is undesirable, of course, so I'd like to hear more about this problem.

Can we do anything to eliminate or reduce the downtime (e.g. change the deployment/update strategy, or something)? It's one thing to take downtime to upgrade to a new version of CouchDB; it's another to take downtime to give CouchDB more CPU/memory.
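For reference, the knob mentioned here is the StatefulSet's updateStrategy; a minimal sketch of a rolling configuration, with illustrative values rather than what this chart actually ships:

    spec:
      updateStrategy:
        type: RollingUpdate   # replace pods one at a time, highest ordinal first
        rollingUpdate:
          partition: 0        # 0 = roll all pods; raise to stage a partial rollout

Note that this only governs changes inside the pod template; it doesn't help when the StatefulSet object itself has to be deleted and recreated, as in this PR.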

@@ -4,7 +4,7 @@ metadata:
name: {{ template "couchdb.fullname" . }}
labels:
app: {{ template "couchdb.name" . }}
chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
Contributor


I think I understand why you are changing this. Are there any consequences of this change (e.g. users of this chart are no longer able to deploy multiple CouchDBs to a given cluster, or something)?

Possibly related question: can this change be contributed back to upstream? If so, please do so before closing this ticket :). If not, I'd like to discuss the situation further.

Contributor Author


I'm not aware of any negative consequences of this; Helm does not rely on this info. AFAIK the only result is that you'll actually be allowed to update the DB (as opposed to thinking it's been updated while Helm/TF silently ignores that the real state is not the desired state, which I think is a much more dangerous behavior; think about the security update we did to CouchDB recently).

@stepanstipl
Contributor Author

@mrtyler thanks for the review. This PR only addresses whether you can update the DB at all.

I'm looking into what can be done to minimize or eliminate the downtime, as well as what can be done about Helm/TF ignoring the fact that the actual state is not consistent with the desired state (mentioned in https://issues.gpii.net/browse/GPII-3421), and will eventually open separate PRs for those two issues.

@mrtyler
Contributor

mrtyler commented Oct 16, 2018

LGTM. Ok to merge this, but I don't think the ticket is complete until we've at least talked about the CouchDB downtime issue.

@stepanstipl
Contributor Author

stepanstipl commented Oct 16, 2018

> LGTM. Ok to merge this, but I don't think the ticket is complete until we've at least talked about the CouchDB downtime issue.

@mrtyler cheers, agreed. I'm not closing the ticket until I either resolve those two other issues with another PR or we discuss that I've failed miserably.

@stepanstipl merged commit e1ca654 into gpii-ops:master on Oct 16, 2018
@amatas
Contributor

amatas commented Oct 16, 2018

The upgrade problem seems to be a common issue in the charts repository: helm/charts#7726.
LGTM, but I agree we should keep working on how to upgrade CouchDB without downtime. I remember that it was possible in AWS with plain StatefulSets.

@stepanstipl
Contributor Author

Opened upstream PR - helm/charts#8527

@stepanstipl deleted the GPII-3421-allow-couchdb-updates branch on March 21, 2019