
GPII-3421 - Allow couchdb updates #179

Merged

Conversation

stepanstipl
Contributor

This fixes CouchDB updates by removing the Version part from the StatefulSet label, as Kubernetes doesn't allow updates to any StatefulSet spec fields other than 'replicas', 'template', and 'updateStrategy' (https://issues.gpii.net/browse/GPII-3421).

It also adds force_update for CouchDB; this will cause the StatefulSet to be recreated, but without it we wouldn't be able to update at all, since the chart is already deployed with the Version in the label.

While this causes CouchDB downtime, the underlying volumes, and therefore the data, are preserved. In the future we should be careful about updates that modify anything but the fields mentioned above.
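For illustration, a minimal sketch of the kind of label change involved (the "after" value is an assumption for illustration; the actual hunk is quoted later in this conversation):

    # Before: the chart version is baked into the label value, so every
    # chart upgrade changes it; where this label feeds immutable StatefulSet
    # spec fields (e.g. the selector), Kubernetes rejects the update.
    chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}

    # After: the version suffix is dropped, so the label stays stable
    # across chart upgrades.
    chart: {{ .Chart.Name }}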

From the tests on my cluster:
PVC and associated PVs before the update:

$ kubectl get pvc -n gpii
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
database-storage-couchdb-couchdb-0   Bound     pvc-a65ac690-c6f0-11e8-ae25-42010a800276   10Gi       RWO            standard       9d
database-storage-couchdb-couchdb-1   Bound     pvc-a65fa6d6-c6f0-11e8-ae25-42010a800276   10Gi       RWO            standard       9d

after the update:

$ kubectl get statefulset -n gpii
NAME              DESIRED   CURRENT   AGE
couchdb-couchdb   2         2         4m

$ kubectl get pvc -n gpii
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
database-storage-couchdb-couchdb-0   Bound     pvc-a65ac690-c6f0-11e8-ae25-42010a800276   10Gi       RWO            standard       9d
database-storage-couchdb-couchdb-1   Bound     pvc-a65fa6d6-c6f0-11e8-ae25-42010a800276   10Gi       RWO            standard       9d

-> the volumes were not affected by re-creating the StatefulSet (this is how it should work :)).

Contributor

@mrtyler left a comment


CouchDB downtime is undesirable, of course, so I'd like to hear more about this problem.

Can we do anything to eliminate or reduce the downtime (e.g. change the deployment/update strategy, or something)? It's one thing to take downtime to upgrade to a new version of CouchDB; it's another to take downtime to give CouchDB more CPU/memory.
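For reference, the knob mentioned here is the StatefulSet's updateStrategy; a minimal sketch of a rolling configuration, with illustrative values rather than what this chart actually ships:

    spec:
      updateStrategy:
        type: RollingUpdate   # replace pods one at a time, highest ordinal first
        rollingUpdate:
          partition: 0        # 0 = roll all pods; raise to stage a partial rollout

Note that this only governs changes inside the pod template; it doesn't help when the StatefulSet object itself has to be deleted and recreated, as in this PR.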

@@ -4,7 +4,7 @@ metadata:
name: {{ template "couchdb.fullname" . }}
labels:
app: {{ template "couchdb.name" . }}
chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
Contributor


I think I understand why you are changing this. Are there any consequences of this change (e.g. users of this chart are no longer able to deploy multiple CouchDBs to a given cluster, or something)?

Possibly related question: can this change be contributed back to upstream? If so, please do so before closing this ticket :). If not, I'd like to discuss the situation further.

Contributor Author


I'm not aware of any negative consequences of this; Helm does not rely on this info. AFAIK the only result is that you'll actually be allowed to update the DB (as opposed to thinking it's been updated while Helm/TF silently ignores that the real state is not the desired state, which I think is a much more dangerous behavior; think about the security update we did to CouchDB recently).

@stepanstipl
Contributor Author

@mrtyler thanks for the review. This PR only addresses whether you can update the DB at all.

I'm looking into what can be done to minimize or eliminate the downtime, as well as what can be done about Helm/TF ignoring the fact that the actual state is not consistent with the desired state (mentioned in https://issues.gpii.net/browse/GPII-3421), and will eventually open separate PRs for those two issues.

@mrtyler
Contributor

mrtyler commented Oct 16, 2018

LGTM. Ok to merge this, but I don't think the ticket is complete until we've at least talked about the CouchDB downtime issue.

@stepanstipl
Contributor Author

stepanstipl commented Oct 16, 2018

> LGTM. Ok to merge this, but I don't think the ticket is complete until we've at least talked about the CouchDB downtime issue.

@mrtyler cheers, agreed. I'm not closing the ticket until I either resolve those two other issues with another PR or we discuss that I've failed miserably.

@stepanstipl merged commit e1ca654 into gpii-ops:master on Oct 16, 2018
@amatas
Contributor

amatas commented Oct 16, 2018

The upgrade problem seems to be a common issue in the charts repository: helm/charts#7726.
LGTM, but I agree we should keep working on how to upgrade CouchDB without downtime. I remember that it was possible in AWS with plain StatefulSets.

@stepanstipl
Contributor Author

Opened upstream PR - helm/charts#8527

@stepanstipl deleted the GPII-3421-allow-couchdb-updates branch on March 21, 2019