Detach a Postgres cluster from the operator temporarily #421

Closed
Jan-M opened this issue Nov 23, 2018 · 5 comments

@Jan-M
Member

Jan-M commented Nov 23, 2018

We were facing a situation where we needed to edit the statefulset of one cluster manually. This does not work while the operator is running and syncing objects.

We solved this by scaling the operator down to 0 replicas, but this is not ideal.
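
For reference, a minimal sketch of that workaround, assuming the operator runs as a Deployment named postgres-operator (adjust name and namespace to your installation):

  # stop the operator so it no longer syncs this cluster's objects
  kubectl scale deployment postgres-operator --replicas=0

  # ... edit the statefulset manually ...

  # resume normal operation
  kubectl scale deployment postgres-operator --replicas=1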

Suggest we look into adding something like the following to the manifest:

sync: false
detach: true
maintenance: true

Or similar.
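
For illustration only, a sketch of where such a flag could sit in the cluster manifest; the skeleton follows the usual postgresql resource, while the maintenance field itself is hypothetical (not implemented):

  apiVersion: "acid.zalan.do/v1"
  kind: postgresql
  metadata:
    name: acid-example-cluster   # hypothetical cluster name
  spec:
    teamId: "acid"
    numberOfInstances: 3
    postgresql:
      version: "11"
    # proposed in this issue, not part of the current spec:
    maintenance: true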

@Jan-M
Member Author

Jan-M commented Dec 3, 2018

This change may need to be observed out of order; not sure how this could be done. But ideally the operator does not first apply all kinds of pending changes to this cluster, and instead stops syncing immediately.

In a first iteration, though, this could always be achieved with a manual operator restart.

@valer-cara
Contributor

👍 Similar issue here. I have a 3-node cluster, and the first node is down because of EBS errors on an AWS M5 machine that rendered the volume read-only (dmesg here).

I wanted to unmount the volume and run fsck on it; however, to do that I'll need to stop the operator, delete the StatefulSet with --cascade=false to avoid pod re-creation, delete the affected pod, and manually start debugging the volume. Finally, restarting the operator would recreate the StatefulSet and the downed pod.

A maintenance feature per cluster would be most useful.


PS: for issues that don't require unmounting volumes, one can simply use patronictl -c ./postgres.yaml pause to pause Patroni temporarily and do whatever's needed. more here
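
A rough sketch of that manual sequence, assuming an operator Deployment named postgres-operator and a cluster/StatefulSet named my-cluster (both placeholders):

  # stop the operator so it doesn't undo the following steps
  kubectl scale deployment postgres-operator --replicas=0

  # delete the StatefulSet but leave its pods orphaned
  kubectl delete statefulset my-cluster --cascade=false

  # delete only the affected pod, then unmount / fsck its volume on the node
  kubectl delete pod my-cluster-0

  # once done, bring the operator back; it recreates the StatefulSet and the pod
  kubectl scale deployment postgres-operator --replicas=1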

@valer-cara
Contributor

So maybe there are multiple levels of maintenance that can be expressed in the cluster manifest?

maintenance:
  mode: paused

  • pause: where Patroni is paused on the cluster
  • detach: where the operator doesn't monitor k8s resources any longer
  • full: all of the above?

Not sure if this is the best approach or naming, just trying to underline the types of "detachment" out there.

@sdudoladov
Member

One special case to watch out for is an incomplete rolling update:

  1. The operator starts the rolling update of the cluster.
  2. The first replica gets restarted but, for whatever reason, cannot join the cluster again.
  3. This blocks the rolling update.

@FxKu
Member

FxKu commented Mar 19, 2020

With #802 merged, this can be realized via the acid.zalan.do/controller annotation. So far, it's a manual process. Closing for now.
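
For example, a sketch of that annotation-based detach, assuming (per #802) the operator only syncs clusters whose acid.zalan.do/controller annotation matches its own CONTROLLER_ID; names here are placeholders and the exact semantics should be checked against the current docs:

  # detach: point the cluster at a controller ID no running operator uses
  kubectl annotate postgresql acid-example-cluster acid.zalan.do/controller=maintenance --overwrite

  # ... perform manual changes on the statefulset / pods ...

  # re-attach: remove the annotation (or set it back to the operator's ID)
  kubectl annotate postgresql acid-example-cluster acid.zalan.do/controller-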

FxKu closed this as completed Mar 19, 2020