Detach a Postgres cluster from the operator temporarily #421

Closed
Jan-M opened this issue Nov 23, 2018 · 5 comments

@Jan-M
Member

Jan-M commented Nov 23, 2018

We were facing a situation where we needed to edit the statefulset of one cluster manually. This does not work while the operator is running and syncing objects.

We solved this by scaling the operator down to 0 replicas, but this is not ideal.
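
For reference, a minimal sketch of that workaround, assuming the operator runs as a Deployment named postgres-operator (adjust name and namespace to your installation):

  # stop the operator so it no longer syncs this cluster's objects
  kubectl scale deployment postgres-operator --replicas=0

  # ... edit the statefulset manually ...

  # resume normal operation
  kubectl scale deployment postgres-operator --replicas=1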

Suggest we look into adding something like the following to the manifest:

sync: false
detach: true
maintenance: true

Or similar.
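
For illustration only, a sketch of where such a flag could sit in the cluster manifest; the skeleton follows the usual postgresql resource, while the maintenance field itself is hypothetical (not implemented):

  apiVersion: "acid.zalan.do/v1"
  kind: postgresql
  metadata:
    name: acid-example-cluster   # hypothetical cluster name
  spec:
    teamId: "acid"
    numberOfInstances: 3
    postgresql:
      version: "11"
    # proposed in this issue, not part of the current spec:
    maintenance: true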

@Jan-M
Member Author

Jan-M commented Dec 3, 2018

This change may need to be observed out of order; not sure how this could be done. But ideally the operator does not first apply all kinds of pending changes to this cluster, and instead stops syncing immediately.

In a first iteration, though, this could always be achieved with a manual operator restart.

@valer-cara
Contributor

👍 Similar issue here. I have a 3-node cluster, and the first node is down because of EBS errors on an AWS M5 machine that rendered the volume read-only (dmesg here).

I wanted to unmount the volume and run fsck on it; however, to do that I'll need to stop the operator, delete the StatefulSet with --cascade=false to avoid pod re-creation, delete the affected pod, and manually start debugging the volume. Finally, restarting the operator would recreate the StatefulSet and the downed pod.

A maintenance feature per cluster would be most useful.


PS: for issues that don't require unmounting volumes, one can simply use patronictl -c ./postgres.yaml pause to pause Patroni temporarily and do whatever's needed. more here
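
A rough sketch of that manual sequence, assuming an operator Deployment named postgres-operator and a cluster/StatefulSet named my-cluster (both placeholders):

  # stop the operator so it doesn't undo the following steps
  kubectl scale deployment postgres-operator --replicas=0

  # delete the StatefulSet but leave its pods orphaned
  kubectl delete statefulset my-cluster --cascade=false

  # delete only the affected pod, then unmount / fsck its volume on the node
  kubectl delete pod my-cluster-0

  # once done, bring the operator back; it recreates the StatefulSet and the pod
  kubectl scale deployment postgres-operator --replicas=1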

@valer-cara
Contributor

So maybe there are multiple levels of maintenance that can be expressed in the cluster manifest?

maintenance:
  mode: paused

  • pause: where Patroni is paused on the cluster
  • detach: where the operator doesn't monitor k8s resources any longer
  • full: all of the above?

Not sure if this is the best approach or naming, just trying to underline the types of "detachment" out there.

@sdudoladov
Member

One special case to watch out for is an incomplete rolling update:

  1. The operator starts the rolling update of the cluster.
  2. The first replica gets restarted but, for whatever reason, cannot join the cluster again.
  3. This blocks the rolling update.

@FxKu
Member

FxKu commented Mar 19, 2020

With #802 merged, this can be realized via the acid.zalan.do/controller annotation. So far, it's a manual process. Closing for now.
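
For example, a sketch of that annotation-based detach, assuming (per #802) the operator only syncs clusters whose acid.zalan.do/controller annotation matches its own CONTROLLER_ID; names here are placeholders and the exact semantics should be checked against the current docs:

  # detach: point the cluster at a controller ID no running operator uses
  kubectl annotate postgresql acid-example-cluster acid.zalan.do/controller=maintenance --overwrite

  # ... perform manual changes on the statefulset / pods ...

  # re-attach: remove the annotation (or set it back to the operator's ID)
  kubectl annotate postgresql acid-example-cluster acid.zalan.do/controller-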

FxKu closed this as completed Mar 19, 2020