Digital Ocean Managed Postgre Primary Failover #1026

Closed
bjg2 opened this issue Feb 9, 2021 · 4 comments

Comments

bjg2 commented Feb 9, 2021

Hey everyone!

In my org, we're using Digital Ocean Managed PostgreSQL with a database setup of 1 primary and 2 standbys. Digital Ocean does not provide hosts for all 3 nodes, just one host that always points to the primary node (something like XXX.a.db.ondigitalocean.com). Occasionally, by design, a primary failover happens: the primary node becomes a standby and one of the standbys becomes the primary.

The problem is that, when this happens, connections seem to remain open to the node that was the primary and has just become a standby. From then on, those connections fail with `write tcp YYY->ZZZ: write: connection timed out` and never recover.

The mitigation I came up with was `db.SetConnMaxLifetime(time.Minute)`, but that's not ideal. Is there a better way around this problem at the moment?

PS: I saw a similar issue, #683, but I don't think it applies to our problem: we are not given multiple hosts, just one host string, and that host points to the current primary.
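The `db.SetConnMaxLifetime` mitigation mentioned above could be sketched roughly as follows. This is only an illustration: `stubConnector`, `stubDriver`, `newStubDB`, `poolSettings`, `defaultSettings`, and `configurePool` are hypothetical names, and the stub stands in for a real driver so that the sketch runs without a database (with pq one would normally call `sql.Open("postgres", dsn)` instead). The idle-time and max-open values are assumptions, not something taken from this thread.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"time"
)

// stubConnector is a hypothetical stand-in for a real PostgreSQL driver,
// used only so this sketch compiles and runs without a database.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub: no database available")
}

func (stubConnector) Driver() driver.Driver { return stubDriver{} }

type stubDriver struct{}

func (stubDriver) Open(string) (driver.Conn, error) {
	return nil, errors.New("stub: no database available")
}

// newStubDB returns a pool backed by the stub; no connection is opened
// until the pool is actually used.
func newStubDB() *sql.DB { return sql.OpenDB(stubConnector{}) }

// poolSettings groups the limits applied to the pool (illustrative values).
type poolSettings struct {
	MaxLifetime time.Duration // upper bound on the age of any connection
	MaxIdleTime time.Duration // upper bound on how long a connection may sit idle
	MaxOpen     int           // cap on concurrently open connections
}

var defaultSettings = poolSettings{
	MaxLifetime: time.Minute, // the value used in the comment above
	MaxIdleTime: 30 * time.Second,
	MaxOpen:     10,
}

// configurePool bounds connection age so that connections opened before a
// failover are eventually closed and replaced by fresh ones.
func configurePool(db *sql.DB, s poolSettings) {
	db.SetConnMaxLifetime(s.MaxLifetime)
	db.SetConnMaxIdleTime(s.MaxIdleTime)
	db.SetMaxOpenConns(s.MaxOpen)
}

func main() {
	db := newStubDB()
	defer db.Close()
	configurePool(db, defaultSettings)
	fmt.Println(db.Stats().MaxOpenConnections) // prints 10
}
```

A shorter lifetime means stale connections are replaced sooner, but also that connections are re-established more often, so the value is a trade-off rather than a fix.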

bjg2 changed the title from "Digital Ocean Managed Postgre Failover" to "Digital Ocean Managed Postgre Primary Failover" on Feb 9, 2021
@Lekensteyn

Possibly a duplicate of #835.

bjg2 commented Jun 20, 2022

We need to find a better way to mitigate this, as we noticed that the connection lifetime setting has really bad performance implications. Should we expect this issue to be fixed by #1013? It is very hard to test, as DO does not provide an interface for triggering a primary failover...
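Since a real failover cannot be triggered on demand, one way to exercise at least the client-side symptom locally is to simulate a peer that stops reading. The following is a minimal sketch using only the Go standard library; it does not involve pq or Digital Ocean, and `writeWithDeadline`, `simulateSilentPeer`, `isTimeout`, and `demoTimeout` are hypothetical names introduced for illustration. It shows a write to an unresponsive peer surfacing as a timeout error once a deadline is set, instead of blocking forever.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// demoTimeout is an arbitrary, short deadline for the demonstration.
const demoTimeout = 50 * time.Millisecond

// writeWithDeadline bounds a single write so that a silent peer yields an
// error instead of blocking indefinitely.
func writeWithDeadline(c net.Conn, p []byte, d time.Duration) error {
	if err := c.SetWriteDeadline(time.Now().Add(d)); err != nil {
		return err
	}
	_, err := c.Write(p)
	return err
}

// simulateSilentPeer writes to an in-memory connection whose other end
// never reads, standing in for a node that has stopped responding.
func simulateSilentPeer(d time.Duration) error {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()
	return writeWithDeadline(client, []byte("SELECT 1"), d)
}

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

func main() {
	err := simulateSilentPeer(demoTimeout)
	fmt.Println(isTimeout(err)) // prints true
}
```

This only reproduces the "write never completes" half of the problem; whether the driver then discards such a connection is a separate question that this sketch does not answer.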

zak905 commented Feb 26, 2024

Any updates?
@bjg2 I am interested to know whether you have found new ways to handle this issue.

bjg2 commented Feb 27, 2024

I think we have not had that issue for a long time; I guess pq patched it with the fix linked above.

bjg2 closed this as completed on Feb 27, 2024