Add check and reconnect method to Database class #466
base: master
Hmmm. I thought about this some more last night and am not sure this is the best way forward.

Issue: Dislike of this fix: Desire:

The backend manager currently dies. At minimum, if this were caught and the manager waited, 4CAT would generally be fine. All currently running workers also crash, including the API (the only always-running worker), and there is no mechanism to restart them except via a restart (perhaps that could be changed).

The frontend cannot re-establish a connection. It does not actually crash, but even when the database is back up and could accept new connections, the frontend has no way of reconnecting. The error changes from "unable to connect" to a stale connection ad infinitum, and the frontend needs to be restarted.

Possible better solutions: Backend... We could extend the manager to handle that disconnect. But maybe there is a better way to have any worker with a database disconnect wait. It could also check self.interrupted to exit if needed. Logging may need to be reviewed.
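To make the "have the worker wait" idea concrete, here is a minimal sketch, not 4CAT's actual code. `wait_for_database` and `is_alive` are hypothetical names, and it assumes the interrupt flag can be modelled as a `threading.Event`; the real `self.interrupted` attribute may be a plain value, in which case the loop would poll it instead.

```python
import threading


def wait_for_database(is_alive, interrupted, poll_interval=5.0):
    """Block until is_alive() returns True or the worker is interrupted.

    is_alive:    hypothetical callable that returns True once a test
                 query or reconnect attempt succeeds.
    interrupted: threading.Event standing in for the worker's
                 interrupt flag (an assumption, see above).
    Returns True if the database came back, False if interrupted.
    """
    while not interrupted.is_set():
        if is_alive():
            return True
        # Event.wait() doubles as an interruptible sleep: it returns
        # early if the flag is set while we are waiting.
        interrupted.wait(poll_interval)
    return False


# Simulated outage: the database answers on the third check.
answers = iter([False, False, True])
interrupted = threading.Event()
recovered = wait_for_database(lambda: next(answers), interrupted, poll_interval=0)
```

A worker or the manager could call something like this from its exception handler rather than crashing, and log once on entry and once on recovery to keep the log readable.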
Some things I learned:
If we desire, we could use a connection pool and, instead of creating a new connection with every worker, share the Database class and then use a connection pool in, for example, the…
…retry them after reconnect
Refactored so that, instead of testing a query, it reconnects on any failed query. I tested disconnecting the database for both the frontend and the backend and believe I worked through any issues.
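For readers skimming the thread, the reconnect-on-failure pattern looks roughly like the sketch below. It is an illustration under stated assumptions rather than the code in this PR: `ReconnectingDatabase`, `ConnectionLost` and `FakeConnection` are hypothetical names, and the fake connection stands in for a psycopg2 connection so the example runs without a database. With psycopg2, the exceptions to catch would likely be `psycopg2.OperationalError` and `psycopg2.InterfaceError`.

```python
import time


class ConnectionLost(Exception):
    """Stand-in for psycopg2.OperationalError / psycopg2.InterfaceError."""


class FakeConnection:
    """Test double: every query fails while `broken` is True."""

    def __init__(self, broken=False):
        self.broken = broken

    def execute(self, query):
        if self.broken:
            raise ConnectionLost("server closed the connection unexpectedly")
        return f"ok: {query}"


class ReconnectingDatabase:
    """Hypothetical wrapper that reconnects and retries a failed query."""

    def __init__(self, connect, retries=3, wait=0.0):
        self._connect = connect      # callable returning a new connection
        self.retries = retries       # reconnect attempts per query
        self.wait = wait             # seconds to pause before reconnecting
        self.connection = connect()

    def execute(self, query):
        for attempt in range(self.retries + 1):
            try:
                return self.connection.execute(query)
            except ConnectionLost:
                if attempt == self.retries:
                    raise            # out of retries: let the caller decide
                time.sleep(self.wait)
                try:
                    self.connection = self._connect()
                except ConnectionLost:
                    pass             # still down; the next attempt retries


# The first connection is dead; the reconnect hands back a working one.
connections = iter([FakeConnection(broken=True), FakeConnection()])
db = ReconnectingDatabase(lambda: next(connections))
result = db.execute("SELECT 1")
print(result)  # prints "ok: SELECT 1"
```

Retrying is only safe for statements that did not partially apply. With psycopg2's default transaction handling, work that was not committed before the connection dropped is lost, so a blind retry of the last statement alone may not be enough inside a multi-statement transaction.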
This would allow the Database class to reconnect on SSL errors or recover if the connection was closed. If we run a bad query, however, it will not recover from
psycopg2.errors.InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block
and will still require a rollback.

I tested it by shutting down the database, attempting to run some processes (which obviously failed), and then restarting the database. It seemed to recover just fine (well, any processors that failed would not restart until we restart the backend, but the frontend recovered fine).
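The aborted-transaction case is separate from reconnecting, and one possible way to handle it is sketched below. This is not part of the PR: `execute_with_rollback`, `QueryError`, `FailedTransaction` and `FakeConnection` are hypothetical, and the fake only imitates PostgreSQL's behaviour of rejecting every statement after a failure until `rollback()` is called. With psycopg2, the real calls would be `connection.rollback()` and an `except psycopg2.Error` clause.

```python
class QueryError(Exception):
    """Stand-in for a query-level psycopg2.Error, e.g. a syntax error."""


class FailedTransaction(Exception):
    """Stand-in for psycopg2.errors.InFailedSqlTransaction."""


class FakeConnection:
    """Imitates PostgreSQL: after one failed statement, every further
    statement fails until rollback() is called."""

    def __init__(self):
        self.aborted = False

    def execute(self, query):
        if self.aborted:
            raise FailedTransaction("current transaction is aborted")
        if query.startswith("BAD"):
            self.aborted = True
            raise QueryError("syntax error")
        return "ok"

    def rollback(self):
        self.aborted = False


def execute_with_rollback(connection, query):
    """Run a query; on failure, roll back so the connection stays usable,
    then re-raise so the caller still sees the original error."""
    try:
        return connection.execute(query)
    except (QueryError, FailedTransaction):
        connection.rollback()
        raise


conn = FakeConnection()
try:
    execute_with_rollback(conn, "BAD QUERY")
except QueryError:
    pass  # the bad query still fails, but the transaction is reset

after_failure = execute_with_rollback(conn, "SELECT 1")
```

The rollback discards any uncommitted work in the same transaction, so whether this is acceptable depends on how the callers group their statements.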