Use the database replica to read large tables when generating stats #1439

Draft
wants to merge 1 commit into
base: master

alastair
Member

@alastair alastair commented May 20, 2020

Description
Now that we have a database replica, we want to use it for some queries. Stats generation is one of the operations that has historically slowed down the database server, so this PR reads some of its larger tables from the replica instead.

I'm not sure about the best way to do the queries in this command. I've chosen a few of the larger tables, and manually selected the replica with the .using() method on a queryset.
I didn't do it with all models - what do we think here? Should we only use it for the large tables, or consistently use it on all objects? The replica database will always be completely up-to-date.
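
For illustration, the manual approach could be wrapped in a small helper like the one below. This is only a sketch, not the code in this PR: the helper name `maybe_use_replica`, the `use_replica` flag, and the model and setting names in the usage comment are all hypothetical. The only Django call it relies on is `QuerySet.using(alias)`.

```python
def maybe_use_replica(queryset, use_replica, alias="ro_replica"):
    """Return the queryset bound to the replica when enabled, else unchanged.

    `use_replica` is assumed to come from a settings flag (hypothetical name).
    """
    if use_replica:
        # QuerySet.using(alias) returns a new queryset that reads from `alias`.
        return queryset.using(alias)
    return queryset


# Hypothetical usage inside the stats command:
#   qs = maybe_use_replica(BigModel.objects.all(), settings.USE_REPLICA_FOR_STATS)
#   total = qs.count()
```

Keeping the choice in one helper means the large-table queries opt in explicitly, and switching the flag off sends everything back to the default database.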

Alternatively, we could use a database router and somehow indicate that we want to use the replica for all read queries. This could be as simple as this command setting an environment variable, and us having a router that checks this environment variable and uses the replica database for all read queries.
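
A rough sketch of what such a router might look like, assuming Django's database router interface (`db_for_read`, `db_for_write`, `allow_relation`, `allow_migrate`). The class name and the `USE_RO_REPLICA` environment variable are made up for this example, and the router would still need to be listed in `DATABASE_ROUTERS`.

```python
import os


class EnvReplicaRouter:
    """Route reads to the replica when an environment variable is set."""

    ENV_VAR = "USE_RO_REPLICA"    # hypothetical variable name
    REPLICA_ALIAS = "ro_replica"  # alias taken from the deployment steps

    def db_for_read(self, model, **hints):
        if os.environ.get(self.ENV_VAR) == "1":
            return self.REPLICA_ALIAS
        # None means "no opinion", so Django falls back to the default database.
        return None

    def db_for_write(self, model, **hints):
        # Writes always go to the primary.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same data, so relations between them are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Never run migrations against the read-only replica.
        return db == "default"
```

With this in place, the stats command would only need to set the environment variable before running its queries; everything else keeps reading from the default database.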

Feedback welcome...

Deployment steps:
Needs local_settings in production to configure the ro_replica database
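
As a sketch of what that local_settings entry might contain, assuming a standard Django `DATABASES` dict with the PostgreSQL backend. Every database name, user, and host below is a placeholder, not the production value.

```python
# local_settings.py (sketch; all values are placeholders)
_primary = {
    "ENGINE": "django.db.backends.postgresql",
    "NAME": "appdb",
    "USER": "appuser",
    "HOST": "primary.example.internal",
    "PORT": "5432",
}

DATABASES = {
    "default": _primary,
    # Same database and credentials, but pointing at the read-only replica host.
    "ro_replica": {**_primary, "HOST": "replica.example.internal"},
}
```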

@alastair alastair requested a review from ffont May 20, 2020 17:59
@alastair
Member Author

alastair commented Jun 5, 2020

There's a problem with this setup:
https://stackoverflow.com/questions/14592436/postgresql-error-canceling-statement-due-to-conflict-with-recovery

We need to work out how to prevent these queries from being cancelled. Setting either hot_standby_feedback = on or max_standby_streaming_delay = -1 on the replica is probably what we need.

@ffont
Member

ffont commented Jun 10, 2020

Looks like max_standby_streaming_delay = -1 might be a better option.

As for how to configure which DB to use, I like the "manual" .using() approach. I think we should reserve the replica for very specific queries (at least to start with), so it is fine to do it manually, controlled by a settings parameter like you did.

@alastair
Member Author

@ffont what do you think about this? We've really had no slowdown issues since we did our migrations and fixes to the fileserver. Given that, I kind of think that we should forget this PR and stay with the single database.

@ffont
Member

ffont commented Mar 17, 2023

Yeah, considering that our main issues are related to the filesystem rather than the DB structure, I think it is good to keep things as they are and not add more complexity for now.
