Use the database replica to read large tables when generating stats #1439
Description

Now that we have a database replica, we want to use it for some queries. Stats generation has historically been one of the operations that slows down the primary database server, so this PR routes some of its reads to the replica.
I'm not sure about the best way to structure the queries in this command. I've chosen a few of the larger tables and manually selected the replica with the `.using()` method on a queryset. I didn't do it for all models - what do we think here? Should we only use the replica for the large tables, or use it consistently for all objects? The replica database will always be completely up-to-date.
Alternatively, we could use a database router and somehow indicate that we want the replica for all read queries. This could be as simple as having this command set an environment variable, and a router that checks that variable and sends all read queries to the replica database.
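A sketch of that router approach. Routers are plain classes, so this needs no Django machinery to demonstrate; the `STATS_USE_REPLICA` variable name is hypothetical, and the class would be listed in the `DATABASE_ROUTERS` setting:

```python
import os

class ReplicaReadRouter:
    """Send all reads to the replica while STATS_USE_REPLICA is set.

    The "ro_replica" alias matches the deployment notes; the
    environment-variable name is a placeholder.
    """

    def db_for_read(self, model, **hints):
        if os.environ.get("STATS_USE_REPLICA"):
            return "ro_replica"
        return None  # fall through to the default database

    def db_for_write(self, model, **hints):
        return None  # writes always go to the default database
```

The stats command would set the variable at startup, and every ORM read in the process would transparently hit the replica without touching individual querysets.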
Feedback welcome...
Deployment steps:

- Needs `local_settings` in production to configure the `ro_replica` database
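A sketch of what that production `local_settings` entry might look like; the engine, name, host, and credential values are all placeholders, and only the `ro_replica` alias is from this PR:

```python
# local_settings.py (production) - all values below are placeholders
DATABASES["ro_replica"] = {
    "ENGINE": "django.db.backends.postgresql",
    "NAME": "appdb",
    "USER": "readonly",
    "PASSWORD": "change-me",
    "HOST": "replica.db.internal",
    "PORT": "5432",
}
```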