Use the database replica to read large tables when generating stats #1439

alastair · 2020-05-20T17:59:27Z

Description
Now that we have a database replica, we want to use it for some queries. Stats generation has historically been one of the operations that slows the database server down, so this PR moves some of its queries to the replica.

I'm not sure about the best way to do the queries in this command. I've chosen a few of the larger tables, and manually selected the replica with the .using() method on a queryset.
I didn't do it with all models - what do we think here? Should we only use it for the large tables, or consistently use it on all objects? The replica database will always be completely up-to-date.
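
For reference, the manual approach boils down to something like the sketch below. It is only an illustration: the helper name for_stats and the use_replica flag are made up for this example, and it assumes the replica is registered under the ro_replica alias mentioned in the deployment steps. The only Django API it relies on is QuerySet.using().

```python
def for_stats(queryset, use_replica, alias="ro_replica"):
    """Route a queryset to the replica when use_replica is true.

    Hypothetical helper: `queryset` is expected to be a Django QuerySet,
    whose .using(alias) method returns a copy bound to that database.
    """
    if use_replica:
        return queryset.using(alias)
    # Leave the queryset untouched so it keeps hitting the default database.
    return queryset
```

With a helper like this, the stats command would wrap only the querysets for the large tables and leave the rest on the default connection.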

Alternatively, we could use a database router and somehow indicate that we want to use the replica for all read queries. This could be as simple as this command setting an environment variable, and us having a router that checks this environment variable and uses the replica database for all read queries.
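
A rough sketch of what such a router could look like, not what this PR implements. The environment variable name USE_DB_REPLICA_FOR_READS and the class name are assumptions made up for the example; the method names and signatures are the ones Django expects from classes listed in DATABASE_ROUTERS.

```python
import os

# Hypothetical names, chosen only for this sketch.
REPLICA_ENV_VAR = "USE_DB_REPLICA_FOR_READS"
REPLICA_ALIAS = "ro_replica"


class EnvReplicaRouter:
    """Send reads to the replica only when the environment variable is set."""

    def db_for_read(self, model, **hints):
        if os.environ.get(REPLICA_ENV_VAR) == "1":
            return REPLICA_ALIAS
        # None means "no opinion": Django falls back to the default database.
        return None

    def db_for_write(self, model, **hints):
        # The replica is read-only, so writes always go to the primary.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same data, so relations between them are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Never run migrations against the read-only replica.
        return db == "default"
```

The stats command would set the variable before running any queries, and every other process would keep reading from the default database because the variable is unset there.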

Feedback welcome...

Deployment steps:
Needs local_settings in production to configure the ro_replica database
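
As an illustration of that step, the local_settings entry might look roughly like this. Only the ro_replica alias comes from this PR; the hosts, database name and user are placeholders, and the engine path assumes the stock Django PostgreSQL backend.

```python
# Placeholder values throughout; only the "ro_replica" alias is from this PR.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "appuser",
        "HOST": "db-primary.example.org",
        "PORT": "5432",
    },
}

# The replica serves the same database from a different host.
DATABASES["ro_replica"] = dict(
    DATABASES["default"],
    HOST="db-replica.example.org",
)
```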

alastair · 2020-06-05T15:55:54Z

There's a problem with this setup:
https://stackoverflow.com/questions/14592436/postgresql-error-canceling-statement-due-to-conflict-with-recovery

We need to work out how to prevent these queries from being cancelled. Setting either hot_standby_feedback = on or max_standby_streaming_delay = -1 on the replica is probably what we need.

ffont · 2020-06-10T13:47:56Z

Looks like max_standby_streaming_delay = -1 might be a better option.

As for how to configure which DB to use, I like the "manual" .using() approach. I think we should reserve the replica for very specific queries (at least to start with), so it is fine to do it manually and control it with a settings parameter like you did.

alastair · 2023-03-16T10:26:55Z

@ffont what do you think about this? We've really had no slowdown issues since we did our migrations and fixes to the fileserver. In this case, I kind of think that we should forget this PR and stay with the single database.

ffont · 2023-03-17T08:51:24Z

Yeah, considering that our main issues are related to the filesystem rather than the DB structure, I think it is good to keep things as they are and not add more complexity for now.

Use the database replica to read large tables when generating stats

2a3c229

alastair requested a review from ffont May 20, 2020 17:59
