feat: db optimizations, backups optimizations #1664

Merged
merged 4 commits into from
Apr 1, 2022

@yocontra yocontra commented Mar 17, 2022

Partial of #1126 (does not include partitioning).

Included in this PR:

  • Improves the docker-compose setup for the database by specifying CPU and RAM constraints
  • Updates the Postgres Dockerfile to use the latest PG (13), matching what we run in production/staging
    • Also adds additional logging, which is useful when working with the DB in local dev
  • Improves the DB configuration
    • Many other settings can't be configured this way (Heroku doesn't expose most flags), as noted in Postgres improvements #782, so I did what I could within the constraints of what we can set inside the DB
  • Improves the "backup" system: instead of a separate table with a many-to-one relationship, each upload gets a new backup_urls key. This removes a redundant large table (12M uploads means 12M or more backup rows) and speeds up create_upload by removing the extra inserts.


@hugomrdias hugomrdias left a comment


LGTM


yocontra commented Mar 24, 2022

Migration, tested against staging already:

ALTER TABLE "upload" ADD COLUMN "backup_urls" TEXT[];

UPDATE "upload" SET "backup_urls" = (SELECT array_agg("url") FROM "backup" WHERE "upload_id" = "upload"."id");

DROP TABLE "backup";
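For readers less familiar with `array_agg`, the effect of the migration above can be modelled in plain Python. This is only an illustrative sketch, not code from the PR: the dictionaries stand in for table rows and the function name `fold_backups` is hypothetical. Uploads with no backup rows end up with `None`, mirroring the NULL that `array_agg` returns when the subquery matches no rows.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def fold_backups(uploads: List[dict], backups: List[dict]) -> List[dict]:
    """Return copies of the upload rows with a backup_urls field attached.

    Hypothetical in-memory model of the UPDATE above; row shapes are assumed.
    """
    # Group backup URLs by the upload they belong to (many-to-one).
    urls_by_upload: Dict[int, List[str]] = defaultdict(list)
    for row in backups:
        urls_by_upload[row["upload_id"]].append(row["url"])

    migrated = []
    for upload in uploads:
        # .get() does not create a key, so uploads without backups get None,
        # like array_agg over zero rows yielding NULL.
        urls: Optional[List[str]] = urls_by_upload.get(upload["id"])
        migrated.append({**upload, "backup_urls": urls})
    return migrated
```

Note that this model keeps URLs in input order, whereas `array_agg` without an `ORDER BY` makes no ordering guarantee.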


yocontra commented Mar 29, 2022

Optimized migration for production. It batches the updates for 64M rows into smaller queries so that each one takes fewer locks and there is no downtime:

https://gist.github.com/yocontra/4ae719b42a54c4582ca28fcac74199d0
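The gist itself is not reproduced here. As a rough sketch of the general batching idea only, the single UPDATE can be split into id ranges and committed one range at a time, so locks are held briefly. Several things below are assumptions rather than facts from the PR: that `upload.id` is an integer key, that the connection is a DB-API connection using `%s` placeholders (psycopg2, for example), and the names `id_batches`, `run_batched` and the default batch size.

```python
from typing import Iterator, Tuple

# Same UPDATE as the one-shot migration, restricted to an id range.
# Assumes "upload"."id" is an integer column (not confirmed by the PR).
UPDATE_SQL = """
UPDATE "upload"
SET "backup_urls" = (
  SELECT array_agg("url") FROM "backup" WHERE "upload_id" = "upload"."id"
)
WHERE "id" BETWEEN %s AND %s
"""


def id_batches(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) id ranges covering [min_id, max_id]."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def run_batched(conn, min_id: int, max_id: int, batch_size: int = 10_000) -> None:
    """Run the UPDATE one id range at a time, committing after each range."""
    for start, end in id_batches(min_id, max_id, batch_size):
        with conn.cursor() as cur:
            cur.execute(UPDATE_SQL, (start, end))
        # Committing per batch releases that batch's row locks before the next one.
        conn.commit()
```

With this shape, the `DROP TABLE "backup"` step would run only after every batch has been committed.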

@yocontra

@mikeal @hugomrdias This is ready to deploy: it has been run against staging and tested against a clone of the production database. Please let me know what the next steps are.
