Collections public dumps #513

Open. MonkeyDo wants to merge 6 commits into master from collections-public-dumps.

MonkeyDo (Member) commented Sep 9, 2020

Problem

With the introduction of user collections, private collections would currently be exported in the dumps.

Solution

This PR aims to create a database dump without collections, and another dump with only the public collections.
For that purpose, we create temporary tables (e.g. user_collection -> public_user_collection) with the appropriate select statements to exclude private collections along with their items and collaborators, and dump those three tables to a file before removing them.

We also want to rename the tables in the collections dump file once it has been created (for example, renaming bookbrainz.public_user_collection -> bookbrainz.user_collection).
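
A minimal sketch of the approach described above, not the script from this PR. The item and collaborator table names, and the columns `id`, `collection_id` and `public`, are assumptions made for illustration; only `user_collection` and `public_user_collection` are named in the description.

```sql
-- Sketch only (PostgreSQL). Table and column names other than
-- user_collection / public_user_collection are assumed, not taken from the PR.
BEGIN;

-- Copy of user_collection holding public collections only
CREATE TABLE IF NOT EXISTS bookbrainz.public_user_collection
    (LIKE bookbrainz.user_collection INCLUDING ALL);
INSERT INTO bookbrainz.public_user_collection
    SELECT uc.*
    FROM bookbrainz.user_collection AS uc
    WHERE uc."public" IS TRUE;

-- Items that belong to a public collection (assumed table name)
CREATE TABLE IF NOT EXISTS bookbrainz.public_user_collection_item
    (LIKE bookbrainz.user_collection_item INCLUDING ALL);
INSERT INTO bookbrainz.public_user_collection_item
    SELECT item.*
    FROM bookbrainz.user_collection_item AS item
    JOIN bookbrainz.public_user_collection AS pub
        ON pub.id = item.collection_id;

-- Collaborators of public collections (assumed table name)
CREATE TABLE IF NOT EXISTS bookbrainz.public_user_collection_collaborator
    (LIKE bookbrainz.user_collection_collaborator INCLUDING ALL);
INSERT INTO bookbrainz.public_user_collection_collaborator
    SELECT collab.*
    FROM bookbrainz.user_collection_collaborator AS collab
    JOIN bookbrainz.public_user_collection AS pub
        ON pub.id = collab.collection_id;

COMMIT;

-- Once the three tables have been dumped, remove them again
DROP TABLE IF EXISTS bookbrainz.public_user_collection_collaborator;
DROP TABLE IF EXISTS bookbrainz.public_user_collection_item;
DROP TABLE IF EXISTS bookbrainz.public_user_collection;
```

Filling the item and collaborator copies by joining against the already-filtered collection copy keeps the "public only" condition in one place.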

MonkeyDo requested a review from mayhem on September 9, 2020 15:51
MonkeyDo (Member Author) commented Sep 9, 2020

Pinging @paramsingh instead of a review request which doesn't work…

coveralls commented Sep 9, 2020

Coverage increased (+0.04%) to 60.9% when pulling 447600b on collections-public-dumps into 2e762c9 on master.

paramsingh (Collaborator) left a comment

The changes look reasonable to me. However, I'm wondering whether it would be better to keep the collections in the same dump. Different data dump files make it difficult for users to get into the project or consume the data. Is there a reason why we did the two dumps?

scripts/create-public-collection-dumps.sql (outdated)
MonkeyDo (Member Author) commented

> Different data dump files make it difficult for users to get into the project, or consume the data, is there a reason why we did the two dumps?

I didn't find any way to avoid dumping the private collections other than creating one dump without the collections and another with the public collections only (consisting of selected rows of three tables).

Do you think I could just concatenate the two sql dump files? I must admit I haven't tried that.

MonkeyDo (Member Author) commented

I did end up concatenating the two dump files, and it works like a charm. Thanks, @paramsingh!
Back to a single dump file.

I also realized my duplicated tables didn't have their foreign keys, so I had to add them to the collection dump SQL script.
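
For context, PostgreSQL's `CREATE TABLE ... (LIKE ... INCLUDING ALL)` copies defaults, indexes and check constraints but never foreign key constraints, which is why they have to be added by hand. A hedged sketch of what that could look like follows; the item table name, the constraint name and the `collection_id` / `id` columns are assumptions for illustration, not the contents of the actual script.

```sql
-- Sketch only (PostgreSQL): re-create a foreign key that LIKE does not copy.
-- Table, column and constraint names here are assumed, not taken from the PR.
ALTER TABLE bookbrainz.public_user_collection_item
    ADD CONSTRAINT public_user_collection_item_collection_id_fkey
    FOREIGN KEY (collection_id)
    REFERENCES bookbrainz.public_user_collection (id);
```

This relies on the copied `public_user_collection.id` still carrying its primary key or unique index, which `INCLUDING ALL` does preserve.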

mayhem (Member) left a comment

Please rename tables to make it clear they are temp. Next, what happens with private collections? Will there be a follow up PR to handle those?


-- duplicate user_collection table with public collections only

CREATE table if not exists public_user_collection (LIKE bookbrainz.user_collection INCLUDING ALL);
Review comment (Member):

I would prefer to have these tables named starting with tmp_ to make it clear that they are dump tables. The clean-public-collection-dump-tables.sql script is a bit terrifying to read without the knowledge that these are not temp tables.

Create two dumps: one without any user collections and one with public user collections only.
We can then import both of those files and bob's your uncle!
Replace the existing single file import with importing both the main and user collection dumps, after extracting them from the tarball.
Modified instructions and links accordingly
In a transaction, with proper foreign keys, ready to be dumped.
& revert changes in instructions to that of a single .sql.bz2 file
MonkeyDo force-pushed the collections-public-dumps branch from 55e4d8f to 447600b on January 15, 2021 15:33
4 participants