[9.x] - 45782: Solve data to be dumped for separate schemes #45805

@eduance
Contributor

eduance commented Jan 26, 2023

This issue refers to:
#45782


For MySQL a schema is defined as:

Conceptually, a schema is a set of interrelated database objects, such as tables, table columns, data types, indexes and so on. In MySQL a schema is synonymous with a database: you can swap out the words schema and database.

For PostgreSQL a schema is defined as:

A schema is a namespace for SQL objects, which all reside in the same database. Each SQL object must reside in exactly one schema.

More generically, the term schema is used to mean all data descriptions.

A schema can be used to organize database objects, and it even lets one database hold identically named tables in different schemas. If a PostgreSQL database named 'laravel' has the schemas 'symfony' and 'laravel', it can contain two tables named users, one in each schema.

In MySQL, you would simply have two databases, named laravel and symfony.

PostgreSQL has a default public schema, but for consistency's sake I would recommend exporting all schemas of a database, so the artisan dump command simply dumps out all tables.

What we want to achieve is a dump of all schemas, without any data and excluding the migrations table.

This would look something like:

pg_dump [connection_options] -s -T 'migrations' [db_name]
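
For anyone reading along, the placeholder command above can be spelled out with pg_dump's long option names. This is only a rough sketch, not the code in this PR: the helper name `schema_only_dump_cmd` and the `laravel` database name are made up for illustration, connection options are left out, and it prints the command instead of running it.

```shell
#!/bin/sh
# Print (not run) a pg_dump command that dumps object definitions only
# and leaves out the migrations table.
# Hypothetical helper: the name and arguments are illustrative assumptions.
schema_only_dump_cmd() {
    db_name=$1
    migrations_table=${2:-migrations}
    # --schema-only (-s): dump object definitions, no row data.
    # --exclude-table (-T): skip tables matching the given pattern.
    printf 'pg_dump --schema-only --exclude-table=%s %s\n' "$migrations_table" "$db_name"
}

schema_only_dump_cmd laravel
# prints: pg_dump --schema-only --exclude-table=migrations laravel
```

The table name is not shell-quoted here, so a real implementation would need to escape it before handing the command to a shell.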

Why not the current approach?

The current approach dumps all database objects and excludes table data only for tables in the public schema. As we have no way of knowing which schemas a user might be using, and the dump command shouldn't need to know, a more general solution is to dump all SQL objects for all schemas and exclude the data.

Another approach would be to dump only what lives under the default public schema, but this would make the dump useless for companies that group their database logically into schemas.

@robclancy
Contributor

This fixes the data issue but re-introduces this issue (which created the data issue lol) #35018.

@eduance
Contributor Author

eduance commented Jan 26, 2023

@robclancy

Thanks for the quick replies. It looks like Taylor was trying to fix something else in that PR by removing --schema-only and replacing it with excluding the table data. I just checked the issue he was initially fixing, #34281, and excuse my French, but after having built too many Laravel projects, that error tells us that he already has a migrations table and is either adding another migrations table or inserting duplicate rows. Removing the migrations table from the dump therefore more or less fixes his issue.

d-stephane explained the problem in more detail here:

#34694 (comment)

This issue seemed to be fixed after switching to custom binary dumps, but that isn't what fixed it. What actually solved his issue was excluding the migrations table, which by default always lives in the public schema. So to really fix his issue we don't need a lot of black magic: we just need the -T parameter to exclude the migrations table, combined with --schema-only so we still get all schemas defined in our database.

I'll be up in case you have any additions, now getting some good ol' coffee.

@robclancy
Contributor

The issue is that originally it dumped all the SQL and then appended the migration data as SQL. Then someone changed it to use the custom Postgres binary format, so the migration data was no longer appended. The fix was to instead exclude the data of all tables except the migrations table. That only works on the current schema, though, so data is dumped for all other schemas.

The migrations table and its data are still needed.
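
To make that limitation concrete, here is a rough sketch of the "exclude data for every table except migrations" idea. It is not the framework code: `exclude_data_flags` is a made-up helper and the table list is passed in by hand. The real pg_dump option used is `--exclude-table-data`. Any table that never made it into the list, for example one living in a schema the list was not built from, gets no flag, so its rows end up in the dump.

```shell
#!/bin/sh
# Build one --exclude-table-data flag per table, keeping row data only
# for the migrations table.
# Hypothetical helper: first argument is the migrations table name,
# the remaining arguments are the known table names.
exclude_data_flags() {
    migrations_table=$1
    shift
    flags=""
    for table in "$@"; do
        # The migrations table keeps its rows, so it gets no flag.
        [ "$table" = "$migrations_table" ] && continue
        flags="$flags --exclude-table-data=$table"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

exclude_data_flags migrations users posts migrations
# prints: --exclude-table-data=users --exclude-table-data=posts
```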

@eduance
Contributor Author

eduance commented Jan 26, 2023

@robclancy Thanks for the explanation.

Since Taylor stated back in 2020 that he wanted to get rid of the binary dumps, I looked for a solution that achieves the following:

  • Replace binary dumps with SQL dumps.
  • Get the schema of all tables, excluding data.
  • Take migration data into consideration and include it as well.
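
Those three goals could be sketched as two pg_dump invocations writing plain SQL to one file: a schema-only dump first, then a data-only dump of just the migrations table appended to it. Treat this as a sketch under assumptions rather than the exact commands in the commit: `dump_cmds`, the `laravel` database name and the `schema.sql` path are illustrative, and connection options are omitted. It prints the commands instead of running them.

```shell
#!/bin/sh
# Print the two commands of a plain-SQL dump:
#   1. definitions of all objects, no row data
#   2. rows of the migrations table only, appended to the same file
# Hypothetical helper: the name and arguments are illustrative assumptions.
dump_cmds() {
    db_name=$1
    out_file=$2
    migrations_table=${3:-migrations}
    # --schema-only: definitions for every schema in the database.
    printf 'pg_dump --schema-only %s > %s\n' "$db_name" "$out_file"
    # --data-only with --table: only the rows of the migrations table.
    printf 'pg_dump --data-only --table=%s %s >> %s\n' "$migrations_table" "$db_name" "$out_file"
}

dump_cmds laravel schema.sql
```

Because pg_dump writes plain SQL by default, the second command's output can simply be appended with `>>`, which is not possible with the custom binary format.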

Added another commit; this could be an initial solution to both your problem and d-stephane's problem.

Tested this in a project with 30+ tables and a bit more than 10 million records.

@eduance eduance changed the title 45782: Solve data to be dumped for separate schemes [9.x] - 45782: Solve data to be dumped for separate schemes Jan 26, 2023
@taylorotwell taylorotwell merged commit 217e18a into laravel:9.x Jan 31, 2023
@taylorotwell
Member

Thanks
