
managing seed or data-only migrations #2431

Closed
rikinsk-zz opened this issue Jun 26, 2019 · 16 comments
Labels: c/cli (Related to CLI), k/enhancement (New feature or improve an existing feature), s/wip (Status: This issue is a work in progress)

Comments

rikinsk-zz commented Jun 26, 2019

Often one would like to add some data into tables as part of the DB init process. We should document how to achieve that:

  • Head to Run SQL
  • Add the INSERT SQL statement
  • Check "This is a migration"
  • Hit Run: this will insert the data and create a migration file

Edit:
The above task is done. There are further suggestions for managing seed data migrations in the comments. Leaving this issue open for that discussion.

@rikinsk-zz rikinsk-zz added the c/docs Related to docs label Jun 26, 2019
@coco98 coco98 added e/intermediate can be wrapped up in a week p/medium non-urgent issues/features that are candidates for being included in one of the upcoming sprints labels Jul 1, 2019
jjangga0214 (Contributor) commented Aug 12, 2019

@marionschleifer Currently I seed data with a pipeline that waits until Hasura becomes ready (which means the schema migration is done, so I can insert into the database) and then executes INSERT statements.

Could I ask

  1. When would this task (managing seed or data-only migrations #2431) be completed?
  2. Is there a way, even in the current version of Hasura, to execute an SQL file containing INSERT statements for seed data, so that I can remove the external script and delegate the job to Hasura?

@marionschleifer marionschleifer added p/high candidate for being included in the upcoming sprint and removed p/medium non-urgent issues/features that are candidates for being included in one of the upcoming sprints labels Aug 12, 2019
marionschleifer (Contributor) commented

@jjangga0214

  1. This task will be completed within the next few weeks, by mid-September.
  2. Yes, you can do as described in the issue description: you can add the SQL (with INSERT statements) as a migration.
    Let me know if you have any further questions 🙂

jjangga0214 (Contributor) commented Aug 12, 2019

@marionschleifer Thank you for the prompt response :)

You can add the SQL (with insert statements) as a migration.

If I understand what you mentioned correctly, then the migrations would look like this, where 1565610380000_seed.up.sql contains all the INSERT statements. The version 1565610380000 of 1565610380000_seed.up.sql is just an arbitrarily chosen number, as the file is not generated by Hasura but by me.

hasura
├── config.yaml
└── migrations
    ├── 1565510062048_init.up.sql
    ├── 1565510062048_init.up.yaml
    ├── 1565610349244_alter_table_public_author_add_column_test.down.yaml
    ├── 1565610349244_alter_table_public_author_add_column_test.up.yaml
    ├── 1565610379357_alter_table_public_author_alter_column_test.down.yaml
    ├── 1565610379357_alter_table_public_author_alter_column_test.up.yaml
    └── 1565610380000_seed.up.sql

The reason I want seed data is to easily populate data in the local development environment. So, in our case, seed data must be prevented from being fed to the production database. But satisfying this with the migration model introduces, at least in my project, an inconvenient workflow.

Let's say I add a new migration named new_migration_example with version 1565610380001 like this.

hasura
├── config.yaml
└── migrations
    ├── ... (previous versions omitted for brevity)
    ├── 1565610380000_seed.up.sql
    ├── 1565610380001_new_migration_example.up.yaml
    └── 1565610380001_new_migration_example.down.yaml

While I want to simply execute hasura migrate apply as before adding the seed, now I have to run hasura migrate apply --version 1565610380001. What if I have many versions to apply? Then every version would need to be specified, like hasura migrate apply --version 1565610380001 --version 1565610380120 --version ... (I haven't tested whether specifying multiple versions in one command works; if not, multiple commands would be required), rather than the simple command hasura migrate apply. This is cumbersome and error-prone as well.

In my opinion, "seed" should be a concept independent from migration. That's because migrations can be applied to both the dev and production environments, whereas seeds, at least in my case, should only be applied to the dev environment. For example, prisma provides a prisma seed command, independent from prisma deploy, which is similar to Hasura's migrations.

Therefore I suggest a new way.

Users add and manage 'seed SQL files' under a seeds directory, like below.

hasura
├── config.yaml
├── migrations
│   ├── 1565510062048_init.up.sql
│   └── 1565510062048_init.up.yaml
└── seeds
    ├── 0_foo.sql
    └── 1_bar.sql

And the command below would execute every .sql file in the seeds folder in alphanumeric order.

hasura seed 

What's more, when using the graphql-engine cli-migrations Docker image, attaching the seeds directory as a volume at /hasura-seeds would execute hasura seed AFTER auto-applying migrations.

# other options like `-p` are omitted for brevity
docker run \
       -v hasura/migrations:/hasura-migrations \
       -v hasura/seeds:/hasura-seeds \
       hasura/graphql-engine:v1.0.0-alpha42.cli-migrations

Or, rather than attaching two volumes, attaching a single directory containing the migrations and seeds directories would be cool as well.

docker run \
       -v hasura:/hasura \
       hasura/graphql-engine:v1.0.0-alpha42.cli-migrations

Applying seeds via the cli-migrations image matters because I frequently remove Docker volumes and want to re-initialize Postgres and Hasura with the desired "state" (metadata, schema, and seed data) just by executing docker-compose up.

So, this will allow users to easily build local development workflow, while still providing easy migration to production.

How do you feel?

@marionschleifer marionschleifer added the k/ideas Discuss new ideas / pre-proposals / roadmap label Aug 17, 2019
coco98 (Contributor) commented Oct 2, 2019

@rikinsk @shahidhk Based on @jjangga0214's comments, do you think we can solve this by allowing a user to run an arbitrary migration/SQL file without running it as a migration that updates the migration state on the database?

$ hasura migrate apply --one-off /path/to/folder

$ hasura migrate apply --one-off /path/to/file.sql

This would make it easy to run a seed-data-style migration, and we wouldn't need to explicitly code a seeds directory into the CLI command flow.
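For what it's worth, a one-off SQL file can already be pushed through the run_sql type of Hasura's v1 query API without touching the migration state. A minimal Go sketch of building that request body (sending it, the endpoint URL, and the admin-secret header are omitted; the helper name is made up):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runSQLPayload builds the JSON body for Hasura's /v1/query run_sql
// request type, which executes SQL directly without recording anything
// in the migration state table -- the effect the proposed --one-off
// flag would have.
func runSQLPayload(sql string) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"type": "run_sql",
		"args": map[string]string{"sql": sql},
	})
}

func main() {
	body, err := runSQLPayload("INSERT INTO author (name) VALUES ('seed');")
	if err != nil {
		panic(err)
	}
	// POST this body to <hasura-endpoint>/v1/query with the admin secret.
	fmt.Println(string(body))
}
```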

@coco98 coco98 changed the title add docs for creating a seed data migration add docs / update CLI for managing seed or data-only migrations Oct 2, 2019
rikinsk (Member) commented Oct 4, 2019

Why isn't executing an INSERT SQL after the migrations are run good enough? Is there any reason for this to be a migration if it's not to migrate from one system to another?

edit: I might have responded without understanding the solutions and requirements properly. @coco98's solution is basically the same as what I was going for. But I don't see how it would help to add the seed data automatically at startup, like @jjangga0214's solution does.

mnlbox (Contributor) commented Nov 8, 2019

Any news?

@marionschleifer marionschleifer added the support/needs-action support ticket that requires action by team label Nov 20, 2019
@rikinsk rikinsk reopened this Nov 28, 2019
@rikinsk rikinsk changed the title add docs / update CLI for managing seed or data-only migrations managing seed or data-only migrations Nov 28, 2019
@marionschleifer marionschleifer added c/cli Related to CLI and removed support/needs-action support ticket that requires action by team c/docs Related to docs e/intermediate can be wrapped up in a week p/high candidate for being included in the upcoming sprint labels Nov 28, 2019
shahidhk (Member) commented

For the CLI migrations image, we can provide another environment variable, HASURA_GRAPHQL_MIGRATIONS_SEED_DIR, where the user can keep all the seed data as migrations.

Then the user can keep one directory for schema/metadata migrations and another for seed data.

jjangga0214 (Contributor) commented Dec 16, 2019

I'd like to suggest an additional proposal for seed data format.

I've seen folks take different approaches to seeding.
For instance, some prefer SQL, while others prefer GraphQL mutations.
What's more, some people have their raw seed data in JSON,
so they want to use it as-is or with only a little modification.

JSON / SQL / GraphQL mutation

[Screenshots of community comments requesting each seed format omitted. The commenter asking for GraphQL mutations notes that JSON would be good as well.]

I believe these comments are just the tip of the real (or potential) demand for seeding, with various approaches (at least in these 3 categories).

Thus, I think all 3 formats, JSON, SQL, and GraphQL mutations, would be good to support as data formats for automatic seeding.
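To illustrate that JSON seeds could be mechanically lowered to SQL, here is a hedged Go sketch. The function name and table are invented for the example, and the quoting rules are deliberately simplistic (strings and numbers only), not a proposed implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// jsonToInsert converts one JSON seed object into an INSERT statement.
// A sketch only: it handles strings and numbers, and quotes naively.
func jsonToInsert(table string, raw []byte) (string, error) {
	var row map[string]interface{}
	if err := json.Unmarshal(raw, &row); err != nil {
		return "", err
	}
	cols := make([]string, 0, len(row))
	for c := range row {
		cols = append(cols, c)
	}
	sort.Strings(cols) // deterministic column order
	vals := make([]string, len(cols))
	for i, c := range cols {
		switch v := row[c].(type) {
		case string:
			// escape single quotes for SQL string literals
			vals[i] = "'" + strings.ReplaceAll(v, "'", "''") + "'"
		default:
			vals[i] = fmt.Sprint(v)
		}
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s);",
		table, strings.Join(cols, ", "), strings.Join(vals, ", ")), nil
}

func main() {
	stmt, _ := jsonToInsert("author", []byte(`{"id": 1, "name": "alice"}`))
	fmt.Println(stmt) // prints: INSERT INTO author (id, name) VALUES (1, 'alice');
}
```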

jjangga0214 (Contributor) commented Dec 16, 2019

I'd like to suggest an additional proposal for seed data management.

hasura seed apply # or `hasura seed push`?

This command reads seed data and inserts it into Postgres.
This would be convenient when editing seed data frequently, without wanting to reboot or directly handle Hasura and Postgres.
If Postgres already has some data, a user can choose (e.g. by flag) between:

  1. Delete the existing data, then insert the seed.
  2. Just insert the seed regardless of the existing data, overriding existing rows when a unique key overlaps.
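Option 2 corresponds to a Postgres upsert (INSERT ... ON CONFLICT DO UPDATE). A small Go sketch of generating such a statement; all identifiers are hypothetical, and values are assumed to be pre-quoted SQL literals:

```go
package main

import (
	"fmt"
	"strings"
)

// upsertSQL builds a Postgres upsert: insert the seed row, overriding an
// existing row when the unique key overlaps (option 2 above). keyCol is
// the conflict target; every other column is updated from EXCLUDED.
func upsertSQL(table, keyCol string, cols, vals []string) string {
	var sets []string
	for _, c := range cols {
		if c != keyCol {
			sets = append(sets, fmt.Sprintf("%s = EXCLUDED.%s", c, c))
		}
	}
	return fmt.Sprintf(
		"INSERT INTO %s (%s) VALUES (%s) ON CONFLICT (%s) DO UPDATE SET %s;",
		table, strings.Join(cols, ", "), strings.Join(vals, ", "),
		keyCol, strings.Join(sets, ", "),
	)
}

func main() {
	fmt.Println(upsertSQL("author", "id", []string{"id", "name"}, []string{"1", "'alice'"}))
	// prints: INSERT INTO author (id, name) VALUES (1, 'alice') ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;
}
```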
hasura seed create # or `hasura seed pull`?  

This command reads existing data from Postgres and generates seed data in the format the user wants (e.g. by flag), possibly one of SQL, JSON, or GraphQL. It should probably write the seed data to the file system, or print to stdout for pipelines.
This would be convenient when a user prefers creating seed data in the console, or wants to convert between seed data formats (e.g. GraphQL to JSON).

marionschleifer (Contributor) commented Dec 17, 2019

@jjangga0214 thank you for your observations. We have labelled it as ideas, and let's see what the community thinks about this. Every contribution is welcome 🙂

@marionschleifer marionschleifer removed their assignment Dec 17, 2019
GavinRay97 (Member) commented Dec 23, 2019

I am also in the process of trying to come up with a solution for this at the moment.
Manually writing SQL statement seeds is technically an answer, sure, but it is going to cost our team a lot of time. Ideally, Hasura should provide utilities to track and insert records along with schema definitions.

I think the optimal solution here is to use the pg_dump API (which I had no clue existed in Hasura until I went digging for answers in the repo code).

My initial thought was to implement calling the pg_dump binary externally as part of the migration process, but I can see now that this has already been implemented 😅

Have a look at this:

[Screenshot of the existing pg_dump integration in the CLI code omitted.]

I figure the way it would work is that either through CLI flags or the Console, users can select whether they want to track just the schema, or the data plus the schema in migrations.

This may be a bit of an undertaking, because it is going to require (I assume):

  • Integration into the Haskell engine core and test suite
  • Integration into the Web console
  • Integration into the Go CLI

A short-term solution would be to modify the Go CLI to accept a flag for whether or not it should dump the data as well:

Modify migrate_create.go and schema_dump.go to take an --include-data (or similarly named) flag, and modify the ExportSchemaDump argument string to dynamically generate the array of flags based on whether --include-data was passed.

func newMigrateCreateCmd(ec *cli.ExecutionContext) *cobra.Command {

type migrateCreateOptions struct {
	EC    *cli.ExecutionContext
	name  string
	flags *pflag.FlagSet
	// Flags
	fromServer     bool
	sqlFile        string
	sqlServer      bool
	metaDataFile   string
	metaDataServer bool
	schemaNames    []string
}

func (h *HasuraDB) ExportSchemaDump(schemaNames []string) ([]byte, error) {
	opts := []string{"-O", "-x", "--schema-only"}
	for _, s := range schemaNames {
		opts = append(opts, "--schema", s)
	}
	query := SchemaDump{
		Opts:        opts,
		CleanOutput: true,
	}
	resp, body, err := h.sendSchemaDumpQuery(query)
	// …
I am not sure who best to tag about this. @marionschleifer @coco98

GavinRay97 (Member) commented Dec 23, 2019

I do not have any experience writing Go, but here is what I imagine the function would roughly look like (barring the fact that if we are going to pass config here, it should be in a struct):

func (h *HasuraDB) ExportSchemaDump(schemaNames []string, tableNames []string, includeData bool) ([]byte, error) {
	// For whatever it is worth, I would actually pass these flags as
	// --no-owner and --no-acl because it carries more immediate semantic meaning.
	// I had to look these flags up in the pg_dump documentation to find out what they did.
	opts := []string{"--no-owner", "--no-acl"}

	if includeData {
		opts = append(opts, "--inserts")
	} else {
		opts = append(opts, "--schema-only")
	}

	// A []string of tableNames lets callers restrict the dump to specific tables.
	for _, table := range tableNames {
		opts = append(opts, "-t", table)
	}

	for _, s := range schemaNames {
		opts = append(opts, "--schema", s)
	}

	query := SchemaDump{
		Opts:        opts,
		CleanOutput: true,
	}
	resp, body, err := h.sendSchemaDumpQuery(query)
	if err != nil {
		return nil, err
	}
	_ = resp // status-code handling omitted in this sketch
	return body, nil
}

marionschleifer (Contributor) commented

@shahidhk could you please help @GavinRay97 to get started on this? ✨

GavinRay97 (Member) commented

@lexi-lambda This has cross-cutting concerns with #2817 and would potentially take care of that too

GavinRay97 (Member) commented Dec 28, 2019

@joshuarobs

I added functionality for seed scripts to the CLI, but have some minor details to work out. I have not heard back from the Hasura team on this, so it may be that they are either on holiday or it's not super high on the roadmap.

I've never written Go before and am unsure how much more needs to be done to properly integrate it, so no promises on if/when it will get merged into core. It would be really neat if it did, though.

[Screenshots of the new seed commands running in the CLI omitted.]

polRk pushed a commit to polRk/graphql-engine that referenced this issue Feb 12, 2020
@rikinsk rikinsk added k/enhancement New feature or improve an existing feature s/wip Status: This issue is a work in progress and removed k/ideas Discuss new ideas / pre-proposals / roadmap labels Mar 24, 2020

Manubi commented Apr 8, 2020

I would actually love an option to update the seed data in the Hasura console tab: adjust rows, and after I've updated a couple of them, have the opportunity to save them as a migration. Instead of writing a new SQL statement for every small change, I would prefer to take a database snapshot from time to time.

Any ideas?

stevefan1999-personal pushed a commit to stevefan1999-personal/graphql-engine that referenced this issue Sep 12, 2020
10 participants