
managing seed or data-only migrations #2431

Closed
rikinsk-zz opened this issue Jun 26, 2019 · 16 comments
Labels: c/cli (Related to CLI), k/enhancement (New feature or improve an existing feature), s/wip (Status: This issue is a work in progress)

Comments

rikinsk-zz commented Jun 26, 2019

Often one would like to add some data into tables as part of the DB init process. We should document how to achieve that:

  • Head to Run SQL
  • Add the INSERT SQL statement
  • Check "This is a migration"
  • Hit Run: this will insert the data and create a migration file

Edit:
The above task is done. There are further suggestions for managing seed data migrations in the comments. Leaving this issue open for that discussion.

@rikinsk-zz rikinsk-zz added the c/docs Related to docs label Jun 26, 2019
@coco98 coco98 added e/intermediate can be wrapped up in a week p/medium non-urgent issues/features that are candidates for being included in one of the upcoming sprints labels Jul 1, 2019
jjangga0214 (Contributor) commented Aug 12, 2019

@marionschleifer Currently I seed data with a pipeline that waits until Hasura becomes ready (which means the schema migration is done, so I can insert into the database) and then executes INSERT statements.

Could I ask

  1. When would this task (managing seed or data-only migrations #2431) be completed?
  2. Is there a way, even in the current version of Hasura, to execute an SQL file containing INSERT statements for seed data, so that I can remove the external script and delegate the job to Hasura?

@marionschleifer marionschleifer added p/high candidate for being included in the upcoming sprint and removed p/medium non-urgent issues/features that are candidates for being included in one of the upcoming sprints labels Aug 12, 2019
marionschleifer (Contributor) commented

@jjangga0214

  1. This task will be completed within the next few weeks, by mid-September.
  2. Yes, you can do as described in the issue description: you can add the SQL (with INSERT statements) as a migration.
    Let me know if you have any further questions 🙂

jjangga0214 (Contributor) commented Aug 12, 2019

@marionschleifer Thank you for the prompt response :)

You can add the SQL (with insert statements) as a migration.

If I understand what you mentioned correctly, then the migrations would look like this, where 1565610380000_seed.up.sql contains all the INSERT statements. The version 1565610380000 of 1565610380000_seed.up.sql is just an arbitrarily chosen number, as the file is not generated by Hasura but by me.

hasura
├── config.yaml
└── migrations
    ├── 1565510062048_init.up.sql
    ├── 1565510062048_init.up.yaml
    ├── 1565610349244_alter_table_public_author_add_column_test.down.yaml
    ├── 1565610349244_alter_table_public_author_add_column_test.up.yaml
    ├── 1565610379357_alter_table_public_author_alter_column_test.down.yaml
    ├── 1565610379357_alter_table_public_author_alter_column_test.up.yaml
    └── 1565610380000_seed.up.sql

The reason I want seed data is to easily populate data in the local development environment. So, in our case, seed data must be prevented from being fed to the production database. But satisfying this with the migration model introduces, at least in my project, an inconvenient workflow.

Let's say I add a new migration named new_migration_example with version 1565610380001 like this.

hasura
├── config.yaml
└── migrations
    ├── ... (previous versions omitted for brevity)
    ├── 1565610380000_seed.up.sql
    ├── 1565610380001_new_migration_example.up.yaml
    └── 1565610380001_new_migration_example.down.yaml

While I want to simply execute hasura migrate apply as before adding the seed, now I have to run hasura migrate apply --version 1565610380001. What if I have many versions to apply? Then every version would need to be specified, like hasura migrate apply --version 1565610380001 --version 1565610380120 --version ... (I haven't tested whether specifying multiple versions in one command works; if not, multiple commands would be required), rather than the simple command hasura migrate apply. This is cumbersome and error-prone as well.

In my opinion, "seed" should be a concept independent from migration. That's because migrations can be applied to both the dev and production environments, whereas seeds, at least in my case, should only be applied to the dev environment. For example, prisma provides a prisma seed command, independent from prisma deploy, which is similar to Hasura's migrations.

Therefore I suggest a new way.

Users add and manage 'seed SQL files' under a seeds directory, like below.

hasura
├── config.yaml
├── migrations
│   ├── 1565510062048_init.up.sql
│   └── 1565510062048_init.up.yaml
└── seeds
    ├── 0_foo.sql
    └── 1_bar.sql

And the command below would execute every .sql file in the seeds folder in alphanumeric order.

hasura seed 

What's more, when using the graphql-engine cli-migrations Docker image, attaching the seeds directory as a volume at /hasura-seeds would execute hasura seed AFTER auto-applying migrations.

# other options like `-p` are omitted for brevity
docker run \
       -v hasura/migrations:/hasura-migrations \
       -v hasura/seeds:/hasura-seeds \
       hasura/graphql-engine:v1.0.0-alpha42.cli-migrations

Or, rather than attaching two volumes, attaching a single directory containing the migrations and seeds directories would be cool as well.

docker run \
       -v hasura:/hasura \
       hasura/graphql-engine:v1.0.0-alpha42.cli-migrations

Applying seeds via the cli-migrations image matters because I frequently remove Docker volumes and want to re-initialize Postgres and Hasura with the desired "state" (metadata, schema, and seed data) just by executing docker-compose up.

So, this will allow users to easily build local development workflow, while still providing easy migration to production.

How do you feel?

@marionschleifer marionschleifer added the k/ideas Discuss new ideas / pre-proposals / roadmap label Aug 17, 2019
coco98 (Contributor) commented Oct 2, 2019

@rikinsk @shahidhk Based on @jjangga0214's comments, do you think we can solve this by allowing a user to run an arbitrary migration/SQL file without running it as a migration that updates the migration state on the database?

$ hasura migrate apply --one-off /path/to/folder

$ hasura migrate apply --one-off /path/to/file.sql

This would make it easy to run a seed-data-style migration, and we wouldn't need to explicitly code a seeds directory into the CLI command flow.
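For what it's worth, a one-off SQL file can already be pushed through the run_sql type of Hasura's v1 query API without touching the migration state. A minimal Go sketch of building that request body (sending it, the endpoint URL, and the admin-secret header are omitted; the helper name is made up):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runSQLPayload builds the JSON body for Hasura's /v1/query run_sql
// request type, which executes SQL directly without recording anything
// in the migration state table -- the effect the proposed --one-off
// flag would have.
func runSQLPayload(sql string) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"type": "run_sql",
		"args": map[string]string{"sql": sql},
	})
}

func main() {
	body, err := runSQLPayload("INSERT INTO author (name) VALUES ('seed');")
	if err != nil {
		panic(err)
	}
	// POST this body to <hasura-endpoint>/v1/query with the admin secret.
	fmt.Println(string(body))
}
```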

@coco98 coco98 changed the title add docs for creating a seed data migration add docs / update CLI for managing seed or data-only migrations Oct 2, 2019
rikinsk (Member) commented Oct 4, 2019

Why isn't executing an INSERT SQL after the migrations are run good enough? Is there any reason for this to be a migration if it's not to migrate from one system to another?

edit: I might have responded without understanding the solutions and requirements properly. @coco98's solution is basically the same as what I was going for. But I don't see how it would help to add the seed data automatically at startup, like @jjangga0214's solution does.

mnlbox (Contributor) commented Nov 8, 2019

Any news?

@marionschleifer marionschleifer added the support/needs-action support ticket that requires action by team label Nov 20, 2019
@rikinsk rikinsk reopened this Nov 28, 2019
@rikinsk rikinsk changed the title add docs / update CLI for managing seed or data-only migrations managing seed or data-only migrations Nov 28, 2019
@marionschleifer marionschleifer added c/cli Related to CLI and removed support/needs-action support ticket that requires action by team c/docs Related to docs e/intermediate can be wrapped up in a week p/high candidate for being included in the upcoming sprint labels Nov 28, 2019
shahidhk (Member) commented

For the CLI migrations image, we can provide another environment variable, HASURA_GRAPHQL_MIGRATIONS_SEED_DIR, where the user can keep all the seed data as migrations.

Then the user can keep one directory for schema/metadata migrations and another for seed data.

jjangga0214 (Contributor) commented Dec 16, 2019

I'd like to suggest an additional proposal for seed data format.

I've seen folks take different approaches to seeding.
For instance, some prefer SQL, while others prefer GraphQL mutations.
What's more, some people have their raw seed data in JSON,
so they want to use it as-is or with only a little modification.

JSON / SQL / GraphQL mutation

[Screenshots of community comments requesting each seed format omitted. The commenter asking for GraphQL mutations notes that JSON would be good as well.]

I believe these comments are just the tip of the real (or potential) demand for seeding, with various approaches (at least in these 3 categories).

Thus, I think all 3 formats, JSON, SQL, and GraphQL mutations, would be good to support as data formats for automatic seeding.
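To illustrate that JSON seeds could be mechanically lowered to SQL, here is a hedged Go sketch. The function name and table are invented for the example, and the quoting rules are deliberately simplistic (strings and numbers only), not a proposed implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// jsonToInsert converts one JSON seed object into an INSERT statement.
// A sketch only: it handles strings and numbers, and quotes naively.
func jsonToInsert(table string, raw []byte) (string, error) {
	var row map[string]interface{}
	if err := json.Unmarshal(raw, &row); err != nil {
		return "", err
	}
	cols := make([]string, 0, len(row))
	for c := range row {
		cols = append(cols, c)
	}
	sort.Strings(cols) // deterministic column order
	vals := make([]string, len(cols))
	for i, c := range cols {
		switch v := row[c].(type) {
		case string:
			// escape single quotes for SQL string literals
			vals[i] = "'" + strings.ReplaceAll(v, "'", "''") + "'"
		default:
			vals[i] = fmt.Sprint(v)
		}
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s);",
		table, strings.Join(cols, ", "), strings.Join(vals, ", ")), nil
}

func main() {
	stmt, _ := jsonToInsert("author", []byte(`{"id": 1, "name": "alice"}`))
	fmt.Println(stmt) // prints: INSERT INTO author (id, name) VALUES (1, 'alice');
}
```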

jjangga0214 (Contributor) commented Dec 16, 2019

I'd like to suggest an additional proposal for seed data management.

hasura seed apply # or `hasura seed push`?

This command reads seed data and inserts it into Postgres.
This would be convenient when editing seed data frequently, without wanting to reboot or directly handle Hasura and Postgres.
If Postgres already has some data, a user can choose (e.g. by flag) between:

  1. Delete the existing data, then insert the seed.
  2. Just insert the seed regardless of the existing data, overriding existing rows when a unique key overlaps.
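Option 2 corresponds to a Postgres upsert (INSERT ... ON CONFLICT DO UPDATE). A small Go sketch of generating such a statement; all identifiers are hypothetical, and values are assumed to be pre-quoted SQL literals:

```go
package main

import (
	"fmt"
	"strings"
)

// upsertSQL builds a Postgres upsert: insert the seed row, overriding an
// existing row when the unique key overlaps (option 2 above). keyCol is
// the conflict target; every other column is updated from EXCLUDED.
func upsertSQL(table, keyCol string, cols, vals []string) string {
	var sets []string
	for _, c := range cols {
		if c != keyCol {
			sets = append(sets, fmt.Sprintf("%s = EXCLUDED.%s", c, c))
		}
	}
	return fmt.Sprintf(
		"INSERT INTO %s (%s) VALUES (%s) ON CONFLICT (%s) DO UPDATE SET %s;",
		table, strings.Join(cols, ", "), strings.Join(vals, ", "),
		keyCol, strings.Join(sets, ", "),
	)
}

func main() {
	fmt.Println(upsertSQL("author", "id", []string{"id", "name"}, []string{"1", "'alice'"}))
	// prints: INSERT INTO author (id, name) VALUES (1, 'alice') ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;
}
```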
hasura seed create # or `hasura seed pull`?  

This command reads existing data from Postgres and generates seed data in the format the user wants (e.g. by flag), possibly one of SQL, JSON, or GraphQL. It should probably write the seed data to the file system, or print to stdout for pipelines.
This would be convenient when a user prefers creating seed data in the console, or wants to convert between seed data formats (e.g. GraphQL to JSON).

marionschleifer (Contributor) commented Dec 17, 2019

@jjangga0214 thank you for your observations. We have labelled it as ideas, and let's see what the community thinks about this. Every contribution is welcome 🙂

@marionschleifer marionschleifer removed their assignment Dec 17, 2019
GavinRay97 (Member) commented Dec 23, 2019

I am also in the process of trying to come up with a solution for this at the moment.
Manually writing SQL statement seeds is technically an answer, sure, but it is going to cost our team a lot of time. Ideally, Hasura should provide utilities to track and insert records along with schema definitions.

I think the optimal solution here is to use the pg_dump API (which I had no clue existed in Hasura until I went digging for answers in the repo code).

My initial thought was to implement calling the pg_dump binary externally as part of the migration process, but I can see now that this has already been implemented 😅

Have a look at this:

[Screenshot of the existing pg_dump integration in the CLI code omitted.]

I figure the way it would work is that either through CLI flags or the Console, users can select whether they want to track just the schema, or the data plus the schema in migrations.

This may be a bit of an undertaking, because it is going to require (I assume):

  • Integration into the Haskell engine core and test suite
  • Integration into the Web console
  • Integration into the Go CLI

A short-term solution would be to modify the Go CLI to accept a flag for whether or not it should dump the data as well:

Modify migrate_create.go and schema_dump.go to take an --include-data (or similarly named) flag, and modify the ExportSchemaDump argument string to dynamically generate the array of flags based on whether --include-data was passed.

func newMigrateCreateCmd(ec *cli.ExecutionContext) *cobra.Command {

type migrateCreateOptions struct {
	EC    *cli.ExecutionContext
	name  string
	flags *pflag.FlagSet
	// Flags
	fromServer     bool
	sqlFile        string
	sqlServer      bool
	metaDataFile   string
	metaDataServer bool
	schemaNames    []string
}

func (h *HasuraDB) ExportSchemaDump(schemaNames []string) ([]byte, error) {
	opts := []string{"-O", "-x", "--schema-only"}
	for _, s := range schemaNames {
		opts = append(opts, "--schema", s)
	}
	query := SchemaDump{
		Opts:        opts,
		CleanOutput: true,
	}
	resp, body, err := h.sendSchemaDumpQuery(query)
	// …
I am not sure who best to tag about this. @marionschleifer @coco98

GavinRay97 (Member) commented Dec 23, 2019

I do not have any experience writing Go, but here is what I imagine the function would roughly look like (barring the fact that if we are going to pass config here, it should be in a struct):

func (h *HasuraDB) ExportSchemaDump(schemaNames []string, tableNames []string, includeData bool) ([]byte, error) {
	// For whatever it is worth, I would actually pass these flags as
	// --no-owner and --no-acl because it carries more immediate semantic meaning.
	// I had to look these flags up in the pg_dump documentation to find out what they did.
	opts := []string{"--no-owner", "--no-acl"}

	if includeData {
		opts = append(opts, "--inserts")
	} else {
		opts = append(opts, "--schema-only")
	}

	// A []string of tableNames lets callers restrict the dump to specific tables.
	for _, table := range tableNames {
		opts = append(opts, "-t", table)
	}

	for _, s := range schemaNames {
		opts = append(opts, "--schema", s)
	}

	query := SchemaDump{
		Opts:        opts,
		CleanOutput: true,
	}
	resp, body, err := h.sendSchemaDumpQuery(query)
	if err != nil {
		return nil, err
	}
	_ = resp // status-code handling omitted in this sketch
	return body, nil
}

marionschleifer (Contributor) commented

@shahidhk could you please help @GavinRay97 to get started on this? ✨

GavinRay97 (Member) commented

@lexi-lambda This has cross-cutting concerns with #2817 and would potentially take care of that too

GavinRay97 (Member) commented Dec 28, 2019

@joshuarobs

I added functionality for seed scripts to the CLI, but have some minor details to work out. I have not heard back from the Hasura team on this, so it may be that they are either on holiday or it's not super high on the roadmap.

I've never written Go before and am unsure how much more needs to be done to properly integrate it, so no promises on if/when it will get merged into core. It would be really neat if it did, though.

[Screenshots of the new seed commands running in the CLI omitted.]

polRk pushed a commit to polRk/graphql-engine that referenced this issue Feb 12, 2020
@rikinsk rikinsk added k/enhancement New feature or improve an existing feature s/wip Status: This issue is a work in progress and removed k/ideas Discuss new ideas / pre-proposals / roadmap labels Mar 24, 2020

Manubi commented Apr 8, 2020

I would actually love an option to update the seed data in the Hasura console tab: adjust rows, and after I've updated a couple of them, have the opportunity to save them as a migration. Instead of writing a new SQL statement for every small change, I would prefer to take a database snapshot from time to time.

Any ideas?

stevefan1999-personal pushed a commit to stevefan1999-personal/graphql-engine that referenced this issue Sep 12, 2020
10 participants