-
Notifications
You must be signed in to change notification settings - Fork 516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for seeding the database with test data #259
Comments
Is your proposal similar to #235 (comment), specifically step number 4? Where by "seed" you mean import an already dumped database? I certainly think there is room for improvement, as I've also (and in internal projects) have extended goose. So why not add those "nice" things to the core package. @soggycactus maybe if you can flush out what you're proposing we can have a more technical understanding of what a command such as "seed" would do |
It's similar to #235 but not exactly the same. Typically in our projects what we've done is create another migrations directory that essentially consists of a lot of We do some work outside of goose to make sure these additional migrations are only run locally and inside of CI and dev environments, since the static testing data helps keep our end-to-end test results predictable. When we deploy and run migrations for prod, we don't run the additional migrations that insert test data into the database. I'm curious what your thoughts are on what a potential |
bumping @mfridman any thoughts on the above? |
My initial thought was why is it a So I'm struggling a bit to understand how a
Is the intention to keep seed data alongside the actual migration data? Or to be able to identify it in the database table? Sorry maybe I'm missing something.
Any chance you can elaborate on this. I suspect there is a way to address your issues (and many others) but I'd like to understand a bit better. |
@mfridman you're exactly right about this - we manage our current project doing exactly this. It's a little annoying to have to write a few scripts and make sure that we're only running the second set of migrations in the lower environments. The versioning problem arises because the seed migrations always have to come after the schema migrations. So in a regular workflow, it's really easy to create a schema migration with a later timestamp than the existing seed migrations, apply the schema migrations, and then try to apply the seed migrations only to have goose say there's nothing to run. We get around this by making every seed migration happen in the year 3000, but again it feels hacky and is kind of annoying. I agree that a seed command in this context might not be as useful as it sounds - what if I just feel like there has to be a better way to seed the database with dummy data than what we're currently doing. |
There's a bit of history w.r.t to the current versioning done by However, say Curious, for your |
It might mitigate my current issues IF the seed migrations run after the schema migrations; don't want to be inserting data into tables that don't exist. We don't need to use the |
Yes, this is the intention, regardless when the migrations were created they'd be applied regardless. This pushes the complexity to the user to manage their migrations, but the outcome is what most users would want anyways. As long as the migration steps, migrations step followed by seed step, are coordinated and applied serially then it accomplishes what you want. I think this is another data point, that we should introduce |
@mfridman That sounds good, should I leave this issue open in the meantime so you can reference it if you implement |
@soggycactus Yes please, I'm going to carve a bit of time to think this through and make sure we get the correct implementation based on all the issues we received w.r.t versioning. I'd first like to land module support #261 and have a clear non-breaking ( |
I am looking for a similar feature. I need a way to seed and teardown test data. It is easy enough to run goose using two different directories, one for sql structure (migrations/) then a second on sql for test data (seedDir/), but it is not possible to rollback the test data. When running "goose down" on the test data directory I received the error It appears that goose fails to execute the "down" sql unless the file for the previous migrations are also present. My use case is that I need to have a HEAD of sql statements that seed test data, and roll them back at test cleanup. The seed sql only needs to be valid for the most recent version of the db schema, so it can change over time. |
Do you need to keep track of "seeded" migrations in the database? Also, if I understood correctly, the order of operations will be:
|
Are you asking for the order of operations to get the error I described? Or order of operations for my desired workflow? For my desired workflow: I don't think I need to keep track of the seeded migrations in the database in the same that that migrations are tracked with a serial/datetime id. I do not expect to have multiple layers of seed/test data. I do not down the migrations as part of my testing process.
I would also image that step 3 above could include deleting seed data, updating seed data in response to test changes, re-seed database, repeat as needed. This would be "as needed" as I've yet to encounter it. My workaround was to create two sql files, a down and an up, and run each one when needed using Exec() manually. The up is run after migrations, and the down is run defered and run at the end of TestMain(). Using os.ReadAll() and Exec() is not that much work, but I was hoping to have up/down of sql all be done one way with one tool for consistency. |
Apologies, I should have clarified.. I was looking for order of operations for desired workflow. I believe I understand the current limitation, so this is useful. Thank you! fwiw I'd like to add this functionality because I also found myself wrapping around What are your thoughts, as @soggycactus suggested, to have a distinct command (or flag) that applies ad-hoc SQL ? Something like: goose -dir ./seedDir -seed up
goose -dir ./seedDir -seed down And the So
|
I've been playing up around with an implementation and quite like it. The only thing I'm not sure about is how to expose the functionality to the end-user:
The Suggestions or alternatives welcome. One item for discussion, from this comment, is whether seed operations should be tracked in the database or not. |
I see two different use cases, "insert seed data after migration in test/prod" and "insert test data on demand in local/dev". In thinking this through I see that migrations and seed data are two different uses. I would then separate out migrations to only alter the structure and shape of the database, and seed data to only insert into tables and update rows. Insert seed data after migration in test/prodI can see there being a seed data file for every migration file. The migration updates the structure, or adds new structure, and the seed data inserts test/dummy data needed to use the new feature. In this case I would want the option to rollback seed data independently of the structure, or to roll them back in lock-step. Seed data would also need to only be applied after it's corresponding migration was applied. So, migration-01, seed-01, migration-02, seed-02. Insert test data on demand in local/devI would want looser options around apply test data. I want to add test data per-test, and then delete it after the test is run. Here is where I think that the test data wouldn't need to update the migration version in the goose table. Goose would just run the corresponding up or down sql, in file order, from the directory it's pointed to. I would see seed data files being separate from migration files. Really for my needs I think an "ignore version" or "don't update version" flag would allow me to point goose to another directory with sql and just up it or down it. I pointed goose to a different directory with sql and said "just run these" but goose complained that the current version in the db table was not available in that directory. If I had a flag to ignore that check it would have just up'd my test data inserts. |
That's a great summary of the problem. Thanks for putting the time to write that up. For case 1 I think they are one and the same and can be accomplished today (for the most part, caveat below). Whenever we needed some data to always be inserted into the database, we captured it within the same migration file (or a subsequent file). Caveat, there is no way (today) to run multiple migrations within a single transaction. There is already an open issue for this #222. So to keep For case 2, however, I think we can introduce a flag
|
I agree for case 1, nothing needs to be changed. This would be more of a pattern to be documented. I also agree for case 2. What is the working that would be most direct? "Ignore versioning", "No versioning", "Ad Hoc", "Omit versioning" Don't mind me, this bike sheds a decent enough color already. Would |
I'll carve out some time to add this feature, update the docs and write a short blog post explaining the use case. Don't mind bikeshedding a bit. With this feature I'd err on the side of caution to make the flag explicit and self explanatory:
If you have suggestion for a short flag, I'm open to it. Although something tells me this shouldn't have a short flag.
The short answer is, potentially, but not yet. The long answer is, |
I like either I understand about it being an old codebase with it's own ghosts. I appreciate you helping to maintain it. |
@ukiahsmith This functionality has been added in c8aa123, merged via #291 You'll likely find the following commands do what you described above:
All commands should support the Thanks for the input, glad we got this functionality into |
@mfridman Thank you for taking the time to talk through this and to implement it. |
Thanks for the feedback, wrote a quick post to summarize how this works for the next person. |
I have used goose at pretty much every job to manage my relational databases. It works like a charm, but I always find myself creating some sort of wrapper that allows me to use goose to seed my local and development environments with test data. It would be really nice if this were supported already.
I propose creating a new command
goose seed
that will allow users to continue managing their schema migrations with goose and also seed their non-production environments with test data for integration & end-to-end testing. Please note that I am not suggesting actual seed data be managed this way - if your application requires some static, pre-existing data, that can just be inserted along with the regular schema migrations.What are your thoughts on this? If we agree that this is useful and decide on an interface, I'd like to take on this work and open a PR to implement it.
The text was updated successfully, but these errors were encountered: