[CT-2348] [Feature] dbt clone command #7256

Closed
3 tasks done
Tracked by #7301
jtcohen6 opened this issue Apr 1, 2023 · 8 comments · Fixed by #7881
Labels
artifacts enhancement New feature or request state Stateful selection (state:modified, defer) Team:Adapters Issues designated for the adapter area of the code

Comments

@jtcohen6
Contributor

jtcohen6 commented Apr 1, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Big idea: Clone my production state into my development schema, please!

Implementation details:

  • Sorta similar to --defer: use the --state manifest, but actually create zero-copy clones/pointers to the "prod" versions
  • If the data platform supports create or replace table ... clone, use it!
  • If it doesn't, or if the prod version isn't a table, just create a simple view instead: select * from {prod_version}
  • By default, we shouldn't recreate already-existing relations in the current target, but users could override this via --full-refresh
  • Support a ton of threads! No need to go in DAG order for this.
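
The bullets above could be sketched roughly as follows. This is only an illustration in Python, not dbt's implementation: `Relation`, `plan_statement`, and `clone_all` are hypothetical names, the `execute` callback stands in for an adapter connection, and the exact clone and view SQL is an assumption that will differ between data platforms.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a relation recorded in the --state manifest."""
    name: str       # e.g. "orders"
    prod_fqn: str   # fully qualified name of the "prod" version
    dev_fqn: str    # fully qualified name in the current target
    is_table: bool  # False for views and anything else


def plan_statement(
    rel: Relation,
    supports_clone: bool,
    existing: Set[str],
    full_refresh: bool = False,
) -> Optional[str]:
    """Return the SQL to run for one relation, or None to skip it."""
    # By default, leave relations that already exist in the target alone.
    if rel.dev_fqn in existing and not full_refresh:
        return None
    # Zero-copy clone only when the platform supports it and prod is a table.
    if supports_clone and rel.is_table:
        return f"create or replace table {rel.dev_fqn} clone {rel.prod_fqn}"
    # Otherwise fall back to a simple pointer view (assumed syntax).
    return f"create or replace view {rel.dev_fqn} as select * from {rel.prod_fqn}"


def clone_all(
    relations: Iterable[Relation],
    execute: Callable[[str], object],
    supports_clone: bool,
    existing: Set[str],
    full_refresh: bool = False,
    threads: int = 10,
) -> List[str]:
    """Run every planned statement on a thread pool, ignoring DAG order."""
    planned = (
        plan_statement(rel, supports_clone, existing, full_refresh)
        for rel in relations
    )
    statements = [sql for sql in planned if sql is not None]
    # Each statement only reads from prod, so none depends on another.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(execute, statements))
    return statements
```

Because every statement points at an already-built "prod" relation, the work is embarrassingly parallel, which is why a flat thread pool is enough here.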

Describe alternatives you've considered

Previously, I had thought (& written) that this should happen as part of dbt run --defer, as another deferral mode (instead of rewriting upstream references).

But I think that "cloning" should support node selection, too. I don't think there's a nice way to implicitly clone unselected resources and build selected resources, even if it's the ultimate UX we want to achieve.

Instead, you could do something like:

$ dbt clone --exclude state:modified+ --state path/to/manifest
$ dbt build --select state:modified+ --state path/to/manifest
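
To illustrate why these two commands cover every node exactly once: `state:modified+` selects the modified nodes plus everything downstream of them, and `--exclude` with the same selector picks the complement. A minimal Python sketch of that partition, using a hypothetical `children` mapping (node to its direct downstream nodes) rather than dbt's real graph classes:

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def modified_plus(modified: Iterable[str], children: Dict[str, List[str]]) -> Set[str]:
    """Modified nodes plus all of their descendants (the 'state:modified+' set)."""
    seen: Set[str] = set(modified)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def split_clone_and_build(
    all_nodes: Iterable[str],
    modified: Iterable[str],
    children: Dict[str, List[str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (nodes to clone, nodes to build) as two disjoint sets."""
    to_build = modified_plus(modified, children)
    to_clone = set(all_nodes) - to_build
    return to_clone, to_build
```

With this split, everything untouched by the change is cloned from prod, and only the modified nodes and their descendants are rebuilt.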

Who will this benefit?

Anyone who wants to avoid rerunning lots of models in dev/CI, but who still wants a "full" dev/CI schema, with a pointer/clone of every relational object, in order to hook up to downstream querying (e.g. BI)

Are you interested in contributing this feature?

I'll push up a draft, and we can start some conversation there :)

Anything else?

Lengthy previous discussion in:

Specifically, from #5095 (comment):

You could picture it as something like:

$ dbt clone --exclude state:modified+ --threads 10
$ dbt build -s state:modified+

Except, what if dbt did the first step for you implicitly?

Except, what if it actually didn't? And the pseudo-command I wrote above, back in December, is exactly how it should work? :)

@jtcohen6 jtcohen6 added enhancement New feature or request Team:Execution Team:Adapters Issues designated for the adapter area of the code Refinement Maintainer input needed labels Apr 1, 2023
@jtcohen6 jtcohen6 self-assigned this Apr 1, 2023
@github-actions github-actions bot changed the title [Feature] dbt clone command [CT-2348] [Feature] dbt clone command Apr 1, 2023
@jtcohen6 jtcohen6 mentioned this issue Apr 1, 2023
9 tasks
@pratik60

pratik60 commented Apr 4, 2023

This is really cool, and I could immediately see it being useful to us, as we're just migrating from Redshift to Snowflake!

I was reading this post, which suggests that it's much faster to clone individual tables than to clone entire databases: https://select.dev/posts/snowflake-clones

As we're looking at manifest.json, I'm assuming that's what you're doing under the hood anyway, and leveraging tonnes of threads?

@jtcohen6
Contributor Author

@pratik60 Exactly right, on all counts!

To get a sense for feasibility, and identify risks, I've drafted a first stab in #7258. I dropped a whole bunch of questions inline. Most of the questions are around technical implementation decisions, rather than the product/user experience, so I think it's appropriate to start chatting about this with more folks.

@jtcohen6 jtcohen6 removed the Refinement Maintainer input needed label Apr 10, 2023
@jtcohen6 jtcohen6 removed their assignment Apr 10, 2023
@jtcohen6 jtcohen6 added artifacts state Stateful selection (state:modified, defer) labels Apr 10, 2023
@iknox-fa
Contributor

Per BLG 4/1/23
@jtcohen6
We need this to be broken out into separate tickets for each adapter (and possibly broken down further, although we don't know enough about the method of implementation to say exactly how).

@iknox-fa iknox-fa added the Refinement Maintainer input needed label Apr 10, 2023
@jtcohen6
Contributor Author

jtcohen6 commented Apr 10, 2023

Separate tickets for each adapter makes sense! We can do that as part of fleshing out the epic tasklist (#7301)

It might make sense to dig into the comments/questions I left in #7258 to understand other ways we'd want to break this down. I'm perfectly happy if we use that draft PR as proof of feasibility, and to align on where the gotchas are here — and then throw all the code away and start fresh.

@sungchun12
Contributor

I can't wait to throw this demo away in favor of this native feature: https://www.loom.com/share/bcfd2cf3b4b5471683bfc5b24587db3d

Let me know how I can help. Got a reference jinja macro in there too!

@Fleid
Contributor

Fleid commented Apr 21, 2023

We need to talk about MVs in that context. Sorry I'm just planting a flag at the moment ><

@jtcohen6
Contributor Author

@Fleid we'll discuss in 3 minutes :)


6 participants