-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only create schemas for selected nodes (#1239) #1258
Conversation
- pass selected uids along to the before_run method for schema creation - only schemas used by selected nodes are created - fix some big issues with check_schema_exists (such as: it does not work) - integration test changes to test this
9e6ca80
to
9ff8705
Compare
a4b77e4
to
7d332aa
Compare
Ignore that last force-push, it's not a change! Just reverting an accidental wrong-branch commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of comments, but otherwise this looks great
@@ -241,4 +241,4 @@ def check_schema_exists(self, database, schema, model_name=None): | |||
kwargs={'database': database, 'schema': schema}, | |||
connection_name=model_name | |||
) | |||
return results[0] > 0 | |||
return results[0][0] > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the existing check_schema_exists
was wrong and did not work.
schemas = set() | ||
for node in manifest.nodes.values(): | ||
if node.unique_id not in selected_uids: | ||
continue | ||
if cls.is_refable(node) and not cls.is_ephemeral(node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this going to cause a problem with Sources? I imagine sources are refable (idk if you actually made that change in the corresponding PR), but we definitely wouldn't want to touch source schemas at all! Might be worth being more explicit here? I'd only care about schemas for:
- models
- seeds
- archives (to come in Wilt Chamberlain)
No change required necessarily, but wanted to surface it because i saw it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sources are not refable
, so no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixes #1239
This PR passes along the set of selected UIDs to the
before_run
method of the chosen Runner class, which is then used to filter out the nodes that dbt needs to build schemas for. We still unconditionally create the default schema, for snowflake.Also I have no idea what was up with
check_schema_exists
, but it's fixed.For tests, I just modified some of the existing graph selection tests, which seemed sufficient.