Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix _dbt_max_partition declaration and initialization for BigQuery incremental models #2976

Merged
merged 3 commits into from
Feb 18, 2021
Merged

Fix _dbt_max_partition declaration and initialization for BigQuery incremental models #2976

merged 3 commits into from
Feb 18, 2021

Conversation

pcasteran
Copy link
Contributor

@pcasteran pcasteran commented Dec 24, 2020

resolves #2940

Description

Fixed the declaration and initialization of the _dbt_max_partition variable for BigQuery incremental models to be able to declare additional variables.

I only fixed the problem described in the issue #2940 and did not implemented the additional change you proposed @jtcohen6.
I don't really like the idea of mixing legacy sql execution, followed by client-side computation to find the latest partition and finally the actual model.
I would rather keep the declaration of the _dbt_max_partition variable like this and add a model option to be able to deactivate it when not needed in order to save processing time and money.

Maybe it is better to open a separate issue to discuss this and keep this PR only for the variable declaration issue.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Dec 24, 2020
@pcasteran pcasteran marked this pull request as ready for review December 24, 2020 23:47
@jtcohen6 jtcohen6 changed the base branch from dev/kiyoshi-kuromiya to dev/margaret-mead January 19, 2021 10:12
@jtcohen6
Copy link
Contributor

Thanks for the PR @pcasteran!

I don't really like the idea of mixing legacy sql execution, followed by client-side computation to find the latest partition and finally the actual model.

Totally fair. We're all just trying to optimize for BigQuery's cost structure, and in this (weird) case there's a powerful feature available in legacySQL that's yet to be implemented for standardSQL.

I would rather keep the declaration of the _dbt_max_partition variable like this and add a model option to be able to deactivate it when not needed in order to save processing time and money.

Also reasonable. Perhaps we should add support for a new config that would override the default value used within dbt today. So, for instance:

  • If the user sets the config to None, we'd skip setting the _dbt_max_partition var entirely, and save the cost of the standardSQL query
  • If the user wishes, they could set the config using adapter.get_partitions_metadata(target_relation) to get a $0 dynamic metadata lookup (though using legacySQL)
  • If the user wanted to set the max partition dynamically based on an arbitrary query, even a ref to a separate model, they could

Maybe it is better to open a separate issue to discuss this and keep this PR only for the variable declaration issue.

Would you be up to open that issue? Then I'd be happy to approve this narrower PR for v0.20.0.

@jtcohen6 jtcohen6 changed the base branch from dev/margaret-mead to develop February 8, 2021 11:05
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be up to open that issue? Then I'd be happy to approve this narrower PR for v0.20.0.

Update: This issue is effectively already open in #2278. I'll move my above comment over there.

@pcasteran Could you pull the changes from develop (new base branch), and move your changelog entry up to v0.20.0?

@pcasteran
Copy link
Contributor Author

@jtcohen6 as requested I rebased my branch on the new develop branch and moved the changelog entry up to v0.20.0.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @pcasteran!

@jtcohen6 jtcohen6 merged commit 8996cb1 into dbt-labs:develop Feb 18, 2021
@pcasteran pcasteran deleted the fix/2940-bq-incremental-var-declaration branch February 19, 2021 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BigQuery - Unable to use the "set_sql_header" macro to declare variables for incremental models
2 participants