
Make partition metadata available to BigQuery users #2596

Merged
merged 8 commits on Oct 23, 2020

Conversation

ran-eh
Contributor

@ran-eh ran-eh commented Jun 27, 2020

No description provided.

@ran-eh ran-eh force-pushed the re-partition-metadata branch 4 times, most recently from 550a5d1 to 8b5b156 on June 27, 2020 02:34
@cla-bot

cla-bot bot commented Jun 27, 2020

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @ran-eh

@dbt-labs dbt-labs deleted 6 comments from cla-bot bot Jul 1, 2020
@drewbanin
Contributor

hey @ran-eh - thanks for opening this PR! Are you able to sign the CLA attached above? Once that's done, we'd be happy to take a look at the code in here :D

@ran-eh
Contributor Author

ran-eh commented Jul 3, 2020

Thanks @drewbanin . I hope to have it signed next week.

@jtcohen6 jtcohen6 linked an issue Jul 7, 2020 that may be closed by this pull request
@ran-eh
Contributor Author

ran-eh commented Jul 18, 2020

@drewbanin, did you receive the signed CLA? PayPal legal says they sent it on Tuesday.

@jtcohen6
Contributor

@cla-bot check

@cla-bot

cla-bot bot commented Jul 18, 2020

The cla-bot has been summoned, and re-checked this pull request!

@cla-bot cla-bot bot added the cla:yes label Jul 18, 2020
@ran-eh
Contributor Author

ran-eh commented Jul 18, 2020

@drewbanin @jtcohen6 Awesome! Can't wait for your review!


@jtcohen6 jtcohen6 left a comment


@ran-eh Thank you for the contribution! And for wrangling the CLA :)

I left two comments around implementation specifics. More broadly:

  • It looks like there are some pep8 (python style) errors. See the flake8 report in circle.
  • You should add an integration test that executes get_partitions_metadata on a partitioned table (of known fixture data). Knowing the query succeeds is the minimum; even better to check that the columns and row count of the agate result match expectations. I think these tests could be a good starting point.
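As a sketch of what such an integration test might assert (hypothetical: the column names assume BigQuery's legacy SQL `__PARTITIONS_SUMMARY__` meta-table, and the fixture data and helper below are invented for illustration):

```python
# Hypothetical check for the result of get_partitions_metadata, run against
# a partitioned fixture table. Column names assume the legacy SQL
# __PARTITIONS_SUMMARY__ meta-table.

EXPECTED_COLUMNS = [
    'project_id', 'dataset_id', 'table_id',
    'partition_id', 'creation_time', 'last_modified_time',
]

def check_partitions_metadata(result_columns, result_rows, expected_partitions):
    """Validate column names and row count of the metadata query result."""
    assert list(result_columns) == EXPECTED_COLUMNS
    # One row per partition in the fixture table.
    assert len(result_rows) == expected_partitions

# Simulated result for a fixture table with three date partitions:
rows = [
    ('my-project', 'my_dataset', 'my_table', p, 1593216000000, 1593216000000)
    for p in ('20200101', '20200102', '20200103')
]
check_partitions_metadata(EXPECTED_COLUMNS, rows, expected_partitions=3)
```

Checking row count against known fixture partitions catches silent failures that a "query succeeded" assertion would miss.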

# copy-pasted from raw_execute(). This is done in order to discourage
# use of legacy SQL queries in dbt, except to obtain partition metadata.
# This method should be removed once partition metadata becomes available
# from standard SQL.

To get this working, I totally agree with the approach of duplicating a lot of code between raw_execute and _raw_execute_legacy_sql. The main benefit is that we avoid changing any existing code to support legacy SQL, which is good.

With the benefit of hindsight, it looks like we could avoid all code duplication by adding an optional argument to the default raw_execute:

    def raw_execute(self, sql, fetch=False, use_legacy_sql=False):
        conn = self.get_thread_connection()
        client = conn.handle

        logger.debug('On {}: {}', conn.name, sql)

        job_params = {'use_legacy_sql': use_legacy_sql}

Then, get_partitions_metadata below can just call:

       _, iterator = self.raw_execute(sql, fetch='fetch_result', use_legacy_sql=True)

The ability to run arbitrary legacy SQL would still be unavailable from the Jinja environment. The main qualm would be adding an argument to an existing method signature.

@beckjake I see this as a code style question and totally defer to you here.


Yes, I agree with @jtcohen6 - the key here is that it's not jinja-accessible, internally it's fine if we expose legacy SQL.

The only thing I might do slightly differently is make use_legacy_sql a keyword-only argument by writing it as def raw_execute(self, sql, fetch=False, *, use_legacy_sql=False):.
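The two suggestions combine into something like the following (a sketch, not the merged implementation; the BigQuery client call is stubbed out so only the signature behavior is shown):

```python
# Sketch: raw_execute with use_legacy_sql as a keyword-only argument.
# The real method submits a query job via the google-cloud-bigquery client;
# this stub just returns the job params to demonstrate the signature.

def raw_execute(sql, fetch=False, *, use_legacy_sql=False):
    job_params = {'use_legacy_sql': use_legacy_sql}
    return job_params

# Callers must name the flag, which keeps call sites self-documenting:
raw_execute('SELECT 1', fetch=True, use_legacy_sql=True)

# Passing it positionally raises TypeError, so legacy SQL cannot be
# enabled by accident:
#   raw_execute('SELECT 1', True, True)  # TypeError
```

The `*` marker makes every later parameter keyword-only, which is exactly the guard-rail wanted here for an internal-only escape hatch.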


return query_job, iterator

def get_partitions_metadata(self, table_id):

IMO this method should take a relation as its argument, instead of a string (table_id). This change would mean that:

  • We can construct the legacy SQL table reference using relation components, rather than relying on from_string()
  • Users can pass a ref(), source(), or relation object to the Jinja macro directly. I expect the most common use case (incremental models) to call this as get_partitions_metadata(this).
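A relation-based version might build the legacy SQL reference along these lines (sketch only; the Relation stand-in is hypothetical, with attribute names mirroring dbt relation components):

```python
from dataclasses import dataclass

# Hypothetical stand-in for a dbt relation; attribute names mirror the
# database/schema/identifier components of real relation objects.
@dataclass
class Relation:
    database: str    # BigQuery project
    schema: str      # BigQuery dataset
    identifier: str  # table name

def partitions_metadata_sql(relation):
    # Legacy SQL references tables as [project:dataset.table]; appending
    # $__PARTITIONS_SUMMARY__ selects the partition metadata meta-table.
    return (
        'SELECT * FROM ['
        f'{relation.database}:{relation.schema}.'
        f'{relation.identifier}$__PARTITIONS_SUMMARY__]'
    )

sql = partitions_metadata_sql(Relation('my-project', 'analytics', 'events'))
```

Building the reference from components avoids parsing a dotted string and lets `this`, `ref()`, or `source()` be passed straight through from Jinja.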

@jtcohen6
Contributor

jtcohen6 commented Aug 14, 2020

@ran-eh Have you had a chance to take another look at this? We're planning to cut a release candidate of v0.18.0 soon.

@ran-eh ran-eh changed the base branch from dev/marian-anderson to dev/kiyoshi-kuromiya October 11, 2020 01:00
@jtcohen6
Contributor

Huzzah, passing tests! Could you:

  • Revert the change in 78bd7c9. I don't think it was the cause of the failing integration test.
  • Try adding an integration test, modeled off these, that runs get_partitions_metadata against a partitioned table and checks the length of the results. I'm happy to help with this piece + getting integration tests running locally

* Add tests using get_partitions_metadata

* Readd asterisk to raw_execute

@jtcohen6 jtcohen6 left a comment


@ran-eh Glad we got this over the finish line! Could you:

  • Changelog: under the v0.19.0 section, add a note for this feature, and add yourself as a contributor
  • Open a new issue laying out the performance gains / cost savings we could realize by using adapter.get_partitions_metadata in the "dynamic" insert_overwrite incremental materialization, instead of the current select max(partition) query
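To sketch the saving that follow-up issue would describe (illustrative only; the row shape assumes the legacy SQL `__PARTITIONS_SUMMARY__` columns, and the helper is invented):

```python
# The dynamic insert_overwrite strategy currently finds the newest partition
# with `select max(partition)`, which scans the target table. The same answer
# is already present in the partition metadata rows, which are free to read.

def max_partition_from_metadata(rows):
    """Return the newest partition_id without scanning any table data."""
    partition_ids = [
        r['partition_id'] for r in rows
        # Skip pseudo-partitions such as __UNPARTITIONED__ and __NULL__.
        if not r['partition_id'].startswith('__')
    ]
    return max(partition_ids) if partition_ids else None

rows = [
    {'partition_id': '20201020'},
    {'partition_id': '20201022'},
    {'partition_id': '__UNPARTITIONED__'},
]
max_partition_from_metadata(rows)  # '20201022'
```

Because date-style partition ids sort lexicographically, a plain `max` suffices here; the real macro would still need to coerce the id back into the model's partition type.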

@jtcohen6
Contributor

Thanks for the contribution @ran-eh!


Successfully merging this pull request may close these issues.

Support bq legacySQL queries, to access partition metadata
4 participants