Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making package BigQuery compatible #32

Closed
wants to merge 6 commits into from

Conversation

DylanBaker
Copy link

A handful of changes in here to make the package BigQuery compatible:

  • Adds a nested_field macro which formats columns correctly, depending on the warehouse it is using. (This change has only been pushed for the Stitch data, which was the source it was being tested on.)
  • Add a stitch_base-table macro, which de-dupes Stitch data on BigQuery
  • Changes concat to dbt_utils.concat macro for cross-db support.

I think the two new macros could probably move to https://github.com/fishtown-analytics/stitch-utils. We implemented specifically for this repo. Thought it might be easier to first push this change and then discuss the broader implications for the stitch-utils package after. Happy to discuss!

@DylanBaker
Copy link
Author

@axelazaid @jtcohen6 Thought you two might be the best two to discuss with?

@jtcohen6
Copy link
Contributor

@DylanBaker I'd be very open to bringing nested_data and stitch_base_table into stitch-utils! (I've previously thought about adding something along the lines of the latter, so very happy to see it here.) Happy to work on this before, after, or alongside the current PR :)

@DylanBaker
Copy link
Author

@jtcohen6 I think my preference would be to do it in the following order (in part because I think it's easiest and in part because I'm most interested in getting it into facebook-ads quickest):

  • work on this here and merge into facebook-ads
  • bring the necessary logic into the stitch-utils package
  • change the facebook-ads package to use the logic from stitch-utils

Does that sound good to you?

@Qashraf
Copy link

Qashraf commented Apr 9, 2020

Hey @jtcohen6 , Not sure what the update with this is but like I mentioned @DylanBaker I used this branch and it worked for my needs. It seems like there's a conflict but that could probably be cleaned up with potentially an additional bigquery variable? Let me know if there's anything I can do to help get this PR cleaned up and added to to the package for everyone to use

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything in this PR looks reasonable. I'll be honest, it's been some time since I last used this package, so it's good to hear that it met your needs @Qashraf.

@DylanBaker If you get a chance to fix the merge conflict, I think we could merge this and cut a new minor release of the package. It doesn't seem like there are any breaking changes, but it's been a long time since the last patch release.


{% macro bigquery__stitch_base_table(tablename, partition_fields=['id']) %}

(select *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see wanting this to be

Suggested change
(select *
(select * except (is_last_stitch_record)

Totally up to you on that one.

@clrcrl clrcrl mentioned this pull request Jun 12, 2020
@clrcrl
Copy link
Contributor

clrcrl commented Jul 20, 2020

Closing since it's stale

@clrcrl clrcrl closed this Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants