make dbt-bigquery python model easier to use #4174

Conversation

@wazi55 wazi55 commented Oct 3, 2023

[Re:] #4150

What are you changing in this pull request and why?

Checklist

  • Review the Content style guide and About versioning so my content adheres to these guidelines.
  • Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

Adding new pages (delete if not applicable):

  • Add page to website/sidebars.js
  • Provide a unique filename for the new page

Removing or renaming existing pages (delete if not applicable):

  • Remove page from website/sidebars.js
  • Add an entry website/static/_redirects
  • Ran link testing to update the links that point to the deleted page

@wazi55 wazi55 requested review from dataders and a team as code owners October 3, 2023 14:32

vercel bot commented Oct 3, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| docs-getdbt-com | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Nov 3, 2023 10:45am |

@github-actions github-actions bot added the content and size: medium labels Oct 3, 2023
Contributor

@dataders dataders left a comment

Thanks @wazi55! These docs have needed some TLC, and your contributions are certainly a step in the right direction. That said:

  1. I think some polishing is needed, and
  2. I'm not confident exactly what "good enough" looks like here, and would defer to the dbt-bigquery community.

Here are direct links to preview pages that I'd recommend having your colleagues (and the community) review for content and readability.

Once you get sign-off, I can ask our product docs team to do a copy review.

website/docs/docs/build/python-models.md
Comment on lines -688 to -699
Any user or service account that runs dbt Python models will need the following permissions (in addition to the required BigQuery permissions) ([docs](https://cloud.google.com/dataproc/docs/concepts/iam/iam)):
```
dataproc.batches.create
dataproc.clusters.use
dataproc.jobs.create
dataproc.jobs.get
dataproc.operations.get
dataproc.operations.list
storage.buckets.get
storage.objects.create
storage.objects.delete
```
Contributor

I assumed that this detailed information would live somewhere else, like the new section you created within bigquery-configs?
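
Wherever the list ends up living, a reader may want a quick way to see which of these permissions an account lacks. The following is only a minimal sketch: the permission names are copied from the list quoted above, while `missing_permissions` is a hypothetical helper, and how the granted set is obtained (for example, from an IAM export) is left out as an assumption.

```python
# Permissions copied from the list quoted above (needed in addition to
# the usual BigQuery permissions).
REQUIRED_PERMISSIONS = frozenset({
    "dataproc.batches.create",
    "dataproc.clusters.use",
    "dataproc.jobs.create",
    "dataproc.jobs.get",
    "dataproc.operations.get",
    "dataproc.operations.list",
    "storage.buckets.get",
    "storage.objects.create",
    "storage.objects.delete",
})


def missing_permissions(granted):
    """Return the required permissions absent from `granted`, sorted.

    `granted` is any iterable of permission strings. This helper is
    hypothetical and does not call any Google Cloud API.
    """
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Example input only; a real check would load the account's actual grants.
    example_granted = ["dataproc.jobs.create", "dataproc.jobs.get"]
    for permission in missing_permissions(example_granted):
        print("missing:", permission)
```

The helper only compares two sets, so it stays valid if the list moves to another page, provided `REQUIRED_PERMISSIONS` is kept in sync with the docs.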

Comment on lines 665 to 670
- Cluster Submission Method: Create or use an existing Dataproc Cluster [See example](/reference/resource-configs/bigquery-configs.md#submitting-a-python-model) within dbt_project.yml or yml file within the `models/` directory

**Installing packages:** If you are using a Dataproc Cluster (as opposed to Dataproc Serverless), you can add third-party packages while creating the cluster.
- Serverless Submission Method: Dataproc Serverless does not require a ready cluster, but it can also mean the cluster is slower to start. [See example](/reference/resource-configs/bigquery-configs.md#submitting-a-python-model) submitting a job to a serverless cluster in the `.py` file

Google recommends installing Python packages on Dataproc clusters via initialization actions:
- [How initialization actions are used](https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/README.md#how-initialization-actions-are-used)
- [Actions for installing via `pip` or `conda`](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/python)

You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`.
**Installing packages**: If you are using a Dataproc Cluster (as opposed to Dataproc Serverless), you can add third-party packages while creating the cluster with the [Spark BigQuery connector initialization action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors#bigquery-connectors). If you are using Dataproc Serverless, you can build your own [custom container image](https://cloud.google.com/dataproc-serverless/docs/guides/custom-containers#python_packages) with the packages you need.
Contributor

Perhaps something like this table is neater than what you have, and involves less repeated text?

| submission method | cluster | serverless |
| --- | --- | --- |
| notes | can be pre-existing cluster. can also be used to create clusters | Dataproc Serverless does not require a ready cluster, but it can also mean the cluster is slower to start. |
| packages | you can add third-party packages while creating the cluster with the Spark BigQuery connector initialization action | build your own custom container image with the packages you need |
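
To make the serverless column concrete, here is a rough sketch of a `.py` model that opts into Dataproc Serverless. It assumes the `submission_method` config key behaves as the linked bigquery-configs section describes; the upstream model name `my_upstream_model` is a placeholder, not something from this PR.

```python
def model(dbt, session):
    # Assumption: "serverless" selects Dataproc Serverless, so no
    # pre-existing cluster is required (startup may be slower).
    dbt.config(submission_method="serverless")

    # "my_upstream_model" is a hypothetical upstream model name.
    df = dbt.ref("my_upstream_model")

    # A dbt Python model returns the DataFrame to be materialized.
    return df
```

For the cluster column, the same `dbt.config` call would presumably take `submission_method="cluster"` plus the cluster's name, or those values could be set in `dbt_project.yml` or a yml file under `models/` as the diff above describes.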

Co-authored-by: Anders <anders.swanson@dbtlabs.com>
@runleonarun runleonarun added the new contributor label Oct 4, 2023
@github-actions github-actions bot added the size: large label and removed the size: medium label Oct 12, 2023
@github-actions github-actions bot added the size: medium label and removed the size: large label Oct 12, 2023
@mirnawong1
Contributor

Hey @wazi55, thanks for opening this PR! Anders provided some feedback, so I wanted to check whether you had any additional questions.

@matthewshaver
Contributor

It's been over 8 months since this had any movement from the author. Closing due to age, but we can re-open if this is something we need to revisit.

Labels
content, new contributor, size: medium
5 participants