make dbt-bigquery python model easier to use #4174
Conversation
Thanks @wazi55! These docs have needed some TLC and your contributions are certainly a step in the right direction. That said:
- I think there's some polishing needed, and
- I'm not confident as to exactly what "good enough" is here, and would defer to the dbt-bigquery community.

Here are direct links to preview pages that I'd recommend having your colleagues (and the community) review for content and readability:
- Python models: Supported Data Platforms (BigQuery)
- BigQuery Configuration: Submitting a Python model

Once you get sign-off, I can ask our product docs team to do a copy review.
Any user or service account that runs dbt Python models will need the following permissions (in addition to the required BigQuery permissions) ([docs](https://cloud.google.com/dataproc/docs/concepts/iam/iam)):
```
dataproc.batches.create
dataproc.clusters.use
dataproc.jobs.create
dataproc.jobs.get
dataproc.operations.get
dataproc.operations.list
storage.buckets.get
storage.objects.create
storage.objects.delete
```
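As a sketch of how these might be granted (my own assumption, not from the docs under review): one option is to bundle the permissions into a custom IAM role and bind it to the service account dbt runs as. The project ID, role ID, and service account below are all placeholders.

```shell
# Sketch only: bundle the permissions above into a custom role and grant it.
# Project, role, and service account names are placeholders.
gcloud iam roles create dbtPythonModelRunner \
  --project=my-gcp-project \
  --title="dbt Python model runner" \
  --permissions=dataproc.batches.create,dataproc.clusters.use,dataproc.jobs.create,dataproc.jobs.get,dataproc.operations.get,dataproc.operations.list,storage.buckets.get,storage.objects.create,storage.objects.delete

gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:dbt-sa@my-gcp-project.iam.gserviceaccount.com" \
  --role="projects/my-gcp-project/roles/dbtPythonModelRunner"
```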
I assumed that this detailed information would live somewhere else, like the new section you created within bigquery-configs?
- Cluster Submission Method: Create or use an existing Dataproc Cluster. [See example](/reference/resource-configs/bigquery-configs.md#submitting-a-python-model) within `dbt_project.yml` or a `.yml` file within the `models/` directory.
- Serverless Submission Method: Dataproc Serverless does not require a ready cluster, but it can also mean the cluster is slower to start. [See example](/reference/resource-configs/bigquery-configs.md#submitting-a-python-model) of submitting a job to a serverless cluster in the `.py` file.

**Installing packages:** If you are using a Dataproc Cluster (as opposed to Dataproc Serverless), you can add third-party packages while creating the cluster with the [Spark BigQuery connector initialization action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors#bigquery-connectors). If you are using Dataproc Serverless, you can build your own [custom container image](https://cloud.google.com/dataproc-serverless/docs/guides/custom-containers#python_packages) with the packages you need.

Google recommends installing Python packages on Dataproc clusters via initialization actions:
- [How initialization actions are used](https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/README.md#how-initialization-actions-are-used)
- [Actions for installing via `pip` or `conda`](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/python)

You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`.
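To make the "in the `.py` file" route concrete, here is a minimal sketch of choosing a submission method from inside the model itself. The `model(dbt, session)` signature and `dbt.config()` call follow dbt's Python model interface; the model name `my_source_model` and the fake context used to demonstrate the call shape are placeholders of mine, not part of the docs under review.

```python
# Sketch of a dbt Python model for BigQuery that chooses the Dataproc
# Serverless submission method from inside the model file. The config keys
# are the dbt-bigquery ones discussed above; ref/model names are made up.
def model(dbt, session):
    dbt.config(
        materialized="table",
        submission_method="serverless",  # or "cluster" plus a cluster name
    )
    # In a real run, dbt.ref() returns a DataFrame for the upstream model.
    return dbt.ref("my_source_model")


# Tiny stand-in for the real dbt context, just to show the call shape:
class _FakeDbt:
    def __init__(self):
        self.config_kwargs = {}

    def config(self, **kwargs):
        self.config_kwargs.update(kwargs)

    def ref(self, name):
        return f"dataframe-for-{name}"


fake = _FakeDbt()
result = model(fake, session=None)
```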
perhaps something like this table is neater than what you have? and involves less repeated text?

| submission method | cluster | serverless |
|---|---|---|
| notes | can be a pre-existing cluster. can also be used to create clusters | Dataproc Serverless does not require a ready cluster, but it can also mean the cluster is slower to start. |
| packages | you can add third-party packages while creating the cluster with the Spark BigQuery connector initialization action | build your own custom container image with the packages you need |
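For the `dataproc:pip.packages` / `dataproc:conda.packages` route mentioned earlier, the creation-time invocation looks roughly like this (cluster name, region, and package pin are placeholders; check the linked Google tutorial for the exact property syntax, especially when listing multiple packages):

```shell
# Sketch only: install a pinned pip package at cluster creation time.
gcloud dataproc clusters create my-dbt-cluster \
  --region=us-central1 \
  --properties='dataproc:pip.packages=scikit-learn==1.3.0'
```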
Co-authored-by: Anders <anders.swanson@dbtlabs.com>
hey @wazi55 - thanks for opening this pr! Anders provided some feedback, wanted to check if you had any add'l questions?

Been over 8 months since this had any movement from the author. Closing due to age, but we can re-open if this is something we need to visit again.
[Re:] #4150
What are you changing in this pull request and why?
Checklist

Adding new pages (delete if not applicable):
- `website/sidebars.js`

Removing or renaming existing pages (delete if not applicable):
- `website/sidebars.js`
- `website/static/_redirects`