Dataproc config for serverless in profile and fix quoting #578

Merged
colin-rogers-dbt merged 24 commits into main from feature/configure-serverless-dataproc-batch
Mar 9, 2023

colin-rogers-dbt
Contributor

resolves #350

Description

Checklist

Torkjel Hongve and others added 16 commits November 9, 2022 10:16
This adds a `dataproc_batch` key for specifying the Dataproc Batch
configuration. At runtime the value of this key is used to populate the
`google.cloud.dataproc_v1.types.Batch` object before it is submitted to
the Dataproc service.

To avoid adding explicit support for every option offered by the
service, and chasing a moving target as Google's API evolves, this key
accepts arbitrary YAML, which is mapped to the Batch object on a
best-effort basis.

Signed-off-by: Torkjel Hongve <th@kinver.io>
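
The mapping described in the commit message above can be illustrated with a minimal, self-contained sketch. It is not the adapter's actual code: the dataclasses below are simplified, hypothetical stand-ins for `google.cloud.dataproc_v1.types.Batch` and its nested messages, and `update_batch_from_config` is a hypothetical helper name. The sketch only shows the general idea of copying a nested mapping (as parsed from the `dataproc_batch` YAML) onto a pre-built object in place, so that values not mentioned in the config keep their defaults.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Dict, Mapping


# Hypothetical, heavily simplified stand-ins for the Dataproc Batch message tree.
@dataclass
class ExecutionConfig:
    subnetwork_uri: str = ""
    service_account: str = ""


@dataclass
class EnvironmentConfig:
    execution_config: ExecutionConfig = field(default_factory=ExecutionConfig)


@dataclass
class RuntimeConfig:
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Batch:
    environment_config: EnvironmentConfig = field(default_factory=EnvironmentConfig)
    runtime_config: RuntimeConfig = field(default_factory=RuntimeConfig)
    labels: Dict[str, str] = field(default_factory=dict)


def update_batch_from_config(config: Mapping[str, Any], target: Any) -> Any:
    """Recursively copy keys from a parsed-YAML mapping onto `target`, in place."""
    known = {f.name for f in fields(target)}
    for key, value in config.items():
        if key not in known:
            # Design choice of this sketch: reject keys the target does not define.
            raise ValueError(f"Unknown dataproc_batch key: {key!r}")
        current = getattr(target, key)
        if is_dataclass(current) and isinstance(value, Mapping):
            # Nested message: descend instead of replacing it wholesale.
            update_batch_from_config(value, current)
        elif isinstance(current, dict) and isinstance(value, Mapping):
            # Map-typed field: merge entries into the existing dict.
            current.update(value)
        else:
            setattr(target, key, value)
    return target


# Example input, shaped like the YAML under a `dataproc_batch` key might be
# after parsing (the values are made up for illustration).
dataproc_batch = {
    "environment_config": {"execution_config": {"subnetwork_uri": "my-subnet"}},
    "runtime_config": {"properties": {"spark.executor.instances": "2"}},
}
batch = update_batch_from_config(dataproc_batch, Batch())
```

Because the update happens in place, fields that the config does not mention (here `service_account` and `labels`) keep whatever defaults the caller set on the object before applying the config.
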
- Make the `dataproc_batch` key optional.
- Add unit tests.
- Move configuration of the `google.cloud.dataproc_v1.Batch` object
  to a separate function.

Signed-off-by: Torkjel Hongve <th@kinver.io>
@colin-rogers-dbt colin-rogers-dbt requested a review from a team as a code owner March 3, 2023 21:13
@cla-bot cla-bot bot added the cla:yes label Mar 3, 2023
@github-actions
Contributor

github-actions bot commented Mar 3, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-bigquery contributing guide.

@@ -1,7 +1,7 @@
# install latest changes in dbt-core
# TODO: how to automate switching from develop to version branches?
git+https://github.com/dbt-labs/dbt-core.git#egg=dbt-core&subdirectory=core
git+https://github.com/dbt-labs/dbt-core.git#egg=dbt-tests-adapter&subdirectory=tests/adapter
git+https://github.com/dbt-labs/dbt-core.git@addPyRelationNameMacro#egg=dbt-core&subdirectory=core

nit: dbt-core pointer once approved

@@ -184,6 +221,25 @@ def test_acquire_connection_oauth_validations(self, mock_open_connection):
connection.handle
mock_open_connection.assert_called_once()

@patch('dbt.adapters.bigquery.connections.get_bigquery_defaults', return_value=('credentials', 'project_id'))
@patch('dbt.adapters.bigquery.BigQueryConnectionManager.open', return_value=_bq_conn())
def test_acquire_connection_dataproc_serverless(self, mock_open_connection, mock_get_bigquery_defaults):

nit: does this go beyond flake8 and black line length?

@colin-rogers-dbt
Contributor Author

colin-rogers-dbt commented Mar 9, 2023

@cla-bot check

@cla-bot cla-bot bot added the cla:yes label Mar 9, 2023
@cla-bot

cla-bot bot commented Mar 9, 2023

The cla-bot has been summoned, and re-checked this pull request!

@colin-rogers-dbt colin-rogers-dbt merged commit f2f0e42 into main Mar 9, 2023
@colin-rogers-dbt colin-rogers-dbt deleted the feature/configure-serverless-dataproc-batch branch March 9, 2023 17:31
@github-actions
Contributor

The backport to 1.4.latest failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.4.latest 1.4.latest
# Navigate to the new working tree
cd .worktrees/backport-1.4.latest
# Create a new branch
git switch --create backport-578-to-1.4.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f2f0e4272c42b2c2b59847279373c36fb08e3e37
# Push it to GitHub
git push --set-upstream origin backport-578-to-1.4.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.4.latest

Then, create a pull request where the base branch is 1.4.latest and the compare/head branch is backport-578-to-1.4.latest.

colin-rogers-dbt added a commit that referenced this pull request Mar 10, 2023
colin-rogers-dbt added a commit that referenced this pull request Mar 10, 2023
mirnawong1 added a commit to dbt-labs/docs.getdbt.com that referenced this pull request Jul 3, 2023
#3661)

Resolves #3382

## What are you changing in this pull request and why?
The dataproc_batch config field for configuring Python models running on
GCP Dataproc Serverless was added in
[dbt-labs/dbt-bigquery#578](dbt-labs/dbt-bigquery#578),
but it isn't documented yet. This PR adds the missing documentation for
this field.

## Checklist
- [ ] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
and [About
versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
so my content adheres to these guidelines.
- [ ] Add a checklist item for anything that needs to happen before this
PR is merged, such as "needs technical review" or "change base branch."

Successfully merging this pull request may close these issues.

[CT-1336] [Feature] allow setting networkUri and subnetworkUri for Dataproc Serverless batches
7 participants