First-class catalog and schema setting #1979

Open
wants to merge 30 commits into
base: main
lennartkats-db
Contributor

Changes

This is the latest, experimental approach to adding a first-class notion of 'catalog' and 'schema'.

The basic idea is that databricks.yml can specify:

targets:
  dev:
    ...
    presets:
      catalog: dev
      schema: ${workspace.current_user.short_name} # the user's name, e.g. lennart_kats

  prod:
    ...
    presets:
      catalog: prod
      schema: finance

which will then configure the default schema for all resources in the bundle (pipelines, jobs, model serving endpoints, etc.)

A caveat exists for notebooks, which need to use parameters to configure the catalog and schema. The catalog and schema parameter values are automatically passed to all job tasks, but notebooks still need to consume those values. We check whether they do, and otherwise show a recommendation:

Recommendation: Use the 'catalog' and 'schema' parameters provided via 'presets.catalog' and 'presets.schema' using

  dbutils.widgets.text('catalog')
  dbutils.widgets.text('schema')
  catalog = dbutils.widgets.get('catalog')
  schema = dbutils.widgets.get('schema')
  spark.sql(f'USE {catalog}.{schema}')

  in src/notebook.ipynb:1:1

Note that the code above also helps for interactive notebook development scenarios: users can use the parameter widgets to set the catalog and schema they use during development.
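
As a rough illustration of the kind of check described above, the Python sketch below scans notebook source text for reads of both widgets. It is not the check this PR implements (that code isn't shown here); the function names and the regular expression are assumptions made for the example.

```python
import re


def _reads_widget(source: str, name: str) -> bool:
    """Return True if the source calls dbutils.widgets.get() for the given widget name."""
    pattern = r"dbutils\.widgets\.get\(\s*['\"]" + re.escape(name) + r"['\"]\s*\)"
    return re.search(pattern, source) is not None


def consumes_catalog_and_schema(source: str) -> bool:
    """Hypothetical heuristic: the notebook must read both the 'catalog' and 'schema' widgets."""
    return _reads_widget(source, "catalog") and _reads_widget(source, "schema")
```

A text-based heuristic like this can be fooled by commented-out code or by helper functions that wrap the widget calls, which is one reason a recommendation rather than an error is a reasonable severity for the result.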

Similarly, for Python and Wheel tasks, users must add some extra code to process a catalog and schema parameter. For Python tasks we show a similar recommendation; for wheel tasks we can't directly check for this.
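
For a Python task, the extra code could look roughly like the sketch below. It assumes the values arrive as `--catalog` and `--schema` command-line flags, which this description does not confirm; the helper names `parse_catalog_schema` and `use_statement` are hypothetical.

```python
import argparse


def parse_catalog_schema(argv=None):
    """Parse the assumed --catalog/--schema flags and return them as a tuple."""
    parser = argparse.ArgumentParser(description="Example Python task entry point")
    parser.add_argument("--catalog", required=True, help="Target catalog")
    parser.add_argument("--schema", required=True, help="Target schema")
    # parse_known_args tolerates any other task parameters this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args.catalog, args.schema


def use_statement(catalog, schema):
    """Build the same USE statement that the notebook recommendation runs."""
    return f"USE {catalog}.{schema}"
```

In a task's entry point, one would call `parse_catalog_schema()` with no arguments so that it reads `sys.argv`, then pass the result of `use_statement(...)` to `spark.sql`.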

Tests

This PR has basic tests while in draft. We'll mainly want to use testing to help maintain this code as new resources are added: we should have a reflection-based test that verifies that any new resource types and/or resource properties that set the catalog and schema can be defaulted using presets.catalog/schema.
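
The idea behind such a reflection-based test can be sketched in Python, even though the CLI's real test would live in its own codebase. The resource classes, field names, and the `HANDLED` allowlist below are all hypothetical stand-ins: the sketch walks every resource type, collects fields whose names mention a catalog or schema, and reports any that the defaulting logic is not known to cover.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Pipeline:  # hypothetical resource type
    name: str = ""
    catalog: Optional[str] = None
    schema: Optional[str] = None


@dataclass
class RegisteredModel:  # hypothetical resource type
    name: str = ""
    catalog_name: Optional[str] = None
    schema_name: Optional[str] = None


RESOURCE_TYPES = [Pipeline, RegisteredModel]

# Fields that the (hypothetical) preset logic is known to default.
HANDLED = {
    ("Pipeline", "catalog"),
    ("Pipeline", "schema"),
    ("RegisteredModel", "catalog_name"),
    ("RegisteredModel", "schema_name"),
}


def catalog_schema_fields(resource_types):
    """Use reflection to find every field that looks like a catalog or schema setting."""
    found = set()
    for cls in resource_types:
        for f in fields(cls):
            if "catalog" in f.name or "schema" in f.name:
                found.add((cls.__name__, f.name))
    return found


def unhandled_fields(resource_types, handled):
    """Return catalog/schema fields that the defaulting logic does not cover yet."""
    return catalog_schema_fields(resource_types) - handled
```

A test built on this would assert that `unhandled_fields(RESOURCE_TYPES, HANDLED)` is empty, so that adding a new resource type with a catalog or schema field fails the test until the defaulting logic is extended.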

@lennartkats-db lennartkats-db changed the title [RFC] Presets catalog schema as params [RFC] First-class catalog and schema setting Dec 9, 2024
Contributor

@fjakobs fjakobs left a comment

+1 for having first-class support for catalog/schema. I don't have a strong opinion on whether it should be under presets or a global setting. I think both could work.

@eng-dev-ecosystem-bot
Collaborator

Test Details: go/deco-tests/12253819228

@lennartkats-db lennartkats-db changed the title [RFC] First-class catalog and schema setting [WIP] First-class catalog and schema setting Dec 16, 2024
…esets-catalog-schema-as-params' by 67 commits. # (use "git push" to publish your

local commits) # # Changes to be committed: # modified:  dbt-sql/databricks_template_schema.json # modified:
default-python/databricks_template_schema.json # modified:  default-python/template/{{.project_name}}/databricks.yml.tmpl # modified:
default-python/template/{{.project_name}}/resources/{{.project_name}}.job.yml.tmpl # modified:
default-python/template/{{.project_name}}/resources/{{.project_name}}.pipeline.yml.tmpl # modified:
default-python/template/{{.project_name}}/scratch/exploration.ipynb.tmpl # modified:
default-python/template/{{.project_name}}/src/notebook.ipynb.tmpl # modified:
default-python/template/{{.project_name}}/src/{{.project_name}}/main.py.tmpl # modified:  default-sql/databricks_template_schema.json # # Untracked
files: # ../../../.cursorrules # ../../../bundle/config/resources/:tmp:tmp.py # ../../../delme.py # ../../../pr-cache-current-user-me #
../../../pr-cleanup-warnings.md # ../../../pr-contrib-templates.md # ../../../pr-cp-diag-ids-for-all.md # ../../../pr-cp-serverless-templates.md #
../../../pr-presets-catalog-schema-using-params.md # ../../../pr-update-sync-command-help.md #
Revert template changes for now
@lennartkats-db lennartkats-db changed the title [WIP] First-class catalog and schema setting First-class catalog and schema setting Dec 21, 2024
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 1979
  • Commit SHA: 204d3b08d13b90b71f3236313412f72b5333908e

Checks will be approved automatically on success.

@lennartkats-db lennartkats-db marked this pull request as ready for review December 21, 2024 08:49
3 participants