Add an experimental default-sql template #1051

Merged (35 commits, Feb 19, 2024)

Changes from 10 commits
c71aa4a
Add a default-sql template
lennartkats-db Dec 11, 2023
7abb3ad
Merge branch 'main' into sql-template
pietern Dec 12, 2023
2ce4ce3
Add missing file
lennartkats-db Dec 13, 2023
9899d16
Only do SQL files for now
lennartkats-db Dec 13, 2023
19e25f5
Merge branch 'sql-template' of github.com:lennartkats-db/cli into sql…
lennartkats-db Dec 13, 2023
bd1b78d
Use a template for VS Code settings
lennartkats-db Dec 13, 2023
7828c35
Add missing files
lennartkats-db Dec 13, 2023
4b7c4a0
Update libs/template/templates/default-sql/template/{{.project_name}}…
lennartkats-db Dec 14, 2023
78d22eb
Update cmd/bundle/init.go
lennartkats-db Dec 14, 2023
a9bdc64
Update libs/template/templates/default-sql/template/{{.project_name}}…
lennartkats-db Dec 14, 2023
ab43e1e
Process feedback
lennartkats-db Dec 15, 2023
c207991
Update description
lennartkats-db Dec 15, 2023
6124fbe
Merge branch 'sql-template' of github.com:lennartkats-db/cli into sql…
lennartkats-db Dec 15, 2023
74df04e
Remove workspace_host_override
lennartkats-db Dec 19, 2023
ace64dd
Merge remote-tracking branch 'databricks/main' into sql-template
lennartkats-db Dec 19, 2023
a502bb1
Add SQL extension configuration
lennartkats-db Jan 13, 2024
3aa501e
Merge remote-tracking branch 'databricks/main' into sql-template
lennartkats-db Jan 25, 2024
cc2f66d
Fix test
lennartkats-db Jan 25, 2024
93d7052
Support customizable catalog/schema
lennartkats-db Jan 25, 2024
9d6fb8c
Avoid using /Shared
lennartkats-db Jan 26, 2024
2becf55
Fix keyword
lennartkats-db Jan 26, 2024
18860ca
Fix parameter
lennartkats-db Jan 26, 2024
a74a19d
Improve setup DX, support non-UC workspaces
lennartkats-db Jan 27, 2024
e5fab2d
Remove from list of templates for now
lennartkats-db Jan 28, 2024
d46d247
Add README.md
lennartkats-db Jan 28, 2024
97ef8fc
Fix test
lennartkats-db Jan 28, 2024
12b77ab
Merge remote-tracking branch 'databricks/main' into sql-template
lennartkats-db Jan 28, 2024
22176b2
Mark as experimental
lennartkats-db Jan 29, 2024
d471666
Restore sql-default template in hidden form
lennartkats-db Feb 19, 2024
bf70431
Copy-editing
lennartkats-db Feb 19, 2024
dcc3cb2
Merge remote-tracking branch 'databricks/main' into sql-template
lennartkats-db Feb 19, 2024
bdbd7f7
Merge remote-tracking branch 'databricks/main' into sql-template
lennartkats-db Feb 19, 2024
1c8f9ff
Incorporate feedback
lennartkats-db Feb 19, 2024
8572b3d
Merge remote-tracking branch 'databricks/main' into sql-template
lennartkats-db Feb 19, 2024
64851e7
Incorporate feedback
lennartkats-db Feb 19, 2024
4 changes: 4 additions & 0 deletions cmd/bundle/init.go
@@ -32,6 +32,10 @@ var nativeTemplates = []nativeTemplate{
name: "default-python",
description: "The default Python template",
},
{
name: "default-sql",
description: "The default SQL template for .sql files that run with Workflows",
},
{
name: "mlops-stacks",
gitUrl: "https://github.com/databricks/mlops-stacks",
8 changes: 0 additions & 8 deletions cmd/bundle/init_test.go
@@ -26,14 +26,6 @@ func TestBundleInitRepoName(t *testing.T) {
assert.Equal(t, "www.github.com", repoName("https://www.github.com"))
}

func TestNativeTemplateOptions(t *testing.T) {
assert.Equal(t, []string{"default-python", "mlops-stacks"}, nativeTemplateOptions())
}

func TestNativeTemplateDescriptions(t *testing.T) {
assert.Equal(t, "- default-python: The default Python template\n- mlops-stacks: The Databricks MLOps Stacks template (https://github.com/databricks/mlops-stacks)", nativeTemplateDescriptions())
}

func TestGetUrlForNativeTemplate(t *testing.T) {
assert.Equal(t, "https://github.com/databricks/mlops-stacks", getUrlForNativeTemplate("mlops-stacks"))
assert.Equal(t, "https://github.com/databricks/mlops-stacks", getUrlForNativeTemplate("mlops-stack"))
22 changes: 17 additions & 5 deletions libs/template/renderer_test.go
@@ -37,10 +37,10 @@ func assertFilePermissions(t *testing.T, path string, perm fs.FileMode) {
assert.Equal(t, perm, info.Mode().Perm())
}

func assertBuiltinTemplateValid(t *testing.T, settings map[string]any, target string, isServicePrincipal bool, build bool, tempDir string) {
func assertBuiltinTemplateValid(t *testing.T, template string, settings map[string]any, target string, isServicePrincipal bool, build bool, tempDir string) {
ctx := context.Background()

templatePath, err := prepareBuiltinTemplates("default-python", tempDir)
templatePath, err := prepareBuiltinTemplates(template, tempDir)
require.NoError(t, err)
libraryPath := filepath.Join(templatePath, "library")

@@ -98,7 +98,7 @@ func TestPrepareBuiltInTemplatesWithRelativePaths(t *testing.T) {
assert.Equal(t, "./default-python", dir)
}

func TestBuiltinTemplateValid(t *testing.T) {
func TestBuiltinPythonTemplateValid(t *testing.T) {
// Test option combinations
options := []string{"yes", "no"}
isServicePrincipal := false
@@ -114,7 +114,7 @@ func TestBuiltinTemplateValid(t *testing.T) {
"include_python": includePython,
}
tempDir := t.TempDir()
assertBuiltinTemplateValid(t, config, "dev", isServicePrincipal, build, tempDir)
assertBuiltinTemplateValid(t, "default-python", config, "dev", isServicePrincipal, build, tempDir)
}
}
}
@@ -136,10 +136,22 @@ func TestBuiltinTemplateValid(t *testing.T) {
require.NoError(t, err)
defer os.RemoveAll(tempDir)

assertBuiltinTemplateValid(t, config, "prod", isServicePrincipal, build, tempDir)
assertBuiltinTemplateValid(t, "default-python", config, "prod", isServicePrincipal, build, tempDir)
defer os.RemoveAll(tempDir)
}

func TestBuiltinSQLTemplateValid(t *testing.T) {
// Test both the dev and prod targets
config := map[string]any{
"project_name": "my_project",
"workspace_host_override": "yes",
"http_path": "/sql/warehouses/123",
}
build := false
assertBuiltinTemplateValid(t, "default-sql", config, "dev", true, build, t.TempDir())
assertBuiltinTemplateValid(t, "default-sql", config, "prod", false, build, t.TempDir())
}

func TestRendererWithAssociatedTemplateInLibrary(t *testing.T) {
tmpDir := t.TempDir()

@@ -0,0 +1,30 @@
{
"welcome_message": "\nWelcome to the default SQL template for Databricks Asset Bundles!",
"properties": {
"project_name": {
"type": "string",
"default": "my_sql_project",
"description": "Please provide the following details to tailor the template to your preferences.\n\nUnique name for this project",
"order": 1,
"pattern": "^[A-Za-z0-9_]+$",
"pattern_match_failure_message": "Name must consist of letters, numbers, and underscores."
},
"workspace_host_override": {
"comment": "We explicitly ask users for the workspace_host since we ask for an http_path below. A downside of doing this is that {{user_name}} may not be correct if they pick a different workspace than the one from the current profile.",
"type": "string",
"pattern": "^https:\\/\\/[^/]+$",
"pattern_match_failure_message": "URL must be of the form https://my.databricks.host",
"description": "Workspace URL to use",
"default": "{{workspace_host}}",
"order": 3
},
"http_path": {
"type": "string",
"pattern": "^/sql/.\\../warehouses/[a-z0-9]+$",
"pattern_match_failure_message": "Path must be of the form /sql/1.0/warehouses/abcdef1234567890",
"description": "SQL warehouse path to use (find this path by clicking on \"Connection Details\" on a SQL warehouse)",
"order": 4
}
},
"success_message": "✨ Your new project has been created in the '{{.project_name}}' directory!\n\nPlease refer to the README.md file for \"getting started\" instructions.\nSee also the documentation at https://docs.databricks.com/dev-tools/bundles/index.html."
}
7 changes: 7 additions & 0 deletions libs/template/templates/default-sql/library/versions.tmpl
@@ -0,0 +1,7 @@
{{define "latest_lts_dbr_version" -}}
13.3.x-scala2.12
{{- end}}

{{define "latest_lts_db_connect_version_spec" -}}
>=13.3,<13.4
{{- end}}
@@ -0,0 +1,7 @@
{
"recommendations": [
"databricks.databricks",
"redhat.vscode-yaml",
"databricks.sqltools-databricks-driver",
]
}
@@ -0,0 +1,17 @@
{
"python.analysis.stubPath": ".vscode",
"databricks.python.envFile": "${workspaceFolder}/.env",
"jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
"jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------",
"python.testing.pytestArgs": [
"."
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"python.analysis.extraPaths": ["src"],
"files.exclude": {
"**/*.egg-info": true,
"**/__pycache__": true,
".pytest_cache": true,
},
}
@@ -0,0 +1,42 @@
# {{.project_name}}

The '{{.project_name}}' project was generated by using the default-sql template.

## Getting started

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/install.html


2. Authenticate to your Databricks workspace:
```
$ databricks configure
```

3. To deploy a development copy of this project, type:
```
$ databricks bundle deploy --target dev
```
(Note that "dev" is the default target, so the `--target` parameter
is optional here.)

This deploys everything that's defined for this project.
For example, the default template would deploy a job called
`[dev yourname] {{.project_name}}_job` to your workspace.
You can find that job by opening your workspace and clicking on **Workflows**.

4. Similarly, to deploy a production copy, type:
```
$ databricks bundle deploy --target prod
```

5. To run a job or pipeline, use the "run" command:
```
$ databricks bundle run
```

6. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
https://docs.databricks.com/dev-tools/vscode-ext.html.

7. For documentation on the Databricks Asset Bundles format used
for this project, and for CI/CD configuration, see
https://docs.databricks.com/dev-tools/bundles/index.html.
@@ -0,0 +1,52 @@
# This is a Databricks asset bundle definition for {{.project_name}}.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
name: {{.project_name}}

include:
- resources/*.yml

# Variable declarations. These variables are assigned in the dev/prod targets below.
variables:
warehouse_id:
description: The warehouse to use

# Deployment targets.
targets:
# The 'dev' target, for development purposes. This target is the default.
dev:
# We use 'mode: development' to indicate this is a personal development copy:
# - Deployed resources get prefixed with '[dev my_user_name]'
# - Any job schedules and triggers are paused by default
# - The 'development' mode is used for Delta Live Tables pipelines
mode: development
default: true
workspace:
host: {{.workspace_host_override}}
variables:
warehouse_id: {{index ((regexp "[^/]+$").FindStringSubmatch .http_path) 0}}

## Optionally, there could be a 'staging' target here.
## (See Databricks docs on CI/CD at https://docs.databricks.com/dev-tools/bundles/index.html.)
#
# staging:
# workspace:
# host: {{.workspace_host_override}}

# The 'prod' target, used for production deployment.
prod:
# We use 'mode: production' to indicate this is a production deployment.
# Doing so enables strict verification of the settings below.
mode: production
workspace:
host: {{.workspace_host_override}}
# We only have a single deployment copy for production, so we use a shared path.
root_path: /Shared/.bundle/prod/${bundle.name}
variables:
warehouse_id: {{index ((regexp "[^/]+$").FindStringSubmatch .http_path) 0}}
{{- if not is_service_principal}}
run_as:
# This runs as {{user_name}} in production. We could also use a service principal here
# using service_principal_name (see https://docs.databricks.com/en/dev-tools/bundles/permissions.html).
user_name: {{user_name}}
{{end -}}
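The `warehouse_id` variable in both targets is derived from `http_path` by a template expression. The equivalent plain Go (a sketch for illustration only; the function name is hypothetical and not part of this PR) is:

```go
package main

import (
	"fmt"
	"regexp"
)

// warehouseIDFromHTTPPath mirrors the template expression
// {{index ((regexp "[^/]+$").FindStringSubmatch .http_path) 0}}:
// it extracts the trailing path segment, which is the warehouse ID.
func warehouseIDFromHTTPPath(httpPath string) string {
	return regexp.MustCompile(`[^/]+$`).FindStringSubmatch(httpPath)[0]
}

func main() {
	fmt.Println(warehouseIDFromHTTPPath("/sql/1.0/warehouses/abcdef1234567890"))
	// abcdef1234567890
}
```

This keeps the template schema asking for the full `http_path` (easy to copy from the warehouse's "Connection Details" tab) while the bundle configuration only needs the ID.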
@@ -0,0 +1,35 @@
# A job running a SQL query on a SQL warehouse
resources:
jobs:
{{.project_name}}_sql_job:
name: {{.project_name}}_sql_job

schedule:
# Run every day at 7:17 AM
quartz_cron_expression: '44 17 7 * * ?'
timezone_id: Europe/Amsterdam

{{- if not is_service_principal}}

email_notifications:
on_failure:
- {{user_name}}

{{else}}

{{end -}}

tasks:
- task_key: sample_1
sql_task:
warehouse_id: ${var.warehouse_id}
file:
path: ../src/sample_1.sql

- task_key: sample_2
depends_on:
- task_key: sample_1
sql_task:
warehouse_id: ${var.warehouse_id}
file:
path: ../src/sample_2.sql
@@ -0,0 +1,4 @@
# scratch

This folder is reserved for personal, exploratory notebooks.
By default these are not committed to Git, as 'scratch' is listed in .gitignore.

Review thread on this file:

Contributor: notebooks and SQL files.

Contributor: Maybe just talk about SQL files here. If we wanted to support notebooks then we should be adding a requirements.txt to configure the python env.

Contributor (author): I added SQL files. And notebooks are actually very useful for SQL at this point. They can be used with warehouses too (though probably not in the IDE, yet?)

Re. requirements.txt: is that used for notebooks?

Contributor: Notebooks in the IDE currently use spark.sql so it's technically not the same. I could imagine adding a SQL mode to the notebooks just like in the webapp but that's currently not scoped.

Notebooks can be used without virtual envs but that would clutter the global Python module space. If we recommend notebooks then using virtual envs and having a requirements.txt would be best practice.
@@ -0,0 +1,35 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {},
"inputWidgets": {},
"nuid": "dc8c630c-1ea0-42e4-873f-e4dec4d3d416",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"%sql\n",
"SELECT * FROM json.`/databricks-datasets/nyctaxi/sample/json/`"
]
}
],
"metadata": {
"application/vnd.databricks.v1+notebook": {
"dashboards": [],
"language": "python",
"notebookMetadata": {
"pythonIndentUnit": 2
},
"notebookName": "exploration",
"widgets": {}
}
},
"nbformat": 4,
"nbformat_minor": 0
}
@@ -0,0 +1,4 @@
-- This query is executed using Databricks Workflows as defined in resources/{{.project_name}}_sql_job.yml.

CREATE OR REPLACE VIEW taxis AS
SELECT * FROM json.`/databricks-datasets/nyctaxi/sample/json/`
@@ -0,0 +1,3 @@
-- This query is executed using Databricks Workflows as defined in resources/{{.project_name}}_sql_job.yml.

SELECT * FROM taxis