Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation for files with a .(resource-name).yml extension #1780

Merged
merged 38 commits into from
Oct 7, 2024

Conversation

shreyas-goenka
Copy link
Contributor

@shreyas-goenka shreyas-goenka commented Sep 19, 2024

Changes

We want to encourage a pattern of specifying only a single resource in a YAML file when the .(resource-type).yml extension is used (for example, .job.yml). This convention could allow us to bijectively map a resource YAML file to its corresponding resource in the Databricks workspace.

This PR:

  1. Emits a recommendation diagnostic when we detect this convention is being violated. We can promote this to a warning when we want to encourage this pattern more strongly.
  2. Visualises the recommendation diagnostics in the bundle validate command.

NOTE: While this PR also shows the recommendation for .yaml files, we do not encourage users to use this extension. We only support it here since it's part of the YAML standard and some existing users might already be using .yaml.

Tests

Unit tests and manually. Here's what an example output looks like:

Recommendation: define a single job in a file with the .job.yml extension.
  at resources.jobs.bar
     resources.jobs.foo
  in foo.job.yml:13:7
     foo.job.yml:5:7

The following resources are defined or configured in this file:
  - bar (job)
  - foo (job)

@shreyas-goenka shreyas-goenka changed the base branch from main to test-util-location September 24, 2024 15:32
@shreyas-goenka shreyas-goenka changed the title Add detection and info diagnostic for convention for .<resource-type>.yml files Add validation when a .(resource-name).yml extension is used Sep 24, 2024
@shreyas-goenka shreyas-goenka changed the title Add validation when a .(resource-name).yml extension is used Add validation for files with a.(resource-name).yml extensio Sep 24, 2024
@shreyas-goenka shreyas-goenka changed the title Add validation for files with a.(resource-name).yml extensio Add validation for files with a.(resource-name).yml extensios Sep 24, 2024
@shreyas-goenka shreyas-goenka changed the title Add validation for files with a.(resource-name).yml extensios Add validation for files with a.(resource-name).yml extension Sep 24, 2024
@shreyas-goenka shreyas-goenka changed the title Add validation for files with a.(resource-name).yml extension Add validation for files with a .(resource-name).yml extension Sep 24, 2024
@shreyas-goenka shreyas-goenka marked this pull request as ready for review September 24, 2024 15:54
@shreyas-goenka
Copy link
Contributor Author

Triggered nightlies against this PR...

@shreyas-goenka
Copy link
Contributor Author

Nightlies are green!

bundle/config/loader/process_include.go Outdated Show resolved Hide resolved

func validateFileFormat(r *config.Root, filePath string) diag.Diagnostics {
for _, typ := range resourceTypes {
for _, ext := range []string{fmt.Sprintf(".%s.yml", typ), fmt.Sprintf(".%s.yaml", typ)} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fjakobs Can you comment on why we should check both?

Useful context to include here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the plan is to have DABs in the workspace support both .yml and .yaml since asking users to change .yaml -> .yml is a bit arbitrary in a recommendation for DABs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree.

@fjakobs Can you provide a snippet we can include for future readers wrt the expected extensions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do recommend using yml but it feels a bit arbitrary to actively prohibit this.

It's a SHOULD in https://docs.google.com/document/d/1M4UqmNdpDgKMfuBVSghm_UtaDdYp4r-1SA0SuWsgG_k/edit#heading=h.bs9f118p0tig and not a must.

bundle/config/loader/process_include.go Outdated Show resolved Hide resolved
bundle/config/loader/process_include.go Outdated Show resolved Hide resolved
bundle/config/loader/process_include.go Show resolved Hide resolved
bundle/config/loader/process_include.go Show resolved Hide resolved
bundle/config/loader/process_include.go Outdated Show resolved Hide resolved
bundle/config/loader/process_include_test.go Outdated Show resolved Hide resolved
bundle/render/render_text_output.go Show resolved Hide resolved
bundle/render/render_text_output_test.go Outdated Show resolved Hide resolved
github-merge-queue bot pushed a commit that referenced this pull request Sep 25, 2024
…1788)

I plan to use this in #1780, to
set the line and column numbers as well for the locations.

gopatch file used:
```
@@
var x expression
var y expression
var z expression
@@
-bundletest.SetLocation(x, y, z)
+bundletest.SetLocation(x, y, []dyn.Location{{File: z}})
```
Base automatically changed from test-util-location to main September 25, 2024 16:21
Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

Please update the PR summary to match the final version.

@fjakobs Could you suggest 1) a comment to include on the multiple extensions we apply this recommendation to, 2) consider including some kind of doc of text we can link to explaining why we make this recommendation. As is, we recommend it but don't explain why (although it could make sense in isolation).

Copy link
Contributor

@fjakobs fjakobs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about updating the templates? Is this going to happen in a follow up PR?

bundle/config/loader/process_include.go Show resolved Hide resolved
@shreyas-goenka
Copy link
Contributor Author

@fjakobs The templates have already been addressed in #1777. I'll also be letting mlops-stacks know about this.

Copy link
Contributor

@lennartkats-db lennartkats-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't offer a full review right now but a few nits

bundle/config/loader/process_include.go Outdated Show resolved Hide resolved
bundle/config/loader/process_include.go Outdated Show resolved Hide resolved
bundle/config/loader/process_include.go Outdated Show resolved Hide resolved
bundle/config/loader/process_include.go Outdated Show resolved Hide resolved
bundle/config/loader/process_include.go Outdated Show resolved Hide resolved
shreyas-goenka and others added 6 commits October 2, 2024 20:26
Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
@shreyas-goenka shreyas-goenka added this pull request to the merge queue Oct 7, 2024
Merged via the queue into main with commit bca9c2e Oct 7, 2024
5 checks passed
@shreyas-goenka shreyas-goenka deleted the detect-convention branch October 7, 2024 09:23
andrewnester added a commit that referenced this pull request Oct 9, 2024
Important changes:

Workspace paths are automatically prefixed with `/Workspace`. In addition, all usage of path strings such as `/Workspace/${workspace.root_path}/...` in bundle configuration is automatically replaced with `${workspace.root_path}/...` and generates a warning as part of bundle validate.

If you have specified a custom `workspace.root_path`, `workspace.artifact_path`, or `workspace.file_path`, Databricks Asset Bundles automatically prefixes it with `/Workspace`, but if you use any of these as variables (for example, `my_config_path: /Workspace/${workspace.file_path}/config`), you need to update those entries to remove the /Workspace prefix to avoid the warning.

If you pass one of these as variables and prefix them in your code, you need to update your code to not do this.

This change is required because originally when the workspace file system was rooted at `/` and home directories were under `/Users`, to access workspace paths through the Databricks REST API you would use these paths directly. To access workspace paths from your code, you could use the `/Workspace` file path and home directories were also available under `/Workspace/Users`. To avoid this duality of workspace paths, as well as the ambiguity between workspace paths and Unity Catalog `/Volumes` paths, all workspace paths are prefixed with `/Workspace`.

Bundles:
 * Add an error if state files grow bigger than the export limit ([#1795](#1795)).
 * Always prepend bundle remote paths with /Workspace ([#1724](#1724)).
 * Add resource path field to bundle workspace configuration ([#1800](#1800)).
 * Add validation for files with a `.(resource-name).yml` extension ([#1780](#1780)).

Internal:
 * Remove deprecated or readonly fields from the bundle schema ([#1809](#1809)).

API Changes:
 * Changed `databricks git-credentials create` command . New request type is .
 * Changed `databricks git-credentials delete` command . New request type is .
 * Changed `databricks git-credentials delete` command to return .
 * Changed `databricks git-credentials get` command . New request type is .
 * Changed `databricks git-credentials get` command to return .
 * Changed `databricks git-credentials list` command to return .
 * Changed `databricks git-credentials update` command . New request type is .
 * Changed `databricks git-credentials update` command to return .
 * Changed `databricks repos create` command . New request type is .
 * Changed `databricks repos create` command to return .
 * Changed `databricks repos delete` command to return .
 * Changed `databricks repos get` command to return .
 * Changed `databricks repos update` command . New request type is .
 * Changed `databricks repos update` command to return .

OpenAPI commit 0c86ea6dbd9a730c24ff0d4e509603e476955ac5 (2024-10-02)
Dependency updates:
 * Upgrade TF provider to 1.53.0 ([#1815](#1815)).
 * Bump golang.org/x/term from 0.24.0 to 0.25.0 ([#1811](#1811)).
 * Bump golang.org/x/text from 0.18.0 to 0.19.0 ([#1812](#1812)).
 * Bump github.com/databricks/databricks-sdk-go from 0.47.0 to 0.48.0 ([#1810](#1810)).
github-merge-queue bot pushed a commit that referenced this pull request Oct 9, 2024
Notable changes for Databricks Asset Bundles:

Workspace paths are automatically prefixed with `/Workspace`. In
addition, all usage of path strings such as
`/Workspace/${workspace.root_path}/...` in bundle configuration is
automatically replaced with `${workspace.root_path}/...` and generates a
warning as part of bundle validate.

More details can be find here:
https://docs.databricks.com/en/release-notes/dev-tools/bundles.html#workspace-paths-will-be-automatically-prefixed

Bundles:
* Add an error if state files grow bigger than the export limit
([#1795](#1795)).
* Always prepend bundle remote paths with /Workspace
([#1724](#1724)).
* Add resource path field to bundle workspace configuration
([#1800](#1800)).
* Add validation for files with a `.(resource-name).yml` extension
([#1780](#1780)).

Internal:
* Remove deprecated or readonly fields from the bundle schema
([#1809](#1809)).

API Changes:
* Changed `databricks git-credentials create`, `databricks
git-credentials delete`, `databricks git-credentials get`, `databricks
git-credentials list`, `databricks git-credentials update` commands .
* Changed `databricks repos create`, `databricks repos delete`,
`databricks repos get`, `databricks repos update` command .

OpenAPI commit 0c86ea6dbd9a730c24ff0d4e509603e476955ac5 (2024-10-02)
Dependency updates:
* Upgrade TF provider to 1.53.0
([#1815](#1815)).
* Bump golang.org/x/term from 0.24.0 to 0.25.0
([#1811](#1811)).
* Bump golang.org/x/text from 0.18.0 to 0.19.0
([#1812](#1812)).
* Bump github.com/databricks/databricks-sdk-go from 0.47.0 to 0.48.0
([#1810](#1810)).

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants