Added validate mutator to surface additional bundle warnings #1352

Merged (7 commits) on Apr 18, 2024
52 changes: 52 additions & 0 deletions bundle/config/validate/files_to_sync.go
@@ -0,0 +1,52 @@
package validate

import (
    "context"

    "github.com/databricks/cli/bundle"
    "github.com/databricks/cli/bundle/deploy/files"
    "github.com/databricks/cli/libs/diag"
)

func FilesToSync() bundle.Mutator {
    return &filesToSync{}
}

type filesToSync struct{}

func (v *filesToSync) Name() string {
    return "validate:files_to_sync"
}

func (v *filesToSync) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
    sync, err := files.GetSync(ctx, b)
    if err != nil {
        return diag.FromErr(err)
    }

    diags := diag.Diagnostics{}
    fl, err := sync.GetFileList(ctx)
    if err != nil {
        return diag.FromErr(err)
    }

    if len(fl) == 0 {
        if len(b.Config.Sync.Exclude) == 0 {
            diags = diags.Append(diag.Diagnostic{
                Severity: diag.Warning,
                Summary:  "There are no files to sync, please check your .gitignore",
            })
        } else {
            loc := location{path: "sync.exclude", b: b}
            diags = diags.Append(diag.Diagnostic{
                Severity: diag.Warning,
                Summary:  "There are no files to sync, please check your .gitignore and sync.exclude configuration",
                Location: loc.Location(),
                Path:     loc.Path(),
            })
        }
    }

    return diags
}
66 changes: 66 additions & 0 deletions bundle/config/validate/job_cluster_key_defined.go
@@ -0,0 +1,66 @@
package validate

import (
    "context"
    "fmt"

    "github.com/databricks/cli/bundle"
    "github.com/databricks/cli/libs/diag"
)

func JobClusterKeyDefined() bundle.Mutator {
    return &jobClusterKeyDefined{}
}
Review comment (Contributor):
I like this one! Question, no immediate action needed.

So we have a few more cases of references throughout our APIs. I suspect the ones customers would hit most often are job_cluster_key, then task_key, and then a small long tail including the new environment_key. Makes me wonder how far we should go with these checks, and whether it's worthwhile making this pattern more generic?

Reply (Contributor Author):
If we can define, in some general way (e.g. a key-value map of config paths), which fields attributes like task_key and job_cluster_key reference, we could make these generic. But I like the very explicit nature of this approach and would prefer to add a separate explicit mutator for each of these checks.

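The generic pattern the reviewer asks about can be sketched stdlib-only. This is a hypothetical illustration, not code from the PR; the names `refCheck` and `validateRefs` are invented, and a real version would hang off the bundle config types instead of plain maps:

```go
package main

import "fmt"

// refCheck describes one class of key reference to validate: the attribute
// name, where each reference lives, and the set of keys actually defined.
type refCheck struct {
    attr    string            // attribute name, e.g. "job_cluster_key"
    refs    map[string]string // referenced key -> config path of the reference
    defined map[string]bool   // keys that have a matching definition
}

// validateRefs returns one warning per reference without a definition.
func validateRefs(checks []refCheck) []string {
    var warnings []string
    for _, c := range checks {
        for key, path := range c.refs {
            if !c.defined[key] {
                warnings = append(warnings, fmt.Sprintf("%s: %s %q is not defined", path, c.attr, key))
            }
        }
    }
    return warnings
}

func main() {
    checks := []refCheck{{
        attr:    "job_cluster_key",
        refs:    map[string]string{"key": "resources.jobs.foo.tasks[0].job_cluster_key"},
        defined: map[string]bool{},
    }}
    for _, w := range validateRefs(checks) {
        fmt.Println(w)
    }
}
```

Each attribute (task_key, environment_key, …) would become one `refCheck` entry, trading the explicit per-check mutators for a single table-driven one.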

type jobClusterKeyDefined struct{}

func (v *jobClusterKeyDefined) Name() string {
    return "validate:job_cluster_key_defined"
}

func (v *jobClusterKeyDefined) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
    // Collect all job_cluster_key references from defined tasks
    jobClusterKeys := make(map[string]location)
    for i, job := range b.Config.Resources.Jobs {
        for j, task := range job.Tasks {
            if task.JobClusterKey != "" {
                jobClusterKeys[task.JobClusterKey] = location{
                    path: fmt.Sprintf("resources.jobs.%s.tasks[%d].job_cluster_key", i, j),
                    b:    b,
                }
            }
        }
    }

    if len(jobClusterKeys) == 0 {
        return nil
    }

    diags := diag.Diagnostics{}

    // Check if all job_cluster_keys are defined
    for key, loc := range jobClusterKeys {
        if !isJobClusterKeyDefined(key, b) {
            diags = diags.Append(diag.Diagnostic{
                Severity: diag.Warning,
                Summary:  fmt.Sprintf("job_cluster_key %s is not defined", key),
                Location: loc.Location(),
                Path:     loc.Path(),
            })
        }
    }

    return diags
}

func isJobClusterKeyDefined(key string, b *bundle.Bundle) bool {
    for _, job := range b.Config.Resources.Jobs {
        for _, cluster := range job.JobClusters {
            if cluster.JobClusterKey == key {
                return true
            }
        }
    }
    return false
}
42 changes: 42 additions & 0 deletions bundle/config/validate/validate.go
@@ -0,0 +1,42 @@
package validate

import (
    "context"

    "github.com/databricks/cli/bundle"
    "github.com/databricks/cli/libs/diag"
    "github.com/databricks/cli/libs/dyn"
)

type validate struct{}

type location struct {
    path string
    b    *bundle.Bundle
}

func (l location) Location() dyn.Location {
    return l.b.Config.GetLocation(l.path)
}

func (l location) Path() dyn.Path {
    return dyn.MustPathFromString(l.path)
}

// Apply implements bundle.Mutator.
func (v *validate) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
    return bundle.Apply(ctx, b, bundle.Parallel(
        JobClusterKeyDefined(),
        FilesToSync(),
    ))
}

// Name implements bundle.Mutator.
func (v *validate) Name() string {
    return "validate"
}

func Validate() bundle.Mutator {
    return &validate{}
}
37 changes: 37 additions & 0 deletions bundle/parallel.go
@@ -0,0 +1,37 @@
package bundle

import (
    "context"
    "sync"

    "github.com/databricks/cli/libs/diag"
)

type parallel struct {
    mutators []Mutator
}

func (m *parallel) Name() string {
    return "parallel"
}

func (m *parallel) Apply(ctx context.Context, b *Bundle) diag.Diagnostics {
    var wg sync.WaitGroup
    var mu sync.Mutex
    diags := diag.Diagnostics{}
    wg.Add(len(m.mutators))
    for _, mutator := range m.mutators {
        go func(mutator Mutator) {
            defer wg.Done()
            d := mutator.Apply(ctx, b)
            // Guard the shared diagnostics slice: extending it from multiple
            // goroutines without a lock is a data race. The mutator itself
            // runs outside the lock so the mutators still overlap.
            mu.Lock()
            defer mu.Unlock()
            diags = diags.Extend(d)
        }(mutator)
    }
    wg.Wait()
    return diags
}

// Parallel runs the given mutators in parallel.
func Parallel(mutators ...Mutator) Mutator {
    return &parallel{
        mutators: mutators,
    }
}
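The fan-out/collect pattern above can be shown stdlib-only, without the bundle types. This is an illustrative sketch (`runParallel` and the `[]string` message type are invented stand-ins for mutators and diagnostics):

```go
package main

import (
    "fmt"
    "sync"
)

// runParallel runs every task in its own goroutine and collects the
// messages under a mutex, mirroring how a parallel mutator gathers
// diagnostics from concurrent workers.
func runParallel(tasks []func() []string) []string {
    var wg sync.WaitGroup
    var mu sync.Mutex
    var out []string
    wg.Add(len(tasks))
    for _, task := range tasks {
        go func(task func() []string) {
            defer wg.Done()
            msgs := task() // run outside the lock so tasks overlap
            mu.Lock()
            defer mu.Unlock()
            out = append(out, msgs...)
        }(task)
    }
    wg.Wait()
    return out
}

func main() {
    warn := func(s string) func() []string { return func() []string { return []string{s} } }
    got := runParallel([]func() []string{warn("a"), warn("b"), warn("c")})
    fmt.Println(len(got)) // 3 (completion order is unspecified)
}
```

Note that only the shared result slice needs the lock; callers must not assume any ordering of the collected messages.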
67 changes: 67 additions & 0 deletions bundle/parallel_test.go
@@ -0,0 +1,67 @@
package bundle

import (
    "context"
    "testing"

    "github.com/databricks/cli/bundle/config"
    "github.com/databricks/cli/libs/diag"
    "github.com/stretchr/testify/require"
)

type addToContainer struct {
    container *[]int
    err       bool
}

func (m *addToContainer) Apply(ctx context.Context, b *Bundle) diag.Diagnostics {
    if m.err {
        return diag.Errorf("error")
    }

    c := *m.container
    c = append(c, 1)
    *m.container = c
    return nil
}

func (m *addToContainer) Name() string {
    return "addToContainer"
}

func TestParallelMutatorWork(t *testing.T) {
    b := &Bundle{
        Config: config.Root{},
    }

    container := []int{}
    m1 := &addToContainer{container: &container}
    m2 := &addToContainer{container: &container}
    m3 := &addToContainer{container: &container}

    m := Parallel(m1, m2, m3)

    // Apply the mutator
    diags := m.Apply(context.Background(), b)
    require.Empty(t, diags)
    require.Len(t, container, 3)
}

func TestParallelMutatorWorkWithErrors(t *testing.T) {
    b := &Bundle{
        Config: config.Root{},
    }

    container := []int{}
    m1 := &addToContainer{container: &container}
    m2 := &addToContainer{container: &container, err: true}
    m3 := &addToContainer{container: &container}

    m := Parallel(m1, m2, m3)

    // Apply the mutator
    diags := m.Apply(context.Background(), b)
    require.Len(t, diags, 1)
    require.Equal(t, "error", diags[0].Summary)
    require.Len(t, container, 2)
}
27 changes: 27 additions & 0 deletions bundle/tests/job_cluster_key/databricks.yml
@@ -0,0 +1,27 @@
bundle:
  name: job_cluster_key

workspace:
  host: https://acme.cloud.databricks.com/

targets:
  default:
    resources:
      jobs:
        foo:
          name: job
          tasks:
            - task_key: test
              job_cluster_key: key
  development:
    resources:
      jobs:
        foo:
          job_clusters:
            - job_cluster_key: key
              new_cluster:
                node_type_id: i3.xlarge
                num_workers: 1
          tasks:
            - task_key: test
              job_cluster_key: key
28 changes: 28 additions & 0 deletions bundle/tests/job_cluster_key_test.go
@@ -0,0 +1,28 @@
package config_tests

import (
    "context"
    "testing"

    "github.com/databricks/cli/bundle"
    "github.com/databricks/cli/bundle/config/validate"
    "github.com/databricks/cli/libs/diag"
    "github.com/stretchr/testify/require"
)

func TestJobClusterKeyNotDefinedTest(t *testing.T) {
    b := loadTarget(t, "./job_cluster_key", "default")

    diags := bundle.Apply(context.Background(), b, validate.JobClusterKeyDefined())
    require.Len(t, diags, 1)
    require.NoError(t, diags.Error())
    require.Equal(t, diags[0].Severity, diag.Warning)
    require.Equal(t, diags[0].Summary, "job_cluster_key key is not defined")
}

func TestJobClusterKeyDefinedTest(t *testing.T) {
    b := loadTarget(t, "./job_cluster_key", "development")

    diags := bundle.Apply(context.Background(), b, validate.JobClusterKeyDefined())
    require.Len(t, diags, 0)
}
2 changes: 2 additions & 0 deletions cmd/bundle/validate.go
@@ -8,6 +8,7 @@ import (
    "text/template"

    "github.com/databricks/cli/bundle"
    "github.com/databricks/cli/bundle/config/validate"
    "github.com/databricks/cli/bundle/phases"
    "github.com/databricks/cli/cmd/bundle/utils"
    "github.com/databricks/cli/cmd/root"

@@ -140,6 +141,7 @@ func newValidateCommand() *cobra.Command {
    }

    diags = diags.Extend(bundle.Apply(ctx, b, phases.Initialize()))
    diags = diags.Extend(bundle.Apply(ctx, b, validate.Validate()))
Review comment (Contributor):
Shouldn't these also be applied during deploy?

Reply (Contributor):
We shouldn't, for the set of validation checks that perform I/O.

We can choose to run some of them at deploy time later if 1) they don't add latency, and 2) the deploy would fail anyway when the validation fails.

Reply (Contributor):
Point taken about performance! Alternatively, could we group all validations that affect performance together? Rather than one folder simply called /validate/, we could have a separate folder like /slow_validations/ or something. Then other validations like JobClusterKeyDefined, and most future validations, would still be applied during deploy.

It's important that we include these warnings in the deploy path too, since that is the common critical user journey (CUJ) for CLI users right now.

Reply (Contributor):
Yeah, indeed. Would fit well in a change of what we display on deploy.
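The split discussed above (cheap checks on every deploy, I/O-heavy checks only on explicit validate) can be sketched stdlib-only. The names `check` and `runChecks` are invented for illustration and are not part of the CLI:

```go
package main

import "fmt"

// check is a named validation; slow marks checks that perform I/O
// and would add latency to a deploy.
type check struct {
    name string
    slow bool
    run  func() []string
}

// runChecks runs every fast check, and the slow ones only when requested,
// so the deploy path can skip latency-adding validations while the
// validate command runs everything.
func runChecks(checks []check, includeSlow bool) []string {
    var warnings []string
    for _, c := range checks {
        if c.slow && !includeSlow {
            continue
        }
        warnings = append(warnings, c.run()...)
    }
    return warnings
}

func main() {
    checks := []check{
        {name: "job_cluster_key_defined", slow: false, run: func() []string { return nil }},
        {name: "files_to_sync", slow: true, run: func() []string { return []string{"There are no files to sync"} }},
    }
    fmt.Println(len(runChecks(checks, false))) // 0: deploy path skips the slow check
    fmt.Println(len(runChecks(checks, true)))  // 1: validate path runs all checks
}
```

Whether the grouping lives in separate folders or in a flag like this is a design choice; the sketch only shows that both paths can share one registry of checks.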

    if err := diags.Error(); err != nil {
        return err
    }