Fix DBConnect support in VS Code #1253

Merged 4 commits from spark-session2 into main on Mar 5, 2024
Conversation

@fjakobs (Contributor) commented Mar 4, 2024

Changes

With the current template, we can't run the Python file or the jobs notebook with DBConnect from VS Code, because the code uses `from pyspark.sql import SparkSession`, which doesn't support Databricks unified auth. This PR fixes that by passing `spark` into the library code and by explicitly instantiating a Spark session where the `spark` global is not available, as sketched below.

Other changes:

  • add auto-reload to notebooks
  • add DLT typings for code completion
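
As a rough illustration of that approach (a minimal sketch only; the module, function, and table names below are placeholders, not the template's actual code): library code accepts `spark` as an argument, and the entry point obtains a session explicitly so the file also runs under Databricks Connect in VS Code, where the `spark` global is not injected.

# trips.py (placeholder module): library code takes the session as a parameter.
from pyspark.sql import DataFrame, SparkSession

def load_trips(spark: SparkSession) -> DataFrame:
    # Placeholder query; the real template code differs.
    return spark.read.table("samples.nyctaxi.trips")

# main.py (placeholder entry point): obtain the session explicitly and pass it down,
# so "python main.py" works with Databricks Connect in VS Code.
def main():
    spark = get_spark()  # get_spark as shown in the diff excerpt below
    load_trips(spark).show(5)

if __name__ == "__main__":
    main()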

The review thread below is anchored on this snippet from the diff (the new get_spark helper):

def get_spark():
    try:
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    except ImportError:
        # Fall back to a classic Spark session on environments without Databricks Connect.
        from pyspark.sql import SparkSession
        return SparkSession.builder.getOrCreate()
A contributor commented:

This really doesn't seem like it should be how we recommend customers write their code...

@fjakobs (Contributor, Author) replied:

I agree, but until we have something better I'd rather be explicit and verbose than not support old DBRs or hide it in non-standard libraries.

A contributor commented:

This does seem like a very specific factory pattern we want users to follow. How about moving it inside the main function and not having the get_spark function? That should make it very clear that this is not intended to be used everywhere and is only here to enable per-file runs.
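
For reference, a minimal sketch of that suggestion (assuming a main entry point; illustrative only, not the final template code):

def main():
    # Inline the session setup so it clearly exists only to enable per-file runs.
    try:
        from databricks.connect import DatabricksSession
        spark = DatabricksSession.builder.getOrCreate()
    except ImportError:
        # Databricks Connect is not available (e.g. older DBRs); use classic Spark.
        from pyspark.sql import SparkSession
        spark = SparkSession.builder.getOrCreate()

    # ... rest of the job logic uses `spark` ...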

Another reviewer commented:

Am I correct to understand that the only case in which DatabricksSession.builder.getOrCreate() will fail is when the user is targeting DBR < 13?

@codecov-commenter commented:

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 51.74%. Comparing base (0839e6f) to head (111484d).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1253      +/-   ##
==========================================
- Coverage   52.52%   51.74%   -0.78%     
==========================================
  Files         308      314       +6     
  Lines       17589    17864     +275     
==========================================
+ Hits         9238     9244       +6     
- Misses       7657     7919     +262     
- Partials      694      701       +7     


@fjakobs added this pull request to the merge queue Mar 5, 2024
@pietern removed this pull request from the merge queue due to a manual request Mar 5, 2024
@pietern changed the title from "Fix DBConnect support in VS Code #2 (fallback)" to "Fix DBConnect support in VS Code" Mar 5, 2024
@pietern added this pull request to the merge queue Mar 5, 2024
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 5, 2024
@pietern added this pull request to the merge queue Mar 5, 2024
Merged via the queue into main with commit e61f0e1 Mar 5, 2024
4 checks passed
@pietern deleted the spark-session2 branch Mar 5, 2024
pietern added a commit that referenced this pull request Mar 6, 2024
CLI:
* The SDK update fixes `fs cp` calls timing out when copying large files.

Bundles:
* Fix summary command when internal Terraform config doesn't exist ([#1242](#1242)).
* Configure cobra.NoArgs for bundle commands where applicable ([#1250](#1250)).
* Fixed building Python artifacts on Windows with WSL ([#1249](#1249)).
* Add `--validate-only` flag to run validate-only pipeline update ([#1251](#1251)).
* Only transform wheel libraries when using trampoline ([#1248](#1248)).
* Return `application_id` for service principal lookups ([#1245](#1245)).
* Support relative paths in artifact files source section and always upload all artifact files ([#1247](#1247)).
* Fix DBConnect support in VS Code ([#1253](#1253)).

Internal:
* Added test to verify scripts.Execute mutator works correctly ([#1237](#1237)).

API Changes:
* Added `databricks permission-migration` command group.
* Updated nesting of the `databricks settings` and `databricks account settings` commands.
* Changed `databricks vector-search-endpoints delete-endpoint` command with new required argument order.
* Changed `databricks vector-search-indexes create-index` command with new required argument order.
* Changed `databricks vector-search-indexes delete-data-vector-index` command with new required argument order.
* Changed `databricks vector-search-indexes upsert-data-vector-index` command with new required argument order.

OpenAPI commit d855b30f25a06fe84f25214efa20e7f1fffcdf9e (2024-03-04)

Dependency updates:
* Bump github.com/stretchr/testify from 1.8.4 to 1.9.0 ([#1252](#1252)).
* Update Go SDK to v0.34.0 ([#1256](#1256)).
@pietern mentioned this pull request Mar 6, 2024
github-merge-queue bot pushed a commit that referenced this pull request Mar 6, 2024 (same release notes as above)