DABs Template: Fix DBConnect support in VS Code #1239
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##             main    #1239      +/-   ##
==========================================
- Coverage   52.52%   52.51%   -0.01%
==========================================
  Files         308      308
  Lines       17589    17603      +14
==========================================
+ Hits         9238     9245       +7
- Misses       7657     7664       +7
  Partials      694      694

☔ View full report in Codecov by Sentry.
libs/template/templates/default-python/template/{{.project_name}}/tests/main_test.py.tmpl
return spark.read.table("samples.nyctaxi.trips")

def main():
    get_taxis().show(5)

from databricks.connect import DatabricksSession as SparkSession
spark = SparkSession.builder.getOrCreate()
This unfortunately won't work in any older runtime, so we can't use it in main.py. It wouldn't be a good customer experience if our template didn't work out of the box on older runtimes. So I think we can only apply this workaround in the testing code.

An alternative might be to rely on the existence of the `spark` global, but that's going to be another ugly workaround. Ideally the standard `SparkSession` would just work here 🤷
This won't work with older runtimes. We can either choose not to support them or use a more verbose workaround.
As discussed this even breaks DLT!
The DLT example would still work since we are calling `get_taxis()` directly from the DLT notebook, but it's correct that this specific code is not supported by DLT.
You wanted to do a try/except here right?
yes, we'll go for the try/catch
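The try/except approach agreed on here might look something like the sketch below. The helper name `get_spark` is illustrative rather than taken from the PR; the idea is simply to prefer `DatabricksSession` from Databricks Connect and fall back to the classic PySpark `SparkSession` where the package is unavailable, e.g. on older runtimes.

```python
def get_spark():
    """Prefer a Databricks Connect session; fall back to classic PySpark.

    On older runtimes (or anywhere databricks-connect is not installed)
    the import fails, and the standard SparkSession is used instead.
    """
    try:
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    except ImportError:
        from pyspark.sql import SparkSession
        return SparkSession.builder.getOrCreate()
```

Catching `ImportError` (rather than checking runtime versions) keeps the fallback self-contained and avoids breaking environments where only plain PySpark is present.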
...plate/templates/default-python/template/{{.project_name}}/src/{{.project_name}}/main.py.tmpl
yield spark
spark.stop()
Are those two lines required? I've noticed that our docs here simply return `spark` from the fixture.
I don't think we need them. I'll take them out
What does it mean to stop a SC session?
Fixtures for pytest belong in `conftest.py`; also see the docs.
Thanks, I didn't know about `conftest.py`. We'll probably go with a solution that doesn't need a fixture.
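For reference, a session-scoped fixture in `conftest.py` along the lines suggested above might look like this. This is a sketch, not the template's final code; it assumes `pytest` and `databricks-connect` are available, and it returns the session rather than yield/stop, matching the docs mentioned earlier in the thread.

```python
# conftest.py
import pytest


@pytest.fixture(scope="session")
def spark():
    # Build one Databricks Connect session shared by the whole test run;
    # no explicit teardown, per the docs that simply return the session.
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.getOrCreate()
```

Any test that declares a `spark` parameter then receives the same session, without each test module needing its own setup code.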
Please also update the PR summary such that the resulting commit message has a proper summary/record of the change.
closing in favor of #1253
Changes

With the current template we can't execute the Python file and the jobs notebook using DBConnect from VS Code, because we import `from pyspark.sql import SparkSession`, which doesn't support Databricks unified auth. This PR fixes this by passing `spark` into the library code and by explicitly instantiating `DatabricksSession` where the `spark` global is not available.

Other changes:
- `spark` in the unit tests

Alternatives

I created two alternatives:
- fall back to the standard `SparkSession` if DB Connect is not available
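The main fix described above, passing `spark` into the library code, can be sketched as below. The signatures are illustrative (the template's actual code may differ), but `get_taxis` and the `samples.nyctaxi.trips` table are from the snippet under review.

```python
# Library code receives the session as a parameter instead of creating
# or importing it, so the same functions work under DBConnect in VS Code,
# in notebooks where `spark` is predefined, and in unit tests with a
# mocked session.
def get_taxis(spark):
    return spark.read.table("samples.nyctaxi.trips")


def main(spark):
    get_taxis(spark).show(5)
```

Because the session is injected, the unit tests no longer need the `DatabricksSession` workaround at import time; they can construct (or mock) a session and hand it in.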