DABs Template: Fix DBConnect support in VS Code #1239

Closed
fjakobs wants to merge 2 commits

Contributor

fjakobs commented Feb 27, 2024

Changes

With the current template we can't execute the Python file or the jobs notebook using DBConnect from VS Code, because the template uses `from pyspark.sql import SparkSession`, and that `SparkSession` doesn't support Databricks unified auth. This PR fixes this by passing `spark` into the library code and by explicitly instantiating `DatabricksSession` where the `spark` global is not available.
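
The "pass `spark` into the library code" idea can be illustrated with a minimal sketch. This is not the PR's actual diff: `StubSpark` and `StubReader` are hypothetical stand-ins, included only so the example runs without a cluster. The point is that the library function no longer creates a session itself, so the caller can hand it a notebook's `spark` global, a `DatabricksSession`, or a test double.

```python
class StubReader:
    """Hypothetical stand-in for spark.read; returns a label instead of a DataFrame."""

    def table(self, name: str) -> str:
        return f"DataFrame<{name}>"


class StubSpark:
    """Hypothetical stand-in for a Spark session (assumption, for local runs only)."""

    read = StubReader()


def get_taxis(spark):
    # The session is injected by the caller instead of being created here.
    return spark.read.table("samples.nyctaxi.trips")


if __name__ == "__main__":
    print(get_taxis(StubSpark()))
```

With a real session, the call site would be `get_taxis(spark)`, where `spark` comes from whichever environment is running the code.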

Other changes:

  • Add auto-reload to notebooks
  • Add DLT typings for code completion
  • Use a fixture for spark in the unit tests

Alternatives

I created two alternatives:

  1. #1253 (Fix DBConnect support in VS Code): fall back to SparkSession if DB Connect is not available
  2. #1254 (Fix DBConnect support in VS Code #3 (SDK)): use the Databricks SDK to get a Spark session

codecov-commenter commented Feb 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 52.51%. Comparing base (0839e6f) to head (32328c1).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1239      +/-   ##
==========================================
- Coverage   52.52%   52.51%   -0.01%     
==========================================
  Files         308      308              
  Lines       17589    17603      +14     
==========================================
+ Hits         9238     9245       +7     
- Misses       7657     7664       +7     
  Partials      694      694              


   return spark.read.table("samples.nyctaxi.trips")

 def main():
-  get_taxis().show(5)
+  from databricks.connect import DatabricksSession as SparkSession
+  spark = SparkSession.builder.getOrCreate()
Contributor

This unfortunately won't work in any older runtime, so we can't use it in main.py. It wouldn't be a good customer experience if our template didn't work out of the box on older runtimes. So I think we can only apply this workaround in the testing code.

An alternative might be to rely on the existence of the spark global but that's going to be another ugly workaround.

Ideally the standard SparkSession would just work here 🤷
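
The "rely on the `spark` global" alternative mentioned above might look roughly like the sketch below. The helper name `resolve_spark` and both of its parameters are hypothetical; nothing in this thread shows such a function. It assumes the caller passes in the namespace where the runtime would have defined `spark`, for example `globals()` in a notebook.

```python
from typing import Any, Callable, Mapping


def resolve_spark(namespace: Mapping[str, Any], create_session: Callable[[], Any]) -> Any:
    """Return namespace['spark'] if it is defined, else build a session."""
    existing = namespace.get("spark")
    if existing is not None:
        # A session was already provided by the environment: reuse it.
        return existing
    # No global available: fall back to creating a session explicitly.
    return create_session()
```

A call such as `resolve_spark(globals(), make_session)` (with `make_session` being whatever factory the project chooses) would reuse the notebook's session when present. Hiding the lookup behind a helper does not remove the implicit dependency on a global, which is the "ugly workaround" concern raised in the comment.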

Contributor Author

This won't work with older runtimes. We can either choose not to support them or use a more verbose workaround.

Contributor

As discussed this even breaks DLT!

Contributor Author

The DLT example would still work since we are calling get_taxis() directly from the DLT notebook but it's correct that this specific code is not supported by DLT.

Contributor

You wanted to do a try/except here right?

Contributor Author

Yes, we'll go for the try/except.
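
A try/except fallback of the kind agreed on here could be sketched as follows. This is an illustration under stated assumptions, not the code that landed in #1253: the helper name `get_spark`, the `DEFAULT_CANDIDATES` tuple, and the use of `importlib` (chosen so the fallback order is easy to test) are all choices made for this sketch. The two module and class names are the ones mentioned in this thread.

```python
import importlib
from typing import Any, Sequence, Tuple

# (module, class) pairs tried in order. Both names appear in the discussion;
# looking them up through importlib is an assumption of this sketch.
DEFAULT_CANDIDATES = (
    ("databricks.connect", "DatabricksSession"),
    ("pyspark.sql", "SparkSession"),
)


def get_spark(candidates: Sequence[Tuple[str, str]] = DEFAULT_CANDIDATES) -> Any:
    for module_name, class_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Package not installed in this environment: try the next one.
            continue
        return getattr(module, class_name).builder.getOrCreate()
    raise RuntimeError("No Spark session provider could be imported")
```

On a runtime without Databricks Connect, the first import fails and the standard `SparkSession` is used, which addresses the older-runtime concern raised earlier in this thread.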

Comment on lines +19 to +20
+  yield spark
+  spark.stop()
Contributor

Are those two lines required? I've noticed that our docs simply return spark from the fixture.

Contributor Author

I don't think we need them. I'll take them out
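
For context on what the two lines in question do: in a yield-style pytest fixture, code before `yield` is setup and code after it is teardown. The sketch below mimics that flow with a plain generator so it runs without pytest. `StubSession` and `run_with_fixture` are hypothetical helpers for illustration, and pytest's real machinery is more involved than this.

```python
def spark_fixture(create_session):
    """Generator shaped like a yield-style fixture: setup, yield, teardown."""
    spark = create_session()
    yield spark
    # Runs only when the generator is resumed after the test has finished.
    spark.stop()


class StubSession:
    """Hypothetical session that records whether stop() was called."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def run_with_fixture(fixture_gen, test_body):
    """Drive the generator roughly as a test runner would: setup, test, teardown."""
    resource = next(fixture_gen)
    try:
        test_body(resource)
    finally:
        # Resume past the yield so the teardown code executes.
        next(fixture_gen, None)
    return resource
```

Replacing `yield spark` and `spark.stop()` with a plain `return spark` keeps the setup and drops the teardown, which is the simplification being discussed.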

Contributor

What does it mean to stop a Spark Connect (SC) session?

Contributor

Fixtures for pytest belong in conftest.py; also see the docs.

Contributor Author

Thanks, I didn't know about conftest.py. We'll probably go with a solution that doesn't need a fixture

Contributor

pietern left a comment

Please also update the PR summary such that the resulting commit message has a proper summary/record of the change.

Contributor Author

fjakobs commented Mar 5, 2024

Closing in favor of #1253.

fjakobs closed this Mar 5, 2024