
Add server side parameters to session connection method #823

Merged
merged 10 commits into from
Jul 24, 2023

Conversation

@JCZuurmond (Collaborator) commented Jul 7, 2023

resolves #690

Description

Pass the existing server_side_parameters to the session connection wrapper and use them to configure the SparkSession.

Checklist

@JCZuurmond JCZuurmond requested a review from a team as a code owner July 7, 2023 08:44
@JCZuurmond JCZuurmond requested a review from mikealfare July 7, 2023 08:44
@cla-bot cla-bot bot added the cla:yes label Jul 7, 2023
@JCZuurmond (Collaborator, Author)

Prefer this PR over #691. @alarocca-apixio : We took your commits and touched up the code so that it can be merged.

@Fokko (Contributor) commented Jul 7, 2023

Works on my end:

cat ~/.dbt/profiles.yml 
dbt_tabular:
  outputs:
    dev:
      method: session
      schema: dbt_tabular
      type: spark
      host: NA
      server_side_parameters:
        "spark.driver.memory": "2g"
  target: dev

I can see that this is being picked up by the process:

ps aux | grep -i spark                          
fokkodriesprong  11191 150.0  0.4 413414240 269024 s008  S+   11:14AM   0:01.99 /opt/homebrew/Cellar/openjdk@11/11.0.19/libexec/openjdk.jdk/Contents/Home/bin/java -cp /opt/homebrew/lib/python3.9/site-packages/pyspark/conf:/opt/homebrew/lib/python3.9/site-packages/pyspark/jars/* -Xmx2g -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=2g --conf spark.sql.catalogImplementation=hive pyspark-shell

We can see that --conf spark.driver.memory=2g is set.
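For context, a minimal sketch of what the change does conceptually. This is not the actual dbt-spark code (the real wrapper lives in dbt/adapters/spark/session.py); `FakeBuilder` and `apply_server_side_parameters` are illustrative names standing in for `pyspark.sql.SparkSession.builder` and its `config` calls:

```python
from typing import Any, Dict, Optional


class FakeBuilder:
    """Stand-in for pyspark.sql.SparkSession.builder, for illustration only."""

    def __init__(self) -> None:
        self.conf: Dict[str, str] = {}

    def config(self, key: str, value: str) -> "FakeBuilder":
        # SparkSession.builder.config() also returns the builder for chaining.
        self.conf[key] = value
        return self


def apply_server_side_parameters(
    builder: FakeBuilder,
    server_side_parameters: Optional[Dict[str, Any]] = None,
) -> FakeBuilder:
    # Each entry from the profile's server_side_parameters mapping becomes
    # a session config setting, equivalent to a --conf flag on spark-submit.
    for key, value in (server_side_parameters or {}).items():
        builder = builder.config(key, str(value))
    return builder


builder = apply_server_side_parameters(
    FakeBuilder(), {"spark.driver.memory": "2g"}
)
print(builder.conf)  # {'spark.driver.memory': '2g'}
```

This matches the observable behavior above: the `spark.driver.memory` entry from profiles.yml surfaces as `--conf spark.driver.memory=2g` on the launched JVM.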

@JCZuurmond JCZuurmond changed the title Pass existing server_side_parameters to session connection wrapper and use to configure SparkSession. #691 Add server side parameters to session connection method Jul 7, 2023
@JCZuurmond (Collaborator, Author)

@Fleid: This PR is ready to be merged

@JCZuurmond (Collaborator, Author)

@colin-rogers-dbt: Could you fix the CI?

@colin-rogers-dbt (Contributor)

This looks ready to merge, but I think we should add a functional test case (we will need to think about where and how) and update our unit tests, as in #577.

@colin-rogers-dbt (Contributor)

Also, is this dependent on #577?

dbt/adapters/spark/session.py (review thread, resolved)
dbt/adapters/spark/connections.py (review thread, resolved)
@JCZuurmond JCZuurmond force-pushed the ADAP-405 branch 2 times, most recently from 9e6834e to ac402e7 Compare July 19, 2023 08:19
@@ -24,9 +24,10 @@ class Cursor:
https://github.com/mkleehammer/pyodbc/wiki/Cursor
"""

def __init__(self) -> None:
def __init__(self, *, server_side_parameters: Optional[Dict[str, Any]] = None) -> None:
Contributor commented:

I'm unfamiliar with this mechanic. What does the *, do in the signature? Is it similar to unpacking with my_arg, *_? Do we need it?

@JCZuurmond (Collaborator, Author) commented Jul 22, 2023

Yes, it is related to the unpacking you mentioned; in a signature this mechanic is called "packing", the opposite of unpacking (it is specified in PEP 3102, "Keyword-Only Arguments").

If you give the starred parameter a name, all remaining positional arguments are packed into that parameter as a tuple:

def sum(*numbers):
    total = 0
    for number in numbers:
        total += number
    return total
    
sum(1, 2, 3)
sum(5, 7, 9, 11)

The trick here is that a bare * (without a name) packs nothing: it simply forces every parameter after it to be passed as a keyword argument. I like to use that to improve readability:

Connection(foo)

vs

Connection(server_side_parameters=foo)

The latter is more readable.

foo is of course a badly chosen variable name here. My point: I expect the first positional argument of a Connection to be a connection string, as in conn = pyodbc.connect(connection_str, autocommit=True), rather than server_side_parameters, which is an optional additional parameter.

The * is not required; I added it to improve code readability.
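To make the keyword-only behavior concrete, here is a small sketch (a hypothetical connect function, not the dbt-spark or pyodbc API): the keyword call works, while the positional call raises a TypeError.

```python
from typing import Any, Dict, Optional


def connect(*, server_side_parameters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # The bare * makes server_side_parameters keyword-only.
    return dict(server_side_parameters or {})


# Keyword form: accepted.
print(connect(server_side_parameters={"spark.driver.memory": "2g"}))

# Positional form: rejected at call time.
try:
    connect({"spark.driver.memory": "2g"})
except TypeError as exc:
    print("rejected:", exc)
```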

@@ -159,6 +165,9 @@ class Connection:
https://github.com/mkleehammer/pyodbc/wiki/Connection
"""

def __init__(self, *, server_side_parameters: Optional[Dict[Any, str]] = None) -> None:
Contributor commented:

Same question as above.


Successfully merging this pull request may close these issues.

[ADAP-405] [Feature] Use server_side_parameters as SparkSession config in Spark "session" mode
5 participants