Add server side parameters to session connection method #823
Conversation
Commits:
…d use to configure SparkSession.
…d pass to cursor from there.
Prefer this PR over #691. @alarocca-apixio: We took your commits and touched up the code so that it can be merged.
Works on my end: `cat ~/.dbt/profiles.yml`

```yaml
dbt_tabular:
  outputs:
    dev:
      method: session
      schema: dbt_tabular
      type: spark
      host: NA
      server_side_parameters:
        "spark.driver.memory": "2g"
  target: dev
```

I can see that this is being picked up by the process: `ps aux | grep -i spark`

```
fokkodriesprong 11191 150.0 0.4 413414240 269024 s008 S+ 11:14AM 0:01.99 /opt/homebrew/Cellar/openjdk@11/11.0.19/libexec/openjdk.jdk/Contents/Home/bin/java -cp /opt/homebrew/lib/python3.9/site-packages/pyspark/conf:/opt/homebrew/lib/python3.9/site-packages/pyspark/jars/* -Xmx2g -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=2g --conf spark.sql.catalogImplementation=hive pyspark-shell
fokkodriesprong 11203 0.0 0.0 408626896 1312 s010 S+ 11:14AM 0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox -i spark
```

We can see that `--conf spark.driver.memory=2g` is passed to `SparkSubmit`, and `-Xmx2g` is set on the JVM.
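For context, a minimal sketch of how a session connection method can apply such parameters when building the `SparkSession` (illustrative names, not the exact dbt-spark internals):

```python
from typing import Any, Dict, Optional

from pyspark.sql import SparkSession


def get_spark_session(server_side_parameters: Optional[Dict[str, Any]] = None) -> SparkSession:
    """Build (or reuse) a SparkSession, applying server-side parameters as config."""
    builder = SparkSession.builder.enableHiveSupport()
    for key, value in (server_side_parameters or {}).items():
        # Each entry ends up as a --conf on the underlying spark-submit,
        # e.g. spark.driver.memory=2g in the ps output above.
        builder = builder.config(key, value)
    return builder.getOrCreate()
```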
@Fleid: This PR is ready to be merged.
@colin-rogers-dbt: Could you fix the CI?
This looks ready to merge, but I think we should add a functional test case (we will need to think about where/how we can) and update our unit tests like #577. Also, is this dependent on #577?
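Such a unit test might look something like this. A sketch only: the import path and the `server_side_parameters` attribute on the cursor are assumptions, not confirmed by this PR.

```python
from dbt.adapters.spark.session import Connection  # hypothetical import path


def test_cursor_receives_server_side_parameters() -> None:
    params = {"spark.driver.memory": "2g"}
    connection = Connection(server_side_parameters=params)
    cursor = connection.cursor()
    # The parameters should survive the trip from connection to cursor,
    # since the cursor configures the SparkSession later on.
    assert cursor.server_side_parameters == params
```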
(Force-pushed from 9e6834e to ac402e7.)
```diff
@@ -24,9 +24,10 @@ class Cursor:
     https://github.com/mkleehammer/pyodbc/wiki/Cursor
     """

-    def __init__(self) -> None:
+    def __init__(self, *, server_side_parameters: Optional[Dict[str, Any]] = None) -> None:
```
I'm unfamiliar with this mechanic. What does the `*,` do in the signature? Is it similar to unpacking with `my_arg, *_`? Do we need it?
Yes, it is similar to the unpacking you mentioned. I guess that in a signature this mechanic is called "packing", as it is the opposite of unpacking. (I could not quickly find a PEP about this.)

What it does: it packs all remaining positional arguments. With a bare `*` there is nothing to pack them into, so passing any extra positional argument raises a `TypeError`. If you attach a name to the `*` (for example `*numbers`), all extra positional arguments are packed into that parameter (as a tuple):

```python
def sum(*numbers):
    total = 0
    for number in numbers:
        total += number
    return total


sum(1, 2, 3)
sum(5, 7, 9, 11)
```
The trick here is that a bare `*` forces all parameters after it to become keyword-only arguments. I like to use that to improve readability:

`Connection(foo)`

vs

`Connection(server_side_parameters=foo)`

The latter is more readable. `foo` is of course a badly chosen variable name, but I would expect the first positional argument of a `Connection` to be a `connection_string`, like `conn = pyodbc.connect(connection_str, autocommit=True)`, or something else other than `server_side_parameters`, which is an (optional) additional parameter.

The `*` is not required; I added it to improve code readability.
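To make the keyword-only behaviour concrete, a short illustration:

```python
def connect(*, server_side_parameters=None):
    return server_side_parameters


connect(server_side_parameters={"spark.driver.memory": "2g"})  # OK
connect({"spark.driver.memory": "2g"})  # TypeError: connect() takes 0 positional arguments but 1 was given
```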
```diff
@@ -159,6 +165,9 @@ class Connection:
     https://github.com/mkleehammer/pyodbc/wiki/Connection
     """

+    def __init__(self, *, server_side_parameters: Optional[Dict[Any, str]] = None) -> None:
```
Same question as above.
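Taken together, the two changed signatures suggest roughly this flow (a sketch, not the literal PR code; the attribute names and `cursor()` wiring are assumptions):

```python
from typing import Any, Dict, Optional


class Cursor:
    def __init__(self, *, server_side_parameters: Optional[Dict[str, Any]] = None) -> None:
        self.server_side_parameters = server_side_parameters or {}


class Connection:
    def __init__(self, *, server_side_parameters: Optional[Dict[str, Any]] = None) -> None:
        self.server_side_parameters = server_side_parameters or {}

    def cursor(self) -> Cursor:
        # Hand the parameters down so the cursor can apply them to the SparkSession.
        return Cursor(server_side_parameters=self.server_side_parameters)
```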
resolves #690
Description
Pass the existing `server_side_parameters` to the session connection wrapper and use them to configure the `SparkSession`.
Checklist
- Run `changie new` to create a changelog entry