Extract Spark configuration, secrets, init_scripts and libraries into the config #318
Conversation
/test
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5788025632 (in response to this comment from @tanya-borisova)
config.sample.yaml
Outdated
```yaml
databricks_libraries: # Optional
  pypi:
    - package: opencensus-ext-azure==1.1.9
    - package: opencensus-ext-logging==0.1.1
      repo: "custom-mirror"
  maven:
    - coordinates: "com.amazon.deequ:deequ:1.0.4"
      repo: "custom-mirror"
      exclusions: ["org.apache.avro:avro"]
  cran:
    - package: "rkeops"
      repo: "my-awesome-repo"
  whl:
    - "dbfs:/FileStore/baz.whl"
  egg:
    - "dbfs:/FileStore/foo.egg"
  jar:
    - "dbfs:/FileStore/app-0.0.1.jar"
```
Wanted to check whether these currently work or whether they are stubs. From what I can tell, only pypi is being tested as working in a deployment.
These should all work, yes: I have tested that the configuration for all of them is propagated to the cluster.
We currently only use pypi (with plans to use Maven as well), but I felt it would be better to add all of the supported library types now, in case they are needed in the future.
Seems reasonable overall. My only concern is a user expecting these to be used out of the box. I'd be tempted to add a comment so it's clear the config values aren't used as of yet, or to keep just pypi and maven, since those are the ones we have specific plans for.
@jjgriff93 @stefpiatek Re-requesting your reviews please, as I've made significant changes (added support for init_scripts and databricks_secrets)
/test
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5799823200 (in response to this comment from @tanya-borisova)
/test
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5799903763 (in response to this comment from @tanya-borisova)
/test
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5800044005 (in response to this comment from @tanya-borisova)
/test
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5800447936 (in response to this comment from @tanya-borisova)
Great work, thanks for adding the secrets and Maven stuff!
config.infra-test.yaml
Outdated
```yaml
- key: spark.databricks.cluster.profile
  value: singleNode
```
nit: could simplify and use a YAML map instead, like so:
```yaml
spark_config:
  spark.databricks.cluster.profile: singleNode
  another_key: another_value
```
config.sample.yaml
Outdated
```yaml
  - key: spark.master
    value: local[*]
databricks_secrets: # Optional
  - key: cog_services_key
```
Same suggestion as the previous comment regarding the key/value pairs.
/test
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5806622753 (in response to this comment from @tanya-borisova)
Looks good, thanks! My opinions aren't strongly held on adding extra config that isn't currently being used, so I'll leave that up to you.
@stefpiatek I am happy to remove these; as you said, they aren't currently used. I've kept the …
/test
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5807523748 (in response to this comment from @tanya-borisova)
/test
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5809501246 (in response to this comment from @tanya-borisova)
/test
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5809625616 (in response to this comment from @tanya-borisova)
What is being addressed
Extract Spark configuration and libraries into a separate section of the config.
How is this addressed
- Add a `spark_config` section to the config and move the single-node configuration there
- Add a `databricks_libraries` section to the config and propagate all types of libraries supported by the Databricks Terraform provider: https://registry.terraform.io/providers/databricks/databricks/0.4.2/docs/resources/cluster#library-configuration-block
- Add a `databricks_cluster` section to be able to configure the size of the cluster
- Add an `init_scripts` section to the config and upload a DBFS file from the local filesystem
- Add a `databricks_secrets` section to the config and create corresponding secrets in the secret scope
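Assembled from the snippets quoted in this PR, the new sections might sit together in a config file roughly like this. This is a hedged sketch, not the actual FlowEHR schema: the `init_scripts` entry format and the exact top-level nesting are assumptions, and only the keys shown in the diffs above are taken from the PR itself.

```yaml
# Sketch of the new config sections (nesting and init_scripts format assumed)
spark_config: # key/value pairs applied to the Databricks cluster
  - key: spark.databricks.cluster.profile
    value: singleNode
  - key: spark.master
    value: local[*]
databricks_secrets: # Optional; created in the Databricks secret scope
  - key: cog_services_key
init_scripts: # Optional; local files uploaded to DBFS (path format assumed)
  - "./init_scripts/setup.sh"
databricks_libraries: # Optional; propagated via the Terraform provider's library blocks
  pypi:
    - package: opencensus-ext-azure==1.1.9
```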