Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract Spark configuration, secrets, init_scripts and libraries into the config #318

Merged
merged 12 commits into from
Aug 9, 2023

Conversation

tanya-borisova
Copy link
Member

@tanya-borisova tanya-borisova commented Aug 7, 2023

What is being addressed

Extract Spark configuration and libraries into a separate section of the config.

How is this addressed

  • Add spark_config section to the config and move the single-node configuration there
  • Add databricks_libraries section to the config and propagate all types of libraries supported by databricks terraform provider: https://registry.terraform.io/providers/databricks/databricks/0.4.2/docs/resources/cluster#library-configuration-block
  • Add databricks_cluster section to be able to configure the size of the cluster
  • Add init_scripts section to the config and upload a dbfs file from local filesystem
  • Add databricks_secrets section to the config and create corresponding secrets in the secret scope

@tanya-borisova
Copy link
Member Author

/test

@jjgriff93
Copy link
Member

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5788025632 (with refid 0a055ddf)

(in response to this comment from @tanya-borisova)

Comment on lines 37 to 54
databricks_libraries: # Optional
pypi:
- package: opencensus-ext-azure==1.1.9
- package: opencensus-ext-logging==0.1.1
repo: "custom-mirror"
maven:
- coordinates: "com.amazon.deequ:deequ:1.0.4"
repo: "custom-mirror"
exclusions: ["org.apache.avro:avro"]
cran:
- package: "rkeops"
repo: "my-awesome-repo"
whl:
- "dbfs:/FileStore/baz.whl"
egg:
- "dbfs:/FileStore/foo.egg"
jar:
- "dbfs:/FileStore/app-0.0.1.jar"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to check if these would work currently or if they are stubs? From what I can tell only the pypi is being tested as working in a deployment

Copy link
Member Author

@tanya-borisova tanya-borisova Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These all should work, yes. I have tested that the configuration for all of them is propagated to the cluster

We currently only use pypi though (we have plans to use Maven as well), but I felt that it would be better to add all of the supported libraries now, in case they are needed in the future

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable overall, only concern I would have is that if we had a user who's expecting these to be used out of the box. I'd be tempted to add a comment so its clear the config values aren't used as of yet or just keep the pypi and maven as those are the ones we have specific plans for

@tanya-borisova tanya-borisova changed the title Extract Spark configuration and libraries into the config Extract Spark configuration, secrets, init_scripts and libraries into the config Aug 8, 2023
@tanya-borisova tanya-borisova removed the request for review from dram1964 August 8, 2023 16:51
@tanya-borisova
Copy link
Member Author

@jjgriff93 @stefpiatek Re-requesting your reviews please as I've made significant changes (added support for init_scripts and databricks_secrets)

@tanya-borisova
Copy link
Member Author

/test

@jjgriff93
Copy link
Member

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5799823200 (with refid 0a055ddf)

(in response to this comment from @tanya-borisova)

@tanya-borisova
Copy link
Member Author

/test

@jjgriff93
Copy link
Member

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5799903763 (with refid 0a055ddf)

(in response to this comment from @tanya-borisova)

@tanya-borisova
Copy link
Member Author

/test

@jjgriff93
Copy link
Member

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5800044005 (with refid 0a055ddf)

(in response to this comment from @tanya-borisova)

@tanya-borisova
Copy link
Member Author

/test

@jjgriff93
Copy link
Member

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5800447936 (with refid 0a055ddf)

(in response to this comment from @tanya-borisova)

Copy link
Member

@jjgriff93 jjgriff93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks for adding the secrets and Maven stuff!

Comment on lines 17 to 18
- key: spark.databricks.cluster.profile
value: singleNode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit could simplify and use a yaml map instead like so:

spark_config:
    spark.databricks.cluster.profile: singleNode
    another_key: another_value

- key: spark.master
value: local[*]
databricks_secrets: # Optional
- key: cog_services_key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion as prev comment re key values

@tanya-borisova
Copy link
Member Author

/test

@jjgriff93
Copy link
Member

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5806622753 (with refid 0a055ddf)

(in response to this comment from @tanya-borisova)

Copy link
Collaborator

@stefpiatek stefpiatek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! Opinions aren't strongly held on adding in extra config that currently isn't being used so will leave that up to you

Comment on lines 37 to 54
databricks_libraries: # Optional
pypi:
- package: opencensus-ext-azure==1.1.9
- package: opencensus-ext-logging==0.1.1
repo: "custom-mirror"
maven:
- coordinates: "com.amazon.deequ:deequ:1.0.4"
repo: "custom-mirror"
exclusions: ["org.apache.avro:avro"]
cran:
- package: "rkeops"
repo: "my-awesome-repo"
whl:
- "dbfs:/FileStore/baz.whl"
egg:
- "dbfs:/FileStore/foo.egg"
jar:
- "dbfs:/FileStore/app-0.0.1.jar"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable overall, only concern I would have is that if we had a user who's expecting these to be used out of the box. I'd be tempted to add a comment so its clear the config values aren't used as of yet or just keep the pypi and maven as those are the ones we have specific plans for

@tanya-borisova
Copy link
Member Author

@stefpiatek I am happy to remove these, as you said, they aren't currently used. I've kept the jar as I just realised we'll need it for the MSSQL JDBC driver.

@tanya-borisova
Copy link
Member Author

/test

@jjgriff93
Copy link
Member

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5807523748 (with refid 0a055ddf)

(in response to this comment from @tanya-borisova)

@tanya-borisova
Copy link
Member Author

/test

@jjgriff93
Copy link
Member

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5809501246 (with refid 0a055ddf)

(in response to this comment from @tanya-borisova)

@tanya-borisova tanya-borisova enabled auto-merge (squash) August 9, 2023 13:13
@tanya-borisova
Copy link
Member Author

/test

@jjgriff93
Copy link
Member

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/UCLH-Foundry/FlowEHR/actions/runs/5809625616 (with refid 0a055ddf)

(in response to this comment from @tanya-borisova)

@tanya-borisova tanya-borisova merged commit d8188f2 into main Aug 9, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants