Add docs databricks asset bundles #3744
Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
@cilopezs is also helping!
Thank you @erwinpaillacan and @cilopezs for this PR! This addresses #3360 in part. In your opinion, does it still make sense to keep the DBX docs around? I was thinking that we should remove them and replace them with what you did here.
Yes. In our opinion, we could update the page https://docs.kedro.org/en/stable/deployment/databricks/databricks_ide_development_workflow.html to use Databricks Connect, which was deprecated for some time but is now live again and recommended for development (https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html). For deployment we can rely on asset bundles, which is the main purpose of this PR. I think we also need to update the decision plot, right?
Heads up! @astrojuanlu @cilopezs
We are removing dbx.
This is amazing, @erwinpaillacan, thank you for the hard work. The only open question is whether we should make the hook part of the addon/starter rather than require a manual change.
The docs errors are legitimate, please address them so RTD can render the new docs 👍🏽
Hi folks! @erwinpaillacan I understand that there aren't really any major outstanding comments here, and that #3744 (comment) can be addressed as a separate PR. Is that right?
Could you have a look at the `vale` check, which flagged some spelling and style errors? Then hopefully we can get this merged.
…ow.md Co-authored-by: Juan Luis Cano Rodríguez <hello@juanlu.space> Signed-off-by: erwinpaillacan <43705290+erwinpaillacan@users.noreply.github.com>
I think this is a clear improvement over what we have now 💯 and just needs some style fixes reported by Vale for all the checks to pass. Thanks @erwinpaillacan for your patience!
Erwin confirmed internally that this PR will need an update to work on Kedro 0.19 👍🏼
Hey @erwinpaillacan, can I help you update this PR to work with Kedro 0.19.x?
…-asset-bundles-deployment Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
@astrojuanlu
Hey! I'm a bit late to the party, but I just wanted to let you know that I have previously made a Databricks bundle template to illustrate how one could get started with Kedro on Databricks. I'm in the process of converting the logic introduced in the template into a Kedro plugin - see more here. I think the plugin would be very helpful as it makes it easier to deploy existing projects to Databricks, whereas both the template made by me or the […] Please note that the plugin is still in early development, so if you have any suggestions to align it with your vision, please let me know!
Looks great!!
@erwinpaillacan it maps pipelines to workflows with nodes as tasks. That is to say, I did my best to mimic the view of Kedro-Viz in the workflow tab of the Databricks UI.
I just updated the readme to shed some light on the functionality that I'm intending to implement. I say intend as the 'deploy' command isn't ready yet. I also published it as 'kedro-databricks-dev' as the other name is already taken by an empty project. I will reach out to the author of the other project so that we can hopefully get a sensible name for the package 😊
cc @em-pe :)
package_name: iris #<package_name>
entry_point: databricks_run
named_parameters:
  --conf-source: /Workspace${workspace.file_path}/conf
What about passing the `project-path` instead and then changing the working directory in the `databricks_run.py` script? I do not know if this is a good option, but sometimes I have more files that I need in the project path, like SQL files, that are not reachable in Databricks unless I change the working directory or specify the full path.
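A minimal sketch of the suggestion above, assuming a hypothetical `--project-path` parameter and an `argparse`-based entry point. The function names are illustrative and are not taken from the PR's `databricks_run.py`; the Kedro call is mentioned only in a comment so the block runs without Kedro installed.

```python
import argparse
import os
from pathlib import Path


def parse_args(argv=None):
    # Hypothetical parameters; the PR itself passes --conf-source instead.
    parser = argparse.ArgumentParser(description="Illustrative Databricks entry point")
    parser.add_argument("--project-path", required=True, help="Root of the deployed project")
    parser.add_argument("--env", default=None, help="Kedro environment name")
    return parser.parse_args(argv)


def enter_project(project_path):
    # Resolve the path and make it the working directory, so relative paths
    # (for example to SQL files) resolve against the project root.
    root = Path(project_path).resolve()
    if not root.is_dir():
        raise FileNotFoundError(f"Project path does not exist: {root}")
    os.chdir(root)
    return root


def main(argv=None):
    args = parse_args(argv)
    root = enter_project(args.project_path)
    # The configuration directory can now be derived from the project root.
    conf_source = root / "conf"
    # A real entry point would start a Kedro session here, for example with
    # KedroSession.create(project_path=root, env=args.env).
    return root, conf_source
```

With this shape, the bundle would pass a single project path and the script would derive `conf` from it, at the cost of relying on a process-wide working directory change.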
named_parameters:
  --conf-source: /Workspace${workspace.file_path}/conf
  --package-name: iris #<package_name>
  --env: ${var.environment}
I do not know why, but it only worked for me without the two leading `--`.
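One possible explanation, offered as an assumption rather than confirmed behaviour: if Databricks already prefixes each `named_parameters` key with `--` when it builds the command line, keys written with their own `--` would arrive as `----conf-source=...`, which an `argparse`-based entry point does not recognise. The sketch below demonstrates only the `argparse` side; the parser is hypothetical and is not the project's actual `databricks_run.py`.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the three parameters in the bundle snippet.
    parser = argparse.ArgumentParser()
    parser.add_argument("--conf-source", default=None)
    parser.add_argument("--package-name", default=None)
    parser.add_argument("--env", default=None)
    return parser


def parse(argv):
    # parse_known_args returns unrecognised arguments instead of exiting,
    # which makes the mismatch visible.
    return build_parser().parse_known_args(argv)


# Arguments as they would arrive if the key had no extra prefix.
ok_args, ok_extras = parse(["--conf-source=/Workspace/conf", "--env=dev"])

# Arguments as they would arrive if the key were prefixed twice (assumed case).
bad_args, bad_extras = parse(["----conf-source=/Workspace/conf"])
```

In the second call the option is left unparsed and ends up in the list of unrecognised arguments, so the entry point would fall back to its default configuration source.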
new_cluster:
  #Azure nodes
  node_type_id: Standard_DS3_v2
  spark_version: 14.3.x-scala2.12
I have to say I did some experiments with @JenspederM using the […] Beyond this point, I leave it in the hands of @ankatiyar, who will be looking at this soon 😄
@JenspederM I'm happy to pass […]
I see you found it without my help. But thank you for transferring the project, @em-pe! I have now published the first release to […] I will make an announcement on Slack as soon as I have a working example with the […]
Description
This pull request adds documentation to help users set up a project that uses asset bundles on Databricks, since DBX is deprecated and no longer recommended.
https://www.databricks.com/blog/announcing-general-availability-databricks-asset-bundles
Development notes
make build-docs
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a `Signed-off-by` line in the commit message. See our wiki for guidance.
If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.