
Python code is not updated when using Python wheel tasks with existing Cluster ID in the task definition #1050

Closed
FrancoisLem opened this issue Dec 8, 2023 · 2 comments
Labels: DABs (DABs related issues)

@FrancoisLem

Describe the issue

When working with a workflow that uses a python_wheel_task (built with Poetry), modifications to the Python code of the wheel package are not deployed when re-deploying the bundle with `databricks bundle deploy`.

Configuration

```yaml
bundle:
  name: my_bundle

include:
  - ./resources/*.yml # Jobs, models and clusters descriptions

artifacts:
  my-wheel:
    type: whl
    build: poetry build
```

and the job resources:

```yaml
jobs:
  my_job:
    name: my_workflow
    tasks:
      ################## ETL TASK ##########################
      - task_key: "etl_task"
        # job_cluster_key: basic-spark-cluster
        existing_cluster_id: XXXX-YYYYY-2jtbhpqj
        max_retries: 0
        python_wheel_task:
          package_name: my_package
          entry_point: my_entrypoint
          parameters:
            [
              "--config-file-path",
              "/Workspace/${workspace.file_path}/conf/tasks/databricks/main_dbx_config.yml",
              "--mode", "train"
            ]
        libraries:
          - whl: ./dist/my_wheel-*.whl
```

Steps to reproduce the behavior


  1. Run `databricks bundle deploy ...`
  2. Run `databricks bundle run ...`
  3. Modify the source code of the package, e.g. by adding a log statement or a `sys.exit()`
  4. Run `databricks bundle deploy ...` again
  5. Run `databricks bundle run ...` again
  6. Observe that your modifications are not deployed

Expected Behavior

Since my package is built during the bundle deploy step, the modifications should be included and deployed to the cluster.

Actual Behavior

Modifications are not deployed on the existing cluster.

OS and CLI version

WSL Ubuntu 20.04.6 LTS -- Databricks CLI v0.209.1

Is this a regression?

I don't think so.

We use an existing cluster ID in our bundle precisely because we are in a development phase and want to iterate quickly, without waiting for a new job cluster to be provisioned on every deploy, so bumping the version of our Python package on each build is not really an option.

@FrancoisLem added the DABs label on Dec 8, 2023
@andrewnester (Contributor)

Hi @FrancoisLem! This is a limitation on the cluster libraries side: when the wheel is updated, the cluster has to be restarted for the changes to be picked up.
You have 2 options to work around this:

  1. Set the `experimental -> python_wheel_wrapper` option to `true` in your YAML config (see the sketch after this list). For details, see "Make a notebook wrapper for Python wheel tasks optional" #797 and "Added transformation mutator for Python wheel task for them to work on DBR <13.1" #635.
  2. In your Python project, set up the package version to be auto-updated on each build, for example as done here:
     https://github.com/databricks/bundle-examples/blob/main/default_python/setup.py#L18-L20
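
For option 1, a minimal sketch of how that flag is set at the top level of the bundle configuration (the surrounding file layout is illustrative; `python_wheel_wrapper` is the option referenced in the issues linked above):

```yaml
# databricks.yml
experimental:
  # Wraps the Python wheel task in a generated notebook, which installs the
  # freshly built wheel on each run instead of relying on cluster libraries.
  python_wheel_wrapper: true
```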

Hope this helps.

@JonasDev1

We handled it with an automatic Poetry version bump:

```yaml
artifacts:
  default:
    type: whl
    build: poetry version patch && poetry build
    path: .
```
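
As a note on why this works: `poetry version patch` bumps the patch version in `pyproject.toml` before each build, so every deploy produces a wheel with a new filename and version, and the existing cluster installs the fresh artifact instead of reusing the previously installed one. This pairs naturally with a glob such as `./dist/*.whl` in the task's `libraries` section (the exact glob is illustrative), so the wheel reference does not have to be updated by hand.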
