[BUG] Flyte task keeps running forever when running a Databricks job #3855

Closed
2 tasks done
rambrus opened this issue Jul 10, 2023 · 2 comments
Labels
bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers

Comments

rambrus commented Jul 10, 2023

Describe the bug

BACKGROUND:
I'm trying to run a simplified Flyte task using the Databricks plugin.

PREREQUISITES:

import flytekit
from flytekit import Resources, task, workflow
from flytekitplugins.spark import Databricks


@task(
    task_config=Databricks(
        databricks_conf={
            "run_name": "dbx simplified example",
            "existing_cluster_id": "<my-existing-cluster-id>",
            "timeout_seconds": 3600,
            "max_retries": 1,
        }
    ),
    limits=Resources(mem="2000M"),
    cache_version="1",
)
def print_spark_config():
    # The Spark plugin exposes the active session on the task context.
    spark = flytekit.current_context().spark_session
    print(spark.sparkContext.getConf().getAll())


@workflow
def my_databricks_job():
    print_spark_config()

STEPS:

  • Run workflow: pyflyte --verbose run --remote --destination-dir . dbx_simplified_example.py my_databricks_job

ISSUE:
The Databricks job run is triggered and completes successfully, but the Flyte job status is never updated; it stays stuck in the RUNNING state.

Expected behavior

The Flyte job status is kept up to date with the state of the Databricks run.

Additional context to reproduce

Please note that we tried to upgrade Flyte to v1.7.0:

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@rambrus rambrus added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Jul 10, 2023
@rambrus rambrus changed the title from "[BUG]" to "[BUG] Flyte task keeps running forever when running a Databricks job" Jul 10, 2023

rambrus commented Jul 11, 2023

I tested the Databricks API response for this run:

curl --netrc --request GET --header "Authorization: Bearer $DATABRICKS_TOKEN" \
'https://dbc-32fcad04-13c2.cloud.databricks.com/api/2.0/jobs/runs/get?run_id=306'

Response:

{
    "attempt_number": 0,
    "cleanup_duration": 0,
    "cluster_instance": {
        "cluster_id": "<my-cluster-id>",
        "spark_context_id": "<my-spark-context-id>"
    },
    "cluster_spec": {
        "existing_cluster_id": "<my-cluster-id>"
    },
    "creator_user_name": "<my-username>",
    "end_time": 1688987784820,
    "execution_duration": 223000,
    "format": "SINGLE_TASK",
    "job_id": 1060720031042619,
    "number_in_job": 574539,
    "run_id": 574539,
    "run_name": "dbx simplified example",
    "run_page_url": "<my-run-page-url>",
    "run_type": "SUBMIT_RUN",
    "setup_duration": 41000,
    "start_time": 1688987520036,
    "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "SUCCESS",
        "state_message": "",
        "user_cancelled_or_timedout": false
    },
    "task": {
        "spark_python_task": {
            "parameters": [
                "pyflyte-fast-execute",
                "--additional-distribution",
                "s3://<my-s3-bucket>/flytesnacks/development/UMZ6XPNM4L6KL4YALV56QDMSX4======/script_mode.tar.gz",
                "--dest-dir",
                ".",
                "--",
                "pyflyte-execute",
                "--inputs",
                "s3://<my-s3-bucket>/metadata/propeller/flytesnacks-development-ff83ea058624d44ddbe9/n0/data/inputs.pb",
                "--output-prefix",
                "s3://<my-s3-bucket>/metadata/propeller/flytesnacks-development-ff83ea058624d44ddbe9/n0/data/0",
                "--raw-output-data-prefix",
                "s3://<my-s3-bucket>/raw_data/sh/ff83ea058624d44ddbe9-n0-0",
                "--checkpoint-path",
                "s3://<my-s3-bucket>/raw_data/sh/ff83ea058624d44ddbe9-n0-0/_flytecheckpoints",
                "--prev-checkpoint",
                "\"\"",
                "--resolver",
                "flytekit.core.python_auto_container.default_task_resolver",
                "--",
                "task-module",
                "dbx_simplified_example",
                "task-name",
                "print_spark_config"
            ],
            "python_file": "dbfs:/tmp/flyte/entrypoint.py"
        }
    }
}
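
For reference, here is a minimal sketch of how a poller could interpret the `state` block of a response like the one above. It is not the Flyte plugin's actual code: `classify_run` and the phase names `RUNNING`, `SUCCEEDED` and `FAILED` are hypothetical, and the set of terminal life cycle states is an assumption based on the Databricks Jobs API documentation. It only illustrates that the response above describes a finished, successful run.

```python
# Life cycle states treated as final (assumption, based on the Jobs API docs).
TERMINAL_LIFE_CYCLE_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def classify_run(response: dict) -> str:
    """Map a runs/get response to a coarse phase (hypothetical phase names)."""
    state = response.get("state", {})
    life_cycle = state.get("life_cycle_state")
    # Anything that is not a known terminal state is treated as still running.
    if life_cycle not in TERMINAL_LIFE_CYCLE_STATES:
        return "RUNNING"
    # A terminated run only counts as successful if result_state says so.
    if life_cycle == "TERMINATED" and state.get("result_state") == "SUCCESS":
        return "SUCCEEDED"
    return "FAILED"


# The relevant part of the response shown above.
sample = {
    "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "SUCCESS",
        "state_message": "",
        "user_cancelled_or_timedout": False,
    }
}

print(classify_run(sample))  # prints "SUCCEEDED"
```

With this reading, the run in the response is terminal and successful, so a task that keeps reporting RUNNING is not reflecting what the Databricks API returns.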

eapolinario (Contributor) commented

Fixed in #4206.
