
Error description cannot be shown #25286

Closed
1 of 2 tasks
Lazloo opened this issue Jul 25, 2022 · 13 comments · Fixed by #25427

Comments

Lazloo commented Jul 25, 2022

Apache Airflow version

2.3.3 (latest released)

What happened

Unfortunately, I cannot get further information about my actual error because of the following KeyError:

Traceback (most recent call last):
  File "/usr/local/airflow/dags/common/databricks/operator.py", line 59, in execute
    _handle_databricks_operator_execution(self, hook, self.log, context)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/databricks/operators/databricks.py", line 64, in _handle_databricks_operator_execution
    notebook_error = run_output['error']
KeyError: 'error'
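
A minimal sketch of the failing pattern, with the call site reconstructed from the traceback (names are assumptions, not the actual provider source):

# Reconstructed for illustration from the traceback above; 'hook' and
# 'operator' are assumed names, not the actual provider source.
run_output = hook.get_run_output(operator.run_id)  # response may lack 'error'
notebook_error = run_output['error']               # raises KeyError: 'error'

# A defensive variant would fall back to a generic message instead:
notebook_error = run_output.get('error', 'unknown error')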

What you think should happen instead

No response

How to reproduce

No response

Operating System

I assume some Linux distribution.

Versions of Apache Airflow Providers

Astronomer Certified: v2.3.3.post1 based on Apache Airflow v2.3.3
Git Version: .release:2.3.3+astro.1+4446ad3e6781ad048c8342993f7c1418db225b25

Deployment

Astronomer

Deployment details

No response

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

Lazloo added the area:core and kind:bug labels Jul 25, 2022
boring-cyborg bot commented Jul 25, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

Lazloo changed the title from "Error descript cannot be shown" to "Error description cannot be shown" Jul 25, 2022
potiuk (Member) commented Jul 25, 2022

Hey @alexott - maybe you or someone from your team can help?

alexott (Contributor) commented Jul 25, 2022

What is the version of the Databricks provider?

alexott (Contributor) commented Jul 25, 2022

What is the version of the Databricks provider? Can you also provide an example of the code - what operator are you using, etc.?

Lazloo (Author) commented Jul 25, 2022

Here is the code that runs the deployment:

from airflow import AirflowException
from airflow.models import Variable
from airflow.providers.databricks.operators.databricks import (
    DatabricksRunNowOperator,
    XCOM_RUN_ID_KEY,
    XCOM_RUN_PAGE_URL_KEY,
    _handle_databricks_operator_execution
)

class DatabricksRunNowAttachOperator(DatabricksRunNowOperator):
    """
    Custom operator extending the existing DatabricksRunNowOperator.

    The difference is that it stores the run_id in an Airflow variable and
    uses it to attach to the run if the operator is restarted.

    This pattern avoids newly started jobs in case the operator is restarted, e.g. because of a scheduler restart.
    """

    def execute(self, context):
        variable_key = f"{context['task_instance_key_str']}_{XCOM_RUN_ID_KEY}"
        existing_job_run = Variable.get(key=variable_key, default_var=None)
        hook = self._get_hook()
        if existing_job_run is None:
            self.log.info("No job run found for this task, starting new run in Databricks.")
            self.run_id = hook.run_now(self.json)
            self.log.debug(f"Job started. Writing run id to job variable {variable_key}")
            Variable.set(key=variable_key, value=self.run_id)
        else:
            self.log.info(f"Found existing job run {existing_job_run} for this task. Attaching to this run.")
            self.run_id = existing_job_run

        try:
            _handle_databricks_operator_execution(self, hook, self.log, context)
        except AirflowException as e:
            # Clean up the variable only on a terminal failure, so a restarted
            # operator can still re-attach to a run that is merely in progress.
            if "failed with terminal state" in str(e):
                self.log.debug(f"Databricks job failed. Cleaning job variable {variable_key}")
                Variable.delete(key=variable_key)
            raise
        self.log.debug(f"Databricks job terminated. Cleaning job variable {variable_key}")
        Variable.delete(key=variable_key)

# job_config is the user's own configuration object (definition not shown);
# a hypothetical shape is sketched below.
DatabricksRunNowAttachOperator(
    task_id=job_config.name,
    job_id=job_config.id,
    polling_period_seconds=job_config.polling_period_seconds,
    retries=job_config.retries,
)

Lazloo (Author) commented Jul 25, 2022

Regarding the deployment: I currently cannot find the concrete version, but it is a legacy one (not E2).

alexott (Contributor) commented Jul 25, 2022

What is the version of apache-airflow-providers-databricks? The latest?

Lazloo (Author) commented Jul 25, 2022

It is based on quay.io/astronomer/ap-airflow:2.3.3-onbuild.

Does this help?

Lazloo (Author) commented Jul 25, 2022

Before, we used quay.io/astronomer/ap-airflow:2.2.5-onbuild. After the update we got this error.

alexott (Contributor) commented Jul 25, 2022

It looks like it's caused by this PR: https://github.com/apache/airflow/pull/21709/files - I'll investigate it.

Lazloo (Author) commented Jul 26, 2022

@alexott Have you been able to find anything out yet?

alexott (Contributor) commented Jul 26, 2022

I know the culprit (the linked PR), but I need to find time to fix it (most probably around the weekend...).

potiuk (Member) commented Jul 28, 2022

Assigned you then :)

alexott added a commit to alexott/airflow that referenced this issue Jul 31, 2022
In the Jobs API 2.1, we can't call `get_run_output` on the top-level run ID because it's
not supported by the API - we need to call this function on a specific sub-run of the job,
even if it consists of a single task

closes: apache#25286
potiuk pushed a commit that referenced this issue Aug 3, 2022
In the Jobs API 2.1, we can't call `get_run_output` on the top-level run ID because it's
not supported by the API - we need to call this function on a specific sub-run of the job,
even if it consists of a single task

closes: #25286
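
A minimal sketch of the fix idea the commit message describes, with hypothetical helper names (the actual change is in PR #25427):

# Sketch only: under Jobs API 2.1, run output must be fetched per task
# sub-run, not for the job-level run ID.
run_info = hook.get_run(operator.run_id)   # job-level run details (assumed call)
for task in run_info.get("tasks", []):     # one entry per sub-run / task
    run_output = hook.get_run_output(task["run_id"])
    if "error" in run_output:
        notebook_error = run_output["error"]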