Fix logic to cancel the external job if the TaskInstance is not in a running or deferred state for DataprocCreateClusterOperator #39446

Merged: 4 commits merged into apache:main on May 8, 2024

Conversation

sunank200
Collaborator

PR #39130 introduced handling of asyncio.CancelledError in a try/except block so that the external job is cancelled when the trigger is cancelled. That approach is unsafe: for DataprocCreateClusterOperator, it also cancels the external job when the triggerer restarts or crashes. This can cause unexpected behaviour, such as deferred operators being rescheduled, because Airflow remains unaware that the job was cancelled.

As a workaround, when asyncio.CancelledError is caught, the external job is cancelled only if the TaskInstance is not in a running or deferred state. This prevents the external job from being terminated prematurely.
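The guard described above can be illustrated with a small standalone sketch. It does not import Airflow and is not the code in this PR: `FakeTrigger`, `State`, `safe_to_cancel`, `delete_cluster`, and `simulate` are hypothetical names chosen for illustration, and the polling coroutine is simplified (a real trigger's `run` is an async generator that yields events).

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum


class State(str, Enum):
    """Hypothetical subset of task instance states, for illustration only."""
    RUNNING = "running"
    DEFERRED = "deferred"
    FAILED = "failed"


@dataclass
class FakeTrigger:
    """Hypothetical stand-in for a trigger that polls an external job."""
    # Assumption: this models what a fresh lookup of the TaskInstance state returns.
    state: State
    deleted: list = field(default_factory=list)

    def safe_to_cancel(self) -> bool:
        # Only cancel the external job if the task is no longer running or deferred.
        # On a triggerer restart or crash, the task is still deferred, so this is False.
        return self.state not in (State.RUNNING, State.DEFERRED)

    async def delete_cluster(self) -> None:
        # Placeholder for the call that would cancel or delete the external job.
        self.deleted.append("cluster")

    async def poll_until_cancelled(self) -> None:
        try:
            while True:
                await asyncio.sleep(0.01)  # stands in for polling the external job
        except asyncio.CancelledError:
            if self.safe_to_cancel():
                await self.delete_cluster()
            raise  # always propagate the cancellation


async def simulate(state: State) -> list:
    """Start the poller, cancel it, and report what was cleaned up."""
    trigger = FakeTrigger(state)
    task = asyncio.create_task(trigger.poll_until_cancelled())
    await asyncio.sleep(0.03)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return trigger.deleted
```

In this sketch, `asyncio.run(simulate(State.DEFERRED))` leaves the external job alone, while `asyncio.run(simulate(State.FAILED))` triggers the cleanup call.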

More details at: #36090 (comment)



@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels May 6, 2024
@sunank200 sunank200 changed the title Workaround for asyncio cancelled error for DataprocCreateClusterOperator Fix logic to cancel the external job if the TaskInstance is not in a running or deferred state for DataprocCreateClusterOperator May 6, 2024
@Lee-W
Member

Lee-W commented May 7, 2024

Just had a quick discussion with @sunank200. We'll yield the event if CancelledError is raised, and handle the TaskInstance state check and cancellation in execute_complete.

@sunank200 sunank200 force-pushed the DataprocCreateClusterOperatorFix branch from 47f4408 to f30e854 Compare May 7, 2024 10:22
@sunank200
Collaborator Author

Just had a quick discussion with @sunank200. We'll yield the event if CancelledError is raised, and handle the TaskInstance state check and cancellation in execute_complete.

That won't work, as execute_complete won't be called when the task is cancelled. I have used task_instance from BaseTrigger instead.

Member

@Lee-W Lee-W left a comment


We should probably add a test for this. Otherwise it looks good to me.

@sunank200 sunank200 force-pushed the DataprocCreateClusterOperatorFix branch from f30e854 to 64975f2 Compare May 7, 2024 10:31
Contributor

@dirrao dirrao left a comment


Can you add a test case for this change?

Member

@Lee-W Lee-W left a comment


Left a short nitpick. We could create an issue and work on that later, but we'll need test cases for this PR.

@sunank200
Collaborator Author

Can you add a test case for this change?

Added the test.

@sunank200 sunank200 requested a review from dirrao May 7, 2024 16:29
@sunank200 sunank200 force-pushed the DataprocCreateClusterOperatorFix branch from 87bda99 to b32e305 Compare May 8, 2024 06:47
@Lee-W Lee-W merged commit 3d575fe into apache:main May 8, 2024
39 checks passed
@Lee-W Lee-W deleted the DataprocCreateClusterOperatorFix branch May 8, 2024 07:42
pateash pushed a commit to pateash/airflow that referenced this pull request May 13, 2024
…running or deferred state for DataprocCreateClusterOperator (apache#39446)

4 participants