
Add deferrable param in EmrContainerSensor #30945

Merged
merged 2 commits into apache:main from async_emr_container_sensor
Jun 19, 2023

Conversation

pankajastro (Member):

Add the deferrable param in EmrContainerSensor.
This allows running EmrContainerSensor in an async way: the job check is submitted from the worker, which then defers to the trigger; the trigger polls and waits for the job status, so the worker slot is not occupied for the whole period of task execution. (A minimal usage sketch follows the checklist below.)


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.
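
For anyone trying this out, here is a minimal, hedged usage sketch. The DAG id, virtual cluster id, job id, and poll settings below are placeholders rather than values from this PR; the remaining parameter names come from the existing EmrContainerSensor signature, plus the new deferrable flag added here.

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.sensors.emr import EmrContainerSensor

with DAG(
    dag_id="emr_container_deferrable_example",  # placeholder DAG id
    start_date=datetime(2023, 6, 1),
    schedule=None,
    catchup=False,
):
    # Wait for an EMR on EKS job run without holding a worker slot:
    # with deferrable=True the task defers to the triggerer, which polls the job status.
    wait_for_job = EmrContainerSensor(
        task_id="wait_for_emr_container_job",
        virtual_cluster_id="vc-1234567890abcdef",  # placeholder virtual cluster id
        job_id="jr-0987654321fedcba",              # placeholder job run id
        poll_interval=30,                          # seconds between status checks
        max_retries=20,                            # maximum number of status checks
        deferrable=True,                           # new param added by this PR
    )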

boring-cyborg bot added the area:providers and provider:amazon-aws (AWS/Amazon - related issues) labels Apr 28, 2023
@pankajastro marked this pull request as ready for review April 28, 2023 20:26
@syedahsn (Contributor) left a comment:

Looks good! Just a few minor comments.

Review comments on:
airflow/providers/amazon/aws/sensors/emr.py (outdated, resolved)
airflow/providers/amazon/aws/triggers/emr.py (outdated, resolved)
airflow/providers/amazon/aws/triggers/emr.py (resolved)
airflow/providers/amazon/aws/waiters/emr-containers.json (outdated, resolved)
@pankajastro marked this pull request as draft May 26, 2023 08:49
@pankajastro force-pushed the async_emr_container_sensor branch 2 times, most recently from 40c947f to 619ab51 on May 26, 2023 09:37
@pankajastro marked this pull request as ready for review May 26, 2023 12:32
@pankajastro force-pushed the async_emr_container_sensor branch 2 times, most recently from e25f107 to fd0db14 on June 5, 2023 10:00
async with self.hook.async_conn as client:
    waiter = self.hook.get_waiter("container_job_complete", deferrable=True, client=client)
    attempt = 0
    while attempt < self.max_attempts:
A Contributor commented on the snippet above:
Though I have seen it elsewhere, to me, putting "max_attempts" as a trigger parameter doesn't make sense.

The "right" way to set a time limit on a trigger is the deferral timeout. IMO, if an existing operator has max_attempts param, then we should just calculate the deferral timeout based on that number and use that.

This avoids the added complexity in the trigger and the extra signature param, and it avoids the odd behaviour where, if max attempts is used and a triggerer dies and the trigger is picked up again on another machine, the attempt count starts from zero again and the task gets killed by the deferral timeout anyway.

What do you think about this @pankajastro ?

@pankajastro (Member, author) replied:
This is a good point, thank you for raising it.

Honestly, I didn't think about the possibility that a trigger can restart while writing this. I'll fix it.

A Contributor replied:
This was a point that was brought up in a previous PR (by @dstandish :D), and the solution we went with was to essentially use both. Set a timeout on the operator level that is computed from the given parameters (as well as a 60 second buffer), but also use the number of attempts as a metric. I think it is beneficial to have the number of attempts used as a metric in the Triggers, but it is definitely a good idea to have the timeout set on the operator level in case of a Trigger restart, as mentioned above.

@pankajastro (Member, author) replied:
Added the timeout while deferring, to handle the case where "a triggerer dies and the trigger is picked up again on another machine".

Removed max_attempts from the trigger to reduce the code maintenance.
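
For context, a rough sketch of the approach described above, not the exact code merged in this PR: compute the deferral timeout on the operator side from the existing polling parameters plus a small buffer, and pass it to self.defer instead of carrying max_attempts into the trigger. The trigger class name and its arguments below are assumptions for illustration; self.defer(trigger=..., method_name=..., timeout=...) is the standard Airflow deferral call.

from datetime import timedelta

# Inside the sensor's execute(), when deferrable=True (illustrative sketch only):
timeout = timedelta(seconds=self.poll_interval * self.max_retries + 60)  # 60 s buffer
self.defer(
    trigger=EmrContainerTrigger(  # assumed trigger class name from triggers/emr.py
        virtual_cluster_id=self.virtual_cluster_id,
        job_id=self.job_id,
        aws_conn_id=self.aws_conn_id,
        poll_interval=self.poll_interval,
    ),
    method_name="execute_complete",
    timeout=timeout,  # bounds total wait time even if the triggerer restarts
)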

@pankajastro force-pushed the async_emr_container_sensor branch 5 times, most recently from 1244bd9 to b1edbf7 on June 8, 2023 10:17
@pankajastro force-pushed the async_emr_container_sensor branch 5 times, most recently from a56de48 to 1bcbf8e on June 19, 2023 10:15
@pankajastro reopened this Jun 19, 2023
@pankajastro merged commit f0b91ac into apache:main Jun 19, 2023
@pankajastro deleted the async_emr_container_sensor branch June 19, 2023 19:38