Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dagster-aws] Pipes AWS Glue Dagster run interruption handler #23354

Merged
merged 2 commits into from
Aug 8, 2024

Conversation

danielgafni
Copy link
Contributor

@danielgafni danielgafni commented Jul 31, 2024

Summary & Motivation

I decided it was a good idea to add an automatic Glue job cleanup handler as there is a change for Spark jobs to spend a lot of unnecessary resources otherwise.

Had to rewrite the fake Glue clients to implement non-blocking job execution via subprocess.Popen

Strong Inception vibes in this PR

How I Tested These Changes

  • Added a test with subprocess interruption. The test runs Dagster's materialize inside a multiprocessing.Process. Inside this process, the fake Glue client runs the Glue job in a subprocess.Popen. Once the Dagster process receives termination signal, the PipesGlueClient invokes the fake glue client to terminate the subprocess.Popen "job". We can register this call in the fake client and test for it.

Copy link
Contributor Author

danielgafni commented Jul 31, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @danielgafni and the rest of your teammates on Graphite Graphite

@danielgafni danielgafni changed the title Pipes aws glue interruption handler [dagster-aws] Pipes AWS Glue Dagster run interruption handler Jul 31, 2024
@danielgafni danielgafni force-pushed the pipes-cloudwatch-message-reader branch from 747dce0 to 18aafe3 Compare July 31, 2024 22:06
@danielgafni danielgafni force-pushed the pipes-aws-glue-interruption-handler branch from 8d7e787 to 7650c6c Compare July 31, 2024 22:06
@danielgafni
Copy link
Contributor Author

@schrockn should I be using try: ... except DagsterExecutionInterruptedError: ... instead of the current approach? I found it in other places in the codebase after submitting this PR.

@schrockn
Copy link
Member

schrockn commented Aug 2, 2024

Adding @alangenfeld as he is generally more up-to-speed on this type of issues. Should we be terminating external processes launched by a run if that run is aborted?

Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I be using try: ... except DagsterExecutionInterruptedError: ... instead of the current approach?

yea i think its a bit easier to reason about and you don't need to worry about removing the handler

Should we be terminating external processes launched by a run if that run is aborted?

The feedback we got when we initially did not clean them up is that we should since they are orphaned and unreported if they continue. We do make the behavior optional in the subprocess pipes client.

you can reference https://github.com/dagster-io/dagster/pull/18685/files for some inspiration for how to put this under test

@danielgafni danielgafni force-pushed the pipes-cloudwatch-message-reader branch from 2e31171 to d31a46d Compare August 5, 2024 19:26
@danielgafni danielgafni changed the base branch from pipes-cloudwatch-message-reader to master August 5, 2024 19:28
@graphite-app graphite-app bot added the area: docs Related to documentation in general label Aug 5, 2024
@danielgafni danielgafni changed the base branch from master to pipes-cloudwatch-message-reader August 5, 2024 19:29
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managing my queue. @danielgafni let me know when you have incorporate alex's feedback and want me to look again.

@danielgafni danielgafni force-pushed the pipes-cloudwatch-message-reader branch 3 times, most recently from cf82b96 to 6021a9f Compare August 7, 2024 13:47
@cmpadden cmpadden force-pushed the pipes-cloudwatch-message-reader branch from 6021a9f to 89ec8ce Compare August 7, 2024 15:21
@danielgafni danielgafni force-pushed the pipes-cloudwatch-message-reader branch from 89ec8ce to 3cf6d49 Compare August 7, 2024 15:39
@danielgafni danielgafni force-pushed the pipes-aws-glue-interruption-handler branch from ca55365 to 2f62c83 Compare August 7, 2024 16:21
@danielgafni danielgafni changed the base branch from pipes-cloudwatch-message-reader to master August 7, 2024 16:22
@danielgafni danielgafni force-pushed the pipes-aws-glue-interruption-handler branch from 2f62c83 to 34ebd87 Compare August 8, 2024 07:38
@danielgafni danielgafni changed the base branch from master to pipes-cloudwatch-message-reader August 8, 2024 08:41
@danielgafni danielgafni force-pushed the pipes-cloudwatch-message-reader branch from af7fce7 to afe07d1 Compare August 8, 2024 09:01
@danielgafni danielgafni force-pushed the pipes-aws-glue-interruption-handler branch from 34ebd87 to b54917f Compare August 8, 2024 09:01
@danielgafni danielgafni force-pushed the pipes-aws-glue-interruption-handler branch from 71a11ab to e7965e3 Compare August 8, 2024 14:01
@danielgafni danielgafni force-pushed the pipes-cloudwatch-message-reader branch from afe07d1 to b9a5c95 Compare August 8, 2024 14:04
@danielgafni danielgafni force-pushed the pipes-aws-glue-interruption-handler branch from e7965e3 to bc1edff Compare August 8, 2024 14:04
@danielgafni
Copy link
Contributor Author

Hey @alangenfeld @schrockn, I added a test for Glue run interruption


if response["JobRun"]["JobRunState"] == "FAILED":
if status := response["JobRun"]["JobRunState"] != "SUCCEEDED":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

congrats for introducing only the third instance of the walrus to the codebase! I'm so trained not to use new features given how we have to support old Python version I totally forget they exist.

@danielgafni danielgafni force-pushed the pipes-aws-glue-interruption-handler branch from bc1edff to 7f6049c Compare August 8, 2024 14:30
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code seems fine to me. Please get the nod from @alangenfeld before merging as he is more conversant with these issues than I.

@danielgafni danielgafni force-pushed the pipes-aws-glue-interruption-handler branch from 7f6049c to 9068fd5 Compare August 8, 2024 15:03
@danielgafni danielgafni force-pushed the pipes-cloudwatch-message-reader branch from e22ac4e to 430d2bf Compare August 8, 2024 16:02
@danielgafni danielgafni force-pushed the pipes-aws-glue-interruption-handler branch from 9068fd5 to 7356891 Compare August 8, 2024 16:03
Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nod

Base automatically changed from pipes-cloudwatch-message-reader to master August 8, 2024 17:25
Exploring if it's possible to rewrite the fake testing Glue client to
run execute scripts in a background process so we can test interruption
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: docs Related to documentation in general
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants