This repository has been archived by the owner on Oct 9, 2023. It is now read-only.
correct propagation of launchplan start error #598
Merged
Signed-off-by: Daniel Rammer <daniel@union.ai>
hamersaw changed the title from "Fixed correct propagation of launchplan start error" to "correct propagation of launchplan start error" on Jul 31, 2023
So, instead of seeing "workflow not found", the side effect of this change is that users will see the underlying error in flyteconsole?
Exactly.
eapolinario
approved these changes
Aug 4, 2023
Signed-off-by: Daniel Rammer <daniel@union.ai>
gvashishtha
pushed a commit
to gvashishtha/flytepropeller
that referenced
this pull request
Aug 5, 2023
Signed-off-by: Daniel Rammer <daniel@union.ai> Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com>
eapolinario
pushed a commit
that referenced
this pull request
Aug 6, 2023
* go mod Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * updating go mod Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * bumping version Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * some commenting Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * make singular unions castable to their underlying type (#599) Signed-off-by: Daniel Rammer <daniel@union.ai> Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * fixed correct propagation of launchplan start error (#598) Signed-off-by: Daniel Rammer <daniel@union.ai> Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * bumping flytestdlib Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> --------- Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> Signed-off-by: Daniel Rammer <daniel@union.ai> Co-authored-by: Dan Rammer <daniel@union.ai>
eapolinario
pushed a commit
to eapolinario/flytepropeller
that referenced
this pull request
Aug 9, 2023
Signed-off-by: Daniel Rammer <daniel@union.ai>
eapolinario
pushed a commit
to eapolinario/flytepropeller
that referenced
this pull request
Aug 9, 2023
* go mod Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * updating go mod Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * bumping version Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * some commenting Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * make singular unions castable to their underlying type (flyteorg#599) Signed-off-by: Daniel Rammer <daniel@union.ai> Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * fixed correct propagation of launchplan start error (flyteorg#598) Signed-off-by: Daniel Rammer <daniel@union.ai> Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> * bumping flytestdlib Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> --------- Signed-off-by: Gopal K. Vashishtha <gvashishtha@anduril.com> Signed-off-by: Daniel Rammer <daniel@union.ai> Co-authored-by: Dan Rammer <daniel@union.ai>
TL;DR
Correctly fails a workflow node when its launchplan fails to start in FlyteAdmin.
Type
Are all requirements met?
Complete description
Launchplans are executed in FlytePropeller as WorkflowNodes. Basically, FlytePropeller executes a launchplan by sending an execution request to FlyteAdmin, which then starts the launchplan, and FlytePropeller stores the execution ID in the WorkflowNode state. At each iteration, FlytePropeller checks the status of the FlyteWorkflow CR represented by the execution ID and updates the WorkflowNode state accordingly.

What is happening in the issue linked below is that FlyteAdmin is failing to start the launchplan. FlytePropeller detects this failure and, in doing so, keeps the proposed execution ID in the WorkflowNode state (here) and transitions the node to a failed state. When FlytePropeller attempts to send an event for this state to FlyteAdmin, FlyteAdmin checks whether the execution ID exists (here). Of course, since FlyteAdmin failed to start the launchplan, the execution ID does not exist. This failure results in the "Workflow does not exist" error that we see. Ultimately, FlytePropeller proceeds with aborting the WorkflowNode, which is entirely unnecessary.

To fix this, there are two possible solutions:

(1) If a launchplan fails to start because of a user error (e.g. an invalid type interface), we do not set the execution ID on the WorkflowNode state, because the execution was never started. Of course, this means that we trust FlyteAdmin to report user errors only when the launchplan was not able to execute; I think this is reasonable. This is implemented in this PR.

(2) Allow FlyteAdmin's existence check on the execution ID to fail for events that report a failed state.
Tracking Issue
https://github.com/unionai/cloud/issues/4172
Follow-up issue
NA