fix: quick fail after pod termination #1865
Conversation
Addressing @jessesuen's concern - what happens after setting it:
a.) If it's set at the workflow level, when it comes to the deadline, the …
b.) If …
In summary, there's no impact on existing features with this change.
@jessesuen do you want to review this?
I've taken the liberty of syncing your branch with master so you have the new test infra. Would you like to add an e2e test for this?
Codecov Report

@@            Coverage Diff            @@
##             master    #1865   +/-   ##
=========================================
  Coverage          ?   11.14%
=========================================
  Files             ?       35
  Lines             ?    23536
  Branches          ?        0
=========================================
  Hits              ?     2624
  Misses            ?    20576
  Partials          ?      336

Continue to review full report at Codecov.
When the k8s node where the pod runs has an issue, the pod goes into the "Terminating" state - which is actually the "Running" phase with a "DeletionTimestamp" set - and gets stuck there. This fix fails the node quickly when that situation is detected.
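For illustration, here is a minimal sketch (not the actual controller code from this PR) of the kind of check described above, using the standard k8s.io/api pod types; the function name podTerminating is hypothetical:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podTerminating reports whether a pod is stuck "Terminating": it still
// reports the Running phase, but the API server has set a DeletionTimestamp.
func podTerminating(pod *corev1.Pod) bool {
	return pod.Status.Phase == corev1.PodRunning && pod.DeletionTimestamp != nil
}

func main() {
	// Simulate a pod on a broken node: Running phase, but marked for deletion.
	now := metav1.Now()
	pod := &corev1.Pod{}
	pod.Status.Phase = corev1.PodRunning
	pod.DeletionTimestamp = &now

	if podTerminating(pod) {
		// A controller would mark the corresponding workflow node Failed here
		// instead of waiting indefinitely for the pod to go away.
		fmt.Println("pod is terminating; fail the node quickly")
	}
}
```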
@alexec - e2e test added, could you please review it again?
LGTM
Workflow("@expectedfailures/pod-termination-failure.yaml"). | ||
When(). | ||
SubmitWorkflow(). | ||
WaitForWorkflow(120 * time.Second). |
minor - change to 60s
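For context, a completed version of this test might look like the sketch below. It assumes the e2e fixtures chain (Given/When/Then) and an ExpectWorkflow assertion as used elsewhere in the Argo e2e suite; the suite type, test name, and exact signatures are assumptions rather than verbatim code from the PR, and the timeout reflects the reviewer's 60s suggestion:

```go
import (
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	wfv1 "github.com/argoproj/argo/pkg/apis/workflow/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical completed test; fixtures API assumed from the rest of the suite.
func (s *FunctionalSuite) TestPodTerminationFailure() {
	s.Given().
		Workflow("@expectedfailures/pod-termination-failure.yaml").
		When().
		SubmitWorkflow().
		WaitForWorkflow(60 * time.Second). // per review: 60s instead of 120s
		Then().
		ExpectWorkflow(func(t *testing.T, metadata *metav1.ObjectMeta, status *wfv1.WorkflowStatus) {
			// The workflow should fail quickly once its pod is Terminating,
			// rather than hanging until the overall timeout.
			assert.Equal(t, wfv1.NodeFailed, status.Phase)
		})
}
```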
Thanks to everyone involved!
Fixes: #1832