Handle idle timeouts more gracefully #4524
Comments
I think we have addressed some of this in v2.11.7 and v2.12. Could you try latest?
Yep, looks like you're right - v2.12.0-rc2 retries automatically after 10s and provides a way to explicitly reload. I wonder if we can improve the UX on that, though, since for low-traffic instances this might be a relatively common error to run into. Maybe we can retry seamlessly without interrupting the UI, or make the error not completely replace the workflow listing?
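To make that concrete, here's a rough sketch of what I mean (the types and the OpenStream hook are made-up stand-ins, not the actual UI code): on a disconnect, keep rendering the last listing we received, show a small reconnecting indicator, retry in the background, and only fall back to the full-page error after a few failed attempts.

```typescript
// Rough sketch only: the types and the OpenStream hook are hypothetical
// stand-ins, not the actual Argo UI code. The idea: keep the last listing
// on screen and retry in the background instead of replacing the page.

interface Workflow {
    name: string;
    phase: string;
}

type ListState =
    | {kind: 'connected'; workflows: Workflow[]}
    | {kind: 'reconnecting'; workflows: Workflow[]} // stale data stays visible
    | {kind: 'error'; message: string};

// The caller supplies whatever actually opens the workflow-events stream
// (EventSource, fetch streaming, ...); it reports new listings and disconnects.
type OpenStream = (
    onWorkflows: (workflows: Workflow[]) => void,
    onDisconnect: () => void,
) => void;

function watchWorkflowList(openStream: OpenStream, render: (state: ListState) => void): void {
    let lastSeen: Workflow[] = [];
    let attempts = 0;
    const maxAttempts = 3;

    const connect = (): void => {
        openStream(
            workflows => {
                attempts = 0; // healthy again, reset the retry budget
                lastSeen = workflows;
                render({kind: 'connected', workflows});
            },
            () => {
                if (attempts < maxAttempts) {
                    attempts += 1;
                    // Keep showing what we already have; just flag the reconnect.
                    render({kind: 'reconnecting', workflows: lastSeen});
                    setTimeout(connect, 1000 * attempts); // simple linear back-off
                } else {
                    render({kind: 'error', message: 'Lost connection to the server'});
                }
            },
        );
    };

    connect();
}
```

With something like this, an idle-timeout disconnect on a quiet instance would show up as a brief "reconnecting" state rather than wiping out the listing.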
Do you want to suggest something?
Sure - I think a pretty straightforward change to make when a disconnect occurs would be:
This makes sense. Each page is different and needs different disconnect logic.
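One possible shape for that (illustrative only, not the actual v3 UI code): keep the retry loop in a shared helper and let each page plug in its own reconnect and give-up behaviour.

```typescript
// Purely illustrative: a shared helper owns the retry loop, while each page
// decides what "reconnect" and "give up" mean for its own data.

interface DisconnectHandler {
    reconnect(): void; // e.g. re-open that page's event stream
    giveUp(): void;    // e.g. render that page's existing error state
}

function handleDisconnect(page: DisconnectHandler, attempt: number, maxAttempts = 3): void {
    if (attempt < maxAttempts) {
        setTimeout(() => page.reconnect(), 1000 * (attempt + 1));
    } else {
        page.giveUp();
    }
}
```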
I'm fixing this in the v3 UI.
@alexec can you please provide a link to the PR/issue to track the fix?
Summary
When you're using an ingress controller that has an idle timeout configured, it's possible that no events occur within that period, which results in the UI throwing an error because the workflow-events stream is closed. In the case of my cluster, I use ingress-nginx, which has a default idle timeout of 60s.

Since these streams are expected to be very long-lived connections, we should consider one of the following:
1. Send a piece of data periodically if none has been sent. This is not optimal, IMO, since we'd need to filter it out on the client, and it still may not solve the problem if the user configures an idle timeout shorter than the interval at which we send data.
2. Retry the connection on the front-end at least once. If the connection is successfully re-established, then it's a candidate to retry again if/when the error occurs. If the connection fails to be re-established, throw our existing error, since that might indicate a loss of network connectivity or problems with the argo-server pod.

In both cases, we should also provide a nicer way to retry when this occurs rather than reloading the page.
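For what it's worth, the 60s above comes from ingress-nginx's default proxy read/send timeouts. As a stopgap they can be raised on the Ingress that fronts argo-server, though that only hides the problem rather than making the UI handle the disconnect. A rough example (host, service name, and timeout values are illustrative):

```yaml
# Workaround rather than a fix: raise the idle timeouts on the Ingress that
# fronts argo-server so long-lived event streams are not cut off at 60s.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: argo-server
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  rules:
    - host: argo.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: argo-server
              servicePort: 2746
```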
Diagnostics
What Kubernetes provider are you using?
EKS 1.17 with Ingress NGINX
What version of Argo Workflows are you running?
v2.11.1
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.