-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Malformed WorkflowTemplate crashes whole UI #3666
Comments
Looking into this |
Was able to reprodcue |
Seems like this error is propagated directly from the Kubernetes client when we make our This might limit our ability to respond gracefully to this error, such as being able to display non-malformed Workflow templates or provide a more graceful explanation of the issue. Investigating more. |
The UI fails to this error page. Think is a known issue, see #3454. |
If the 500 error originates in the Kuberenetes API , then the 500 is probably not fixable. I. do think we should be more graceful in handling these errors. I.e. we shouldn't dump the user to and unusable UI. |
Haven't used the k8s go-client myself, so I don't know if the CRD yaml is generated from go classes or something, but is it possible to make the CRD spec stricter, so it isn't possible to submit a malformed workflow template in the first place? It seems to me this particular 500 arises because what the K8s API server accepts and what the go client accepts, by way of its json parser, doesn't quite match. To be honest, I wouldn't expect to be allowed to insert |
@SebastianGoeb in theory - yes -in practise - no. We provide CRDs with validation, but we don't use them by default. Why?
Essentially, if you need a robust system, you must lint your manifests. The |
If it isn't fixable by reconfiguration, maybe it's something that should be bumped up to the go-client itself? Because I would expect it to deserialize anything that the server accepts, i.e. anything that is allowed by the CRD's OpenAPI spec. Or, conversely, if it refuses to deserialize a CR, then I would expect the server to refuse it too. |
Ah, I see. That's a shame. But maybe that could be built into our CI workflow. |
Available for testing in v2.11.0-rc1. |
Checklist:
What happened:
Deployed a WorkflowTemplate with kubectl (really helm) that was malformed, but still accepted by K8s API server (fit CRD apparently). This broke the WorkflowTemplate UI list view (http://localhost:2746/workflow-templates), see screenshot below.
argo cli would have caught it:
but we will be deploying our workflow (-template) together with our application as a single helm chart because we want the declarative resource management, so argo cli isn't really an option.
What you expected to happen:
The WorkflowTemplate UI's list view should as least load and show me any WorkflowTemplates that are indeed syntactically correct. Put another way, I expect not to be prevented from working with my valid WorkflowTemplates, just because yesterday's newly deployed WorkflowTemplate is broken.
Particularly troubling is that the entire UI, including sidebar, disappears. Since it's only an ajax api request (
http://localhost:2746/api/v1/workflow-templates/argo
) that fails, I would expect to at least get a sensible error within the WorkflowTemplate UI about what has gone wrong. Ideally I would even be allowed to see the valid WorkflowTemplates and be informed about the broken one separately, maybe with a warning icon on the list item. While this could presumably be fixed by making the api and frontend smarter, another 500 will crop up in the future and break the UI somewhere else, and I would love not to be kicked out of the UI entirely whenever the server fails to provide some data.On that note, the error page seems to have a double-redirect which breaks the browser back button, so I can't even get back to the main page that I was on previously. Should I post a separate issue about that?
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Docker for Mac 2.3.0.4 (46911) with Kubernetes enabled
Logs
surprisingly, the server doesn't log the stacktrace seen in the UI.
but the workflow-controller does (partly at least)
Message from the maintainers:
If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
The text was updated successfully, but these errors were encountered: