Add support for error codes #590
Comments
Hm, how would MCM do it? It cannot do it any better than Gardener or the extension does, can it?
As discussed earlier, a more streamlined approach could be a condition on the
Generally, I'm not sure that this complexity in MCM would be helpful. The error codes are a Gardener-only thing, and I don't see a strong reason to introduce them in other components as well.
MCM interacts with the cloud provider SDK and can check the type of the returned error. Based on that error type, it can evaluate whether the failure is an authentication, authorization, or quota error and set the matching error code in the status. I believe this approach is more stable than pattern matching on the provider extension controller side over the Machine
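A minimal sketch of what such type-based classification could look like. `AuthError`, `QuotaError`, and `classify` are hypothetical stand-ins invented for illustration; a real cloud provider SDK exposes its own error types, and this is not how MCM is implemented today.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorCode is a hypothetical machine-level error code type. The two
// values are the Gardener codes named in this issue.
type ErrorCode string

const (
	ErrInfraUnauthorized  ErrorCode = "ERR_INFRA_UNAUTHORIZED"
	ErrInfraQuotaExceeded ErrorCode = "ERR_INFRA_QUOTA_EXCEEDED"
)

// AuthError and QuotaError are assumed stand-ins for typed errors that a
// cloud provider SDK might return. Real SDKs define their own types.
type AuthError struct{ Msg string }

func (e *AuthError) Error() string { return "auth: " + e.Msg }

type QuotaError struct{ Resource string }

func (e *QuotaError) Error() string { return "quota exceeded for " + e.Resource }

// classify maps an error to an error code by inspecting its type rather
// than its message. It returns "" when no known type is found in the chain.
func classify(err error) ErrorCode {
	var authErr *AuthError
	var quotaErr *QuotaError
	switch {
	case errors.As(err, &authErr):
		return ErrInfraUnauthorized
	case errors.As(err, &quotaErr):
		return ErrInfraQuotaExceeded
	default:
		return ""
	}
}

func main() {
	// Wrapping keeps the typed error reachable through errors.As.
	err := fmt.Errorf("create machine: %w", &AuthError{Msg: "invalid credentials"})
	fmt.Println(classify(err))
}
```

Because `errors.As` walks the wrap chain, the classification keeps working when the provider rewords its message, as long as the error type stays the same.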
The PDB case is different from authentication, authorization, and quota errors. I will check how to properly handle and indicate these types of errors.
I agree that the error code mapping can be improved because it's somewhat fragile at the moment. We have an open issue about moving the error code mapping out of g/g and into the provider extensions, but I doubt that this would change how the mapping is done (string-based).
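To illustrate why a string-based mapping is fragile, here is a rough sketch. The regular expressions and the `codeFromMessage` helper are made up for this example and are not the expressions Gardener or the provider extensions actually use.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for illustration only.
var (
	unauthorizedRe = regexp.MustCompile(`(?i)(unauthorized|invalid credentials|authfailure)`)
	quotaRe        = regexp.MustCompile(`(?i)(quota exceeded|limit exceeded)`)
)

// codeFromMessage derives an error code from free-form error text.
// It returns "" when no pattern matches.
func codeFromMessage(msg string) string {
	switch {
	case unauthorizedRe.MatchString(msg):
		return "ERR_INFRA_UNAUTHORIZED"
	case quotaRe.MatchString(msg):
		return "ERR_INFRA_QUOTA_EXCEEDED"
	default:
		return ""
	}
}

func main() {
	fmt.Println(codeFromMessage("AuthFailure: invalid credentials"))
	// A reworded provider message matches no pattern and stays unclassified.
	fmt.Printf("%q\n", codeFromMessage("the supplied access key could not be validated"))
}
```

The second call shows the weak spot: any change in the provider's wording silently drops the error code until the patterns are updated.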
We discussed internally and propose the following machineStatus:
MCM already has a well-maintained collection of different kinds of error codes. We prefer having a
This issue is a requirement for solving a more urgent issue on CA.
While introducing this mapping in AWS, we should also check whether the changes in gardener/machine-controller-manager-provider-aws#59 can be reverted. cc @rishabh-11
What would you like to be added:
Under the gardener org, various CRs and API resources support error codes: Shoot, Etcd, extension CRs.
It will be very helpful if the Machine status also supports error codes.
Currently, a Machine that fails to be created, for example because of invalid credentials, has the following example status:
There is nothing wrong with the above status, but it is not very helpful when another component needs to interpret the error and to mark it properly (whether it is ERR_INFRA_UNAUTHORIZED, ERR_INFRA_QUOTA_EXCEEDED, or any other error code).
We could add lastError to the Machine status that can hold the error code related to the last failed operation:
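As a rough illustration of the idea (not the snippet originally attached to this issue), a `lastError` field could be modeled as below. The `LastError` and `MachineStatus` types, their field names, and the `hasCode` helper are all assumptions for this sketch and do not reflect the actual MCM API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LastError is a hypothetical shape for the proposed field; the field
// names are assumptions, not the MCM API.
type LastError struct {
	Description string   `json:"description"`
	Codes       []string `json:"codes,omitempty"`
}

// MachineStatus is reduced to the single field relevant here.
type MachineStatus struct {
	LastError *LastError `json:"lastError,omitempty"`
}

// hasCode lets a consumer check for an error code without parsing the
// free-form description.
func hasCode(s MachineStatus, code string) bool {
	if s.LastError == nil {
		return false
	}
	for _, c := range s.LastError.Codes {
		if c == code {
			return true
		}
	}
	return false
}

func main() {
	status := MachineStatus{LastError: &LastError{
		Description: "machine creation failed: invalid credentials",
		Codes:       []string{"ERR_INFRA_UNAUTHORIZED"},
	}}
	out, err := json.MarshalIndent(status, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println(hasCode(status, "ERR_INFRA_UNAUTHORIZED"))
}
```

With a structured field like this, a consuming component can branch on the code directly and does not need to interpret the description text.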
This enhancement was originally requested in gardener/gardener#3020, where the concrete requirement for machine-controller-manager is to properly detect and flag misconfigured PodDisruptionBudgets that allow zero voluntary Pod evictions and therefore prevent graceful Node drain.
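A simplified sketch of how such a misconfigured budget could be recognized. The `PDB` struct and `blocksAllEvictions` are hypothetical and handle absolute values only; the real PodDisruptionBudget API also accepts percentages, which are omitted here.

```go
package main

import "fmt"

// PDB is a simplified stand-in for a PodDisruptionBudget spec.
// Only absolute values are modeled; percentages are left out.
type PDB struct {
	MinAvailable   *int
	MaxUnavailable *int
}

// blocksAllEvictions reports whether the budget can never permit a
// voluntary eviction for a workload with the given number of desired pods.
func blocksAllEvictions(p PDB, desiredPods int) bool {
	if desiredPods <= 0 {
		return false // nothing to evict
	}
	if p.MaxUnavailable != nil {
		return *p.MaxUnavailable <= 0
	}
	if p.MinAvailable != nil {
		return *p.MinAvailable >= desiredPods
	}
	return false
}

func main() {
	three := 3
	// minAvailable equals the replica count, so no pod may ever be evicted.
	fmt.Println(blocksAllEvictions(PDB{MinAvailable: &three}, 3))
}
```

A drain against a Node hosting such pods cannot finish gracefully, which is the condition the issue asks MCM to detect and flag.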
Why is this needed:
/area ops-productivity