This repository has been archived by the owner on Jul 30, 2021. It is now read-only.
Will not recover if you delete the last api-server pod #267
Labels
kind/regression
Categorizes issue or PR as related to a regression from a prior release.
priority/P0
We should be able to recover from deletion of the only api-server pod (as long as it is ultimately managed by a higher-level object like a daemonset).
However, since the switch to the new checkpointer in v0.3.1, there is a problem: the local kubelet state is only updated while the kubelet can contact an api-server. The checkpointer asks the kubelet "is the api-server running?" and the kubelet responds "yes". If the pod that was just deleted was the last api-server, the local state will never be updated to reflect that it is in fact no longer running.
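To make the failure mode concrete, here is a toy model in Go. It is only a sketch of the behaviour described above, not the actual kubelet or checkpointer code: `localState`, `sync`, and `needsCheckpointStart` are hypothetical names invented for illustration, and the only assumption it encodes is "the cached state is refreshed only while an api-server is reachable".

```go
package main

import "fmt"

// localState is a toy stand-in for the kubelet's cached view of its pods.
// (Hypothetical type, not part of the kubelet or bootkube.)
type localState struct {
	running map[string]bool
}

// sync refreshes the cached view from the actual pod status, but only when
// an api-server is reachable. Otherwise the cache is left untouched.
func (s *localState) sync(apiReachable bool, actual map[string]bool) {
	if !apiReachable {
		return // no api-server: the stale view survives
	}
	s.running = make(map[string]bool, len(actual))
	for name, up := range actual {
		s.running[name] = up
	}
}

// needsCheckpointStart models the checkpointer's decision: start the
// checkpointed copy only if the local state says the pod is not running.
func needsCheckpointStart(s *localState, pod string) bool {
	return !s.running[pod]
}

func main() {
	state := &localState{}

	// Normal operation: the api-server is up and the cache says so.
	state.sync(true, map[string]bool{"kube-apiserver": true})

	// The last api-server pod is deleted. Nothing is running any more,
	// but there is also no api-server left to refresh the cache from.
	state.sync(false, map[string]bool{})

	fmt.Println("start api-server checkpoint:",
		needsCheckpointStart(state, "kube-apiserver"))
	// prints "start api-server checkpoint: false"
}
```

In this model the checkpointer never starts the checkpointed api-server, because the one component that could correct the cached state is the component that is missing.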
The old checkpointer worked around this by simply trying to hit "localhost:8080" to determine whether the api-server is running. That has downsides: the checkpointer is no longer a generic tool (it has to know about specific pods), and the check is not actually accurate (it could be the checkpoint copy that is running and listening on :8080).
But it would be a short-term option for keeping that same functionality working.
/cc @Quentin-M