fix: stop networkd before leaving etcd on 'reset' path #3590
Merged
The problem is with the VIP and the `reset` sequence: `etcd` was stopped first while `networkd` was still running. If the node owned the VIP at the time of the reset action, the lease was lost (as the client connection was gone), so the VIP stayed unassigned for a fairly long time.

This PR changes the order of operations: first, stop `networkd` and other pods, and leave `etcd` last, so that the VIP is released and `kube-apiserver`, for example, isn't left hanging on the node while `etcd` is gone.

Fixes siderolabs#3500

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
/approve
rsmitty approved these changes on May 7, 2021
/promote integration-qemu-encrypted-vip
/lgtm
smira added a commit to smira/talos that referenced this pull request on May 14, 2021
The change is essentially the same as siderolabs#3590, but applied to the upgrade path, which is very similar to the reset path. We have to stop networkd (and remove the VIP and the lease on the VIP) before we leave and stop etcd. We also stop kube-apiserver before etcd is stopped, so that we don't end up with an unhealthy kube-apiserver.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
talos-bot pushed a commit that referenced this pull request on May 14, 2021
smira added a commit to smira/talos that referenced this pull request on May 20, 2021
The change is essentially the same as siderolabs#3590, but applied to the upgrade path, which is very similar to the reset path. We have to stop networkd (and remove the VIP and the lease on the VIP) before we leave and stop etcd. We also stop kube-apiserver before etcd is stopped, so that we don't end up with an unhealthy kube-apiserver.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>

(cherry picked from commit 0825cf1)
smira added a commit that referenced this pull request on May 20, 2021