fix: stop networkd before leaving etcd on 'reset' path #3590

smira · 2021-05-07T20:27:47Z

The problem is with the VIP and the reset sequence: previously, etcd
was stopped first while networkd was still running. If the node owned
the VIP at the time of the reset action, the lease was lost (as the
client connection was gone), so the VIP stayed unassigned for a pretty
long time.

This PR changes the order of operations: first stop networkd and
other pods, and leave etcd last, so that the VIP is released and
kube-apiserver, for example, isn't left hanging on the node while etcd
is gone.

Fixes #3500

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>

smira · 2021-05-07T20:28:47Z

/approve

smira · 2021-05-07T20:48:37Z

/promote integration-qemu-encrypted-vip

smira · 2021-05-07T23:47:35Z

/lgtm

The change is essentially the same as siderolabs#3590, but applied to the upgrade path, which is very similar to the reset path. We have to stop networkd (and remove the VIP and the lease on the VIP) before we leave and stop etcd. We also stop kube-apiserver before etcd is stopped, so that kube-apiserver is not left unhealthy. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>

The change is essentially the same as siderolabs#3590, but applied to the upgrade path, which is very similar to the reset path. We have to stop networkd (and remove the VIP and the lease on the VIP) before we leave and stop etcd. We also stop kube-apiserver before etcd is stopped, so that kube-apiserver is not left unhealthy. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com> (cherry picked from commit 0825cf1)

smira added this to the 0.11 milestone May 7, 2021

talos-bot added the status/approved label May 7, 2021

rsmitty approved these changes May 7, 2021

talos-bot added the status/lgtm label May 7, 2021

talos-bot merged commit 4ffd7c0 into siderolabs:master May 7, 2021

smira mentioned this pull request May 13, 2021

backports: for 0.10.2 release #3614

Merged

smira mentioned this pull request May 14, 2021

fix: stop networkd and pods before leaving etcd on upgrade #3619

Merged
