Change RKE upgrade logic for zero downtime #1800
Does this include the ability to drain nodes before an upgrade to do a graceful rolling cluster upgrade?
services/workerplane.go
Outdated
}
}

func startWorkerPlane(ctx context.Context, kubeClient *kubernetes.Clientset, allHosts []*hosts.Host, localConnDialerFactory hosts.DialerFactory, prsMap map[string]v3.PrivateRegistry, workerNodePlanMap map[string]v3.RKEConfigNodePlan, certMap map[string]pki.CertificatePKI, updateWorkersOnly bool, alpineImage string,
nit: Maybe the method name could be "startWorkerPlaneUpgrade" to make it clear that this is a part of the upgrade process?
The outcome of upgrading workers with one node that never gets to
LGTM
#1772
Change worker plane components upgrade strategy for zero downtime upgrades
TunnelHosts). If number of unreachable hosts = maxUnavailable, stop upgrade
x out of 10 are done, start upgrading the next x nodes
For clusters with a large number of nodes, upgrading a percentage of them at once based on maxUnavailable would spawn a large number of goroutines and cause errors. This issue has details about it and why RKE switched to a worker pool.
So maxUnavailable is respected as long as it is not too large: it is capped at 50, which is the number of worker threads RKE currently uses.
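The batching described above could look roughly like the sketch below. This is not the code from this PR: `batchSize`, `upgradeInBatches`, and the `upgrade` callback are hypothetical names, and the floor of 1 for non-positive values is an assumption made only so the loop always makes progress. The only values taken from the description are the cap of 50 workers and the rule that the next group of nodes starts once the current group is done.

```go
package main

import (
	"sync"
)

// maxWorkers mirrors the cap of 50 worker threads mentioned in the description.
const maxWorkers = 50

// batchSize returns how many nodes may be upgraded at the same time.
// Values above maxWorkers are capped; values below 1 are raised to 1
// (an assumption for this sketch, so the upgrade always progresses).
func batchSize(maxUnavailable int) int {
	if maxUnavailable < 1 {
		return 1
	}
	if maxUnavailable > maxWorkers {
		return maxWorkers
	}
	return maxUnavailable
}

// upgradeInBatches upgrades nodes in groups of batchSize(maxUnavailable).
// A group must finish before the next one starts, so no more than that
// many nodes are being upgraded at any moment. Errors are collected
// rather than aborting the run.
func upgradeInBatches(nodes []string, maxUnavailable int, upgrade func(node string) error) []error {
	size := batchSize(maxUnavailable)
	var (
		mu   sync.Mutex
		errs []error
	)
	for start := 0; start < len(nodes); start += size {
		end := start + size
		if end > len(nodes) {
			end = len(nodes)
		}
		var wg sync.WaitGroup
		for _, node := range nodes[start:end] {
			wg.Add(1)
			go func(n string) {
				defer wg.Done()
				if err := upgrade(n); err != nil {
					mu.Lock()
					errs = append(errs, err)
					mu.Unlock()
				}
			}(node)
		}
		// Wait for the whole group before starting the next one.
		wg.Wait()
	}
	return errs
}
```

With 10 nodes and a maxUnavailable of 3, this runs groups of 3, 3, 3, and 1. A real worker pool would usually pick up the next node as soon as any worker is free instead of waiting for the whole group; the group form is used here only because it is the simplest way to show the limit.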
#1734
Upgrade controlplane components one by one for zero downtime upgrades
Types PR for drain input: rancher/types#1069