Skip to content
This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

modules/bootkube: remove critical pod annotation #1702

Merged
merged 1 commit into from
Oct 3, 2017

Conversation

squat
Copy link
Contributor

@squat squat commented Aug 16, 2017

Remove the critical pod annotation and toleration from the
controller-manager and scheduler, since this can result in dangerous
behavior as discussed in
kubernetes-retired/bootkube#519.

fixes: #734

cc @aaronlevy

@@ -73,8 +71,6 @@ spec:
pod-anti-affinity: kube-controller-manager-${kubernetes_version}
tier: control-plane
tolerations:
- key: "CriticalAddonsOnly"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if we want to remove this toleration though. Might need to look into the upstream behaviors this ties into.

Copy link
Contributor Author

@squat squat Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we leave the toleration then it is possible that a controller-manager or scheduler pod could get scheduled into a slot that was designated for a different critical pod. Maybe this is not a big issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. Well we probably do want the controller-manager to have this toleration. it is critical it's just unsafe with the behavior of the rescheduler. But I'm not immediately sure what the interaction would be like. The rescheduler might just keep kicking it off, and then it keeps getting rescheduled.

It's probably just not safe to run the rescheduler on the master nodes at all -- maybe that's the end goal we get to (and then we also remove the annotation / taint). And priority / preemption will deprecate rescheduler anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be less of a big deal now that I'm thinking of it. We should just say "do not run the rescheduler on master nodes, your master nodes should be provisioned such that they have a minimum of X resources and nothing else is scheduled there".

@aaronlevy
Copy link
Contributor

aaronlevy commented Aug 16, 2017

@robszumski @sym3tri If this is done without a the same being done in an update payload, please track the skew that this introduces and the version it is introduced in.

To update this currently it would require custom migration code (annotations are not handled for scheduler/controller-manager until the patch support is finalized).

@squat
Copy link
Contributor Author

squat commented Aug 16, 2017

ok to test

@s-urbaniak
Copy link
Contributor

ping @squat, is this ready to merge?

@squat
Copy link
Contributor Author

squat commented Aug 31, 2017

@s-urbaniak, this fell off my radar. I need to check back in with KVO team to see if patch support was finalized; otherwise I need to write a migration to fix the skew.

@diegs
Copy link
Contributor

diegs commented Sep 13, 2017

@s-urbaniak @squat Patch support is merged in so we can trivially replicate this in the KVO for this release.

@diegs
Copy link
Contributor

diegs commented Sep 13, 2017

Should we also remove it from the apiserver? It sounds like the rescheduler is going away anyway.

(looks like we also have it in kube-proxy too...).

@aaronlevy
Copy link
Contributor

ref (priority/preemption): kubernetes/enhancements#268

@squat
Copy link
Contributor Author

squat commented Sep 14, 2017

@diegs I'll make the adjustments in apiserver and kube-proxy as well

@diegs
Copy link
Contributor

diegs commented Sep 19, 2017

@squat is this in or out for 1.7.5? Just want to know if I should land my KVO PR or not. Thanks!

@squat
Copy link
Contributor Author

squat commented Oct 2, 2017

@diegs PTAL

diegs
diegs previously approved these changes Oct 2, 2017
Copy link
Contributor

@diegs diegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@squat
Copy link
Contributor Author

squat commented Oct 2, 2017

@diegs sorry, I just rebased on master to fix a test failure. PTAL once more 😅

Remove the critical pod annotation and toleration from the
controller-manager and scheduler, since this can result in dangerous
behavior as discussed in
kubernetes-retired/bootkube#519.
Copy link
Contributor

@diegs diegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aaronlevy
Copy link
Contributor

Sorry to be late to the review - but are we sure we want to remove the toleration?

@aaronlevy
Copy link
Contributor

Hmm. Maybe ignore me. The behavior with just the toleration would be weird too (pod would be deleted, then re-created). I guess my concern was if the taint would be used elsewhere (not just re-scheduler), and we actually don't want these pods messed with.

I skimmed the priority / preemption discussion and I don't think this taint will be re-used (we can always revisit if we need to later).

@diegs
Copy link
Contributor

diegs commented Oct 2, 2017

@aaronlevy that taint is added by the rescheduler, which this is removing support for (and apparently the rescheduler is deprecated anyway, right)?

https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/

@squat
Copy link
Contributor Author

squat commented Oct 2, 2017

ok to test

@aaronlevy
Copy link
Contributor

Yeah, I retract my concern

@squat
Copy link
Contributor Author

squat commented Oct 3, 2017

One AWS flake. Merging

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants