
Improvements to the handling of cluster addons #1091

Closed · neolit123 opened this issue Sep 3, 2018 · 27 comments

Labels: area/UX, help wanted, kind/design, kind/tracking-issue, priority/important-soon

Comments

@neolit123 (Member) commented Sep 3, 2018

this is a tracking issue for addon improvements.
it has a proposal that possibly needs discussion.

a couple of problems:

  1. addons rely on the same image repository as the control plane images.
     addons don't follow the k8s versioning and users don't have control over which version of an addon is pulled.

  2. addon manifests are hardcoded in kubeadm and users need to patch the cluster if they need to apply different specs - (anti-)affinity rules, nodeSelectors etc.

these complexities can be solved if addons are introduced in a separate structure in the kubeadm configuration.

rough pseudo example:

ClusterConfiguration:
...
  addons:
  - name: coredns
    image: <some-image-url>
    manifestFile: <some-path-to-manifest>
  - name: kube-proxy
    image: <some-image-url>
    manifestFile: <some-path-to-manifest> # inlining
  - name: <some-other-addon?>
    image: <some-image-url>
    manifestFile: <some-path-to-manifest> # inlining

notes:

  • core addons can have special handling if needed, possibly reserving some names for such.
  • manifest files can contain --- separated list of objects. Deployment, ConfigMap etc.
  • if manifestFile is not defined, use the default (hardcoded) values.
  • if image is not defined use defaults.

/cc @chuckha @fabriziopandini

edit1: it was also proposed to have restore-default-at-upgrade as a possible knob per addon element.
#1091 (comment)
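
a rough sketch of what such a per-addon knob could look like on top of the structure above (field name hypothetical, nothing like it exists in the kubeadm API):

  addons:
  - name: coredns
    image: <some-image-url>
    manifestFile: <some-path-to-manifest>
    restoreDefaultAtUpgrade: true # hypothetical: replace any custom addon config with the new defaults on upgrade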

@neolit123 neolit123 added kind/tracking-issue area/UX kind/design Categorizes issue or PR as related to design. labels Sep 3, 2018
@neolit123 neolit123 added this to the GA milestone Sep 3, 2018
@rosti commented Sep 3, 2018

I like the idea. A couple of notes though:

  • image should probably be images and have an array of images. This covers the kube-dns case where 3 images are needed (rough sketch after this list).
  • Stuff stored in the manifest files should be templated. A standard structure for addon templates should be created (probably containing stuff like the images, addon name, ClusterConfiguration, node stuff). This way we can standardize most of the addon specific code and reduce (or even remove) that code.
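
A rough sketch of what such an images array could look like for the kube-dns case (field name hypothetical, image names and tags only illustrative):

  addons:
  - name: kube-dns
    images:
    - k8s.gcr.io/k8s-dns-kube-dns-amd64:<tag>
    - k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:<tag>
    - k8s.gcr.io/k8s-dns-sidecar-amd64:<tag>
    manifestFile: <some-path-to-manifest>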

@chuckha commented Sep 4, 2018

Should the images/repository field go in the manifest file instead of outside the custom configurations? If not, what makes the images/repository more special than the configuration of the plugin?

@rosti commented Sep 4, 2018

If we use an array of images, we can handle the case with non-obvious dependencies (like in the kube-dns case) and still have fully functional image pre-pulling. However, if we can use only a single image there, then I agree with you that this is pointless.

@neolit123 (Member Author)

currently the manifests include types like DaemonSet, Service, Deployment, which are API types.
if we want the image list to be defined in a file, then we need to define a custom structure.

@fabriziopandini (Member) commented Sep 4, 2018

@neolit123 thanks for kicking the ball on this topic
Some personal thoughts about this thread

  • kubeadm should manage only two addons: a proxy and a dns (currently in two variants). For all the other addons there are already plenty of alternatives: kubectl apply, helm, kustomize etc
  • kubeadm should support add-on customizations up to certain limits (e.g. images, cluster-wide component config + eventually node-specific extra-args). Those limits also define the perimeter of the kubeadm test grids
  • For use cases beyond the above limits (e.g. manifest customizations) it should be possible to skip the addon phase and "do it by yourself"

@neolit123 (Member Author)

kubeadm should manage only two addons: a proxy and a dns (currently in two variants). For all the other addons there are already plenty of alternatives: kubectl apply, helm, kustomize etc

kubeadm should support add-on customizations up to certain limits (e.g. images, cluster-wide component config + eventually node-specific extra-args). Those limits also define the perimeter of the kubeadm test grids

ok, these two are some good points.

For use cases beyond the above limits (e.g. manifest customizations) it should be possible to skip the addon phase and "do it by yourself"

this also makes perfect sense, except that all the reports i've seen about this feature were about using custom changes in addon manifests with kubeadm init. phases are powerful, but it seems that users want this from kubeadm init because it's the simpler UX.

@fabriziopandini (Member)

it seems that users want this from kubeadm init because it's the simpler UX.

That's true, and I'm fine with pushing the limit of kubeadm-supported configurations as far as we can follow with good test grid coverage and/or reasonably ensure stable and predictable workflows.

@xlgao-zju commented Sep 5, 2018

@neolit123 @fabriziopandini I think we should support customization of the add-on version to let users use the version they want. This is useful and simple. A newer version can include many fixes...

@chrisohaver commented Sep 5, 2018

A caveat with version selection: CoreDNS, for example, typically has a new default configuration with every release. This configuration is not built into the binary - it's essentially hard coded in kubeadm. Therefore (theoretical example), if installing CoreDNS 1.2.2 with kubeadm 1.11, you'd get the CoreDNS 1.1.3 default k8s deployment configuration. So this would be a "use at your own risk" option.

@neolit123 (Member Author)

i see the common use case being:

addons:
  - name: coredns
  - name: kube-proxy

this would install the two addons, based on the hardcoded configuration,
which we probably need to expose on the command line AKA config --print-default [1]

changing the kube-proxy image might not make a lot of sense because it's bound to k8s versioning, but this still allows manifestFile for tweaks.

@chrisohaver

A caveat with version selection: CoreDNS, for example, typically has a new default configuration with every release. This configuration is not built into the binary - it's essentially hard coded in kubeadm. Therefore (theoretical example), if installing CoreDNS 1.2.2 with kubeadm 1.11, you'd get the CoreDNS 1.1.3 default k8s deployment configuration. So this would be a "use at your own risk" option.

we have the option to make it so that if the user sets a different image for an addon, a custom manifest file becomes mandatory. but yes, they would be on their own if they want to use a custom image/config and we probably need to expose the default one as outlined in [1].
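
as a rough illustration of that rule, reusing the hypothetical schema from the first comment, a non-default image would always be paired with a manifest:

  addons:
  - name: coredns
    image: <some-custom-coredns-image>    # non-default image...
    manifestFile: <some-path-to-manifest> # ...so a custom manifest becomes mandatory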

@rosti commented Sep 5, 2018

We should also consider that allowing the user to have a custom config map (for example) might make an upgrade impossible, to the point of breaking it.

The user may be required to do a manual upgrade of the addons if he/she used some config map option that was changed and could not be upgraded automatically by kubeadm.

On the other hand, allowing the use of the latest patch release might be desirable, since those are not expected to change configs or APIs, but fix bugs.

@neolit123 (Member Author)

i can definitely see the complication with upgrades, but again the users are on their own if they modify something beyond recognition. this is the case in most software upgrade scenarios.

we can potentially diff out deviations from the default addon manifests, print them out, but leave it to the users to manually patch.

@chrisohaver

On the other hand, allowing the use of the latest patch release might be desirable, since those are not expected to change configs or APIs, but fix bugs.

Yes, CoreDNS policy is that 3rd dot releases won't introduce any config incompatibilities (e.g. 1.0.0 -> 1.0.1). A 2nd dot release (e.g. 1.0.1 -> 1.1.0) means there is the possibility of config incompatibilities - although the default config may not be affected.

Perhaps off topic: As it is now, how are we handling coredns corefile/configmap during upgrades? Do we try to detect if the corefile is at a prior default, and if so update it to the new default? Or do we always replace (clobbering customizations)? Or do we never replace? I recall discussing ways to solve this, such as putting a comment in the corefile, or a label in the configmap (e.g. restore-default-at-upgrade=true) - leaving it to the user to control. I suppose we could also test the existing corefile to see if it's valid on the new version, and abort/warn the user if it is not.
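
For illustration only, such a user-controlled marker could be a label on the coredns ConfigMap (label key hypothetical, Corefile contents abbreviated):

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: coredns
    namespace: kube-system
    labels:
      restore-default-at-upgrade: "true" # hypothetical knob: let kubeadm replace this config with the new default on upgrade
  data:
    Corefile: |
      .:53 {
          errors
          kubernetes cluster.local in-addr.arpa ip6.arpa
          # ... rest of the default Corefile ...
      }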

@neolit123 (Member Author)

@chrisohaver

Perhaps off topic: As it is now, how are we handling coredns corefile/configmap during upgrades?

as in my previous comment:

we can potentially diff out deviations from the default addon manifests, print them out, but leave it to the users to manually patch.

my initial idea for this would be to tell the user something along the lines of "hey, we see that you have deviated your addon configuration from the default one like so..." and pretty much diff it.

on upgrades we show this message: "kubeadm is now upgrading addon X to the default configuration
for this new version and you need to perform manual actions to patch your manifest".

automatic handling might be out of scope / hard to do.

@rajansandeep

@chrisohaver @neolit123

Perhaps off topic: As it is now, how are we handling coredns corefile/configmap during upgrades?

Currently, the corefile/configmap remains untouched during an upgrade if an existing coredns configmap is detected.

@chrisohaver

@rajansandeep thanks, that's what I was asking.

So, if left unaddressed, we will eventually have issues of config incompatibilities during normal upgrades, where users never change their config from the default.

@neolit123 (Member Author)

@chrisohaver @rajansandeep
sorry, i've missed the context of the off-topic.

the current handling is not optimal (AKA missing).

I recall discussing ways to solve this, such as putting a comment in the corefile, or a label in the configmap (e.g. restore-default-at-upgrade=true) - leaving it to the user to control

i may have missed these discussions.
this should be per addon configuration and not only for coredns, it seems.

proposal:

  • by default always hard-replace the old config with the new default?
    restore-default-at-upgrade=false is a possible knob here.
  • look for deviations in the old config, warn about them on upgrade.
  • require manual actions from users.

ideas are welcome.

I suppose we could also test the existing corefile to see if it's valid on the new version, and abort/warn user if it is not.

would this validation process require vendoring coredns in kubeadm?

@fabriziopandini (Member)

@chrisohaver @rajansandeep
For kubelet and kube-proxy, kubeadm relies on component config for having a clean migration path for configuration files. This assumes that the owner of the config file, e.g. the kubelet team, provides conversion functions from config vX to config vY.
Are you aware of any config conversion functions provided by the coreDNS team?

@chrisohaver

Are you aware of any config conversion functions (config vX to config vY) provided by the coreDNS team?

This doesn't exist yet. The complexity/flexibility of the CoreDNS config makes this challenging to do 100% cleanly.

We had some discussions about this a few months ago, and recently started talking about it again.

@fabriziopandini (Member)

@chrisohaver ack. If there is an issue to follow such a discussion, I will add some comments about how to make this usable from kubeadm.

@neolit123 (Member Author) commented Sep 5, 2018

in the meantime, if the proposal for addon customization is integrated in kubeadm before coredns gets the conversion and validation code in place, we might have to go with the approach of instructing the users to do manual patching. this is extra handling, but it might be a good idea to diff the user deviations for addons regardless.

i would like to remind that the demand for this feature seems high and if anyone notices a kubeadm issue along the lines of "allow kubeadm to set [custom image | manifest feature X] for addon Y", just close -> redirect here.

@timothysc timothysc modified the milestones: GA, v1.13 Sep 24, 2018
@timothysc timothysc self-assigned this Sep 24, 2018
@timothysc timothysc added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Sep 24, 2018
@timothysc timothysc added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Oct 11, 2018
@timothysc (Member)

In the office hours yesterday we discussed the need to figure out the balance here with bundles: https://github.com/GoogleCloudPlatform/k8s-cluster-bundle

I don't want to overfit the config, only to change it at a later date.

@timothysc timothysc added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Oct 11, 2018
@neolit123 (Member Author)

sorry for not attending - meetings and travel this week.

what we know about bundles at the moment is only the theory.
we can only estimate whether they suit our needs once we test them.

i'm considering doing that, but the project doesn't have documentation and examples:
https://github.com/GoogleCloudPlatform/k8s-cluster-bundle
or at least i cannot find them.

@roberthbailey @justinsb do you have any examples for usage of the c-bundle or perhaps i should log an issue about that?

@timothysc timothysc removed this from the v1.13 milestone Oct 31, 2018
@timothysc timothysc added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Oct 31, 2018
@neolit123 (Member Author)

we already have coredns and kube-proxy addons in the config.
it was decided that we should not expose the manifest directly as the OP suggests.
this renders this tracking issue misleading and i'm closing it.

some related topics:

  • we still need to investigate cluster bundles
  • we have plans for a feature for modifying running clusters
  • impact on upgrades

as a side note kubectl now has kustomize built in.

@rdxmb commented Nov 22, 2021

May I ask: What is the recommended way to patch those addons like

  • Add anti-affinity to coredns to run it on different nodes by default?

Are there any docs for doing that? Thanks and kind regards.

EDITED: fix typo above

@neolit123 (Member Author) commented Nov 22, 2021 via email

@rdxmb commented Nov 22, 2021

Ok, thanks to @neolit123 .

I will try to get the manifest with kubectl get and modify/deploy it again with kustomize.
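
For reference, a rough sketch of what that could look like (purely illustrative, not an officially documented workflow; file names and the anti-affinity rule are made up for the example):

  # kustomization.yaml
  resources:
  - coredns-deployment.yaml # e.g. exported via: kubectl -n kube-system get deployment coredns -o yaml
  patchesStrategicMerge:
  - coredns-antiaffinity-patch.yaml

  # coredns-antiaffinity-patch.yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: coredns
    namespace: kube-system
  spec:
    template:
      spec:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  k8s-app: kube-dns # label used by the coredns pods
              topologyKey: kubernetes.io/hostname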
