Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating a v1.27.2 cluster on swap enabled host fails #8766

Closed
chrischdi opened this issue May 30, 2023 · 8 comments · Fixed by #8767
Closed

Creating a v1.27.2 cluster on swap enabled host fails #8766

chrischdi opened this issue May 30, 2023 · 8 comments · Fixed by #8767
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@chrischdi
Copy link
Member

What steps did you take and what happened?

  • Run tilt-setup with CAPD
  • Create cluster with Kubernetes v1.27.2
  • See the kubelet not starting due to active swap

What did you expect to happen?

  • kubelet runs without issues

Cluster API version

main

Kubernetes version

v1.27.2

Anything else you would like to add?

CI is good because there's no swap there.

For kind node images <= v1.27.2, the fail-swap-on flag for the kubelet was always said.
This flag got removed in https://github.com/kubernetes-sigs/kind/pull/3240/files in favor of setting it via the KubeletConfiguration, which kind was already doing.

There are multiple ways to fix this for CAPD:

Technically it comes down to either:

  • Setting the --fail-swap-on=false flag on the kubelet (the flag itself is deprecated for usage, I don't know when it gets removed)
  • Adding a KubeletConfiguration and configuring the failSwapOn option (like kind does here)

And there are multiple ways to do that for CAPD, high-level:

  • Adjust the ClusterClass (templates or add a json patch)
  • Inject the setting in some way in go code, but only for CAPD

Label(s) to be applied

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 30, 2023
@killianmuldoon
Copy link
Contributor

/triage accepted

Thanks for filing this - I'm probably in favor of adding an inline patch to manage this for our CAPD templates as that's the way we tended to handle issues like this in the past - e.g. cgroup stuff.

I don't think we should worry too much about the arg being deprecated until we have some idea about when it will be removed.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 30, 2023
@chrischdi
Copy link
Member Author

chrischdi commented May 30, 2023

I totally agree on this.

I can file that PR to update if we agree on doing it that way: https://github.com/kubernetes-sigs/cluster-api/blob/main/test/infrastructure/docker/templates/clusterclass-quick-start.yaml and test/e2e/data/infrastructure-docker/<version>/clusterclass-quick-start.yaml.

Two options there:

  • ClusterClass JSON Patch (similar to cgroupDriver patches)
  • Directly at the kubeletExtraArgs in KubeadmConfigTemplate and KubeadmControlPlaneTemplate (similar to eviction-hard argument)

@killianmuldoon
Copy link
Contributor

Let's do the ClusterClass JSON Patch IMO

@killianmuldoon
Copy link
Contributor

Should probably update our other ClusterClasses - those used for e2e tests etc. Even though we don't see this error in CI running those locally probably doesn't work right now.

@chrischdi
Copy link
Member Author

Let's do the ClusterClass JSON Patch IMO

👍 on that.

Just wanted to add that we can make it version dependent on the first variant, so we don't break possible old versions which don't have the flag.

@chrischdi
Copy link
Member Author

/assign

@chrischdi
Copy link
Member Author

Note: feature got introduced as alpha in kubernetes v1.22 kubernetes/enhancements#2400

@sbueringer
Copy link
Member

sbueringer commented May 30, 2023

+1 to going with a ClusterClass JSON patch + keeping our ClusterClasses as similar as possible

Potentially we can replace this later if kubernetes/kubeadm#1682 is implemented before the command line flag is dropped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants