systemd start failure when nodes container restart #300

Closed
wants to merge 1 commit into from

@aoxn aoxn commented Feb 14, 2019

Introduction

systemd fails to start when node containers are restarted.
Use the host cgroup when initializing systemd to avoid this failure.

Problems

Docker creates a default cgroup mount for every node container instead of using the host cgroup. When a node container is restarted (after SIGUSR1 is sent), systemd fails to start because of how it handles cgroups in this setup. Mounting the host cgroup into the node containers resolves the problem; see https://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/.
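
As a rough illustration of the idea, the sketch below builds a `docker run` argument list that bind-mounts the host cgroup tree into a node container. It is not the change made in this PR (kind itself is written in Go and the diff is not shown here). The helper name `node_run_args`, the default host path `/sys/fs/cgroup`, and the `read_only` switch are assumptions made for the example.

```python
from typing import List


def node_run_args(name: str, image: str,
                  host_cgroup: str = "/sys/fs/cgroup",
                  read_only: bool = False) -> List[str]:
    """Build a `docker run` argument list that bind-mounts the host cgroup tree.

    Hypothetical helper: kind's real node-creation code is written in Go and
    passes more flags than are shown here.
    """
    # Assumption: the host exposes its cgroup hierarchy at `host_cgroup`.
    mount = f"{host_cgroup}:/sys/fs/cgroup"
    if read_only:
        mount += ":ro"
    return [
        "docker", "run",
        "--detach",
        "--privileged",
        "--name", name,
        "--volume", mount,
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(node_run_args("kind-xx-control-plane", "kindest/node")))
```

Whether a fixed host path is safe to assume depends on the host, so the path is kept as a parameter rather than hard-coded.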

Reproduce

  1. Create a cluster: kind create cluster kind-xx
  2. Stop the node containers: docker stop kind-xx
  3. Start the node containers: docker start kind-xx
  4. Send the SIGUSR1 signal: docker kill -s SIGUSR1 kind-xx
  5. Exec into a kind-xx node container and run the ps -ef command:
    root@kind-55-control-plane:/sys/fs/cgroup/unified# ps -eaf
    UID   PID  PPID  C  STIME  TTY    TIME      CMD
    root  1    0     0  18:23  ?      00:00:00  /sbin/init
    root  13   0     0  18:23  pts/0  00:00:00  bash
    root  56   13    0  20:00  pts/0  00:00:00  ps -eaf
    You will find that systemd has started (PID 1 is /sbin/init), but no units or services are running.
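
The symptom in step 5 can also be checked mechanically. The sketch below parses `ps -eaf` style output and reports whether PID 1 exists without having spawned any child process. This is only a heuristic for "systemd is up but no units are running", and the names `Proc`, `parse_ps`, `init_has_no_children`, and `SAMPLE` are hypothetical.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    cmd: str


def parse_ps(output: str) -> List[Proc]:
    """Parse `ps -eaf` output (columns: UID PID PPID C STIME TTY TIME CMD)."""
    procs = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)  # CMD may itself contain spaces
        if len(fields) < 8:
            continue
        procs.append(Proc(int(fields[1]), int(fields[2]), fields[7]))
    return procs


def init_has_no_children(procs: List[Proc]) -> bool:
    """Heuristic: True when PID 1 exists but no process has it as parent."""
    has_init = any(p.pid == 1 for p in procs)
    return has_init and not any(p.ppid == 1 for p in procs)


# The process list from the reproduction above, without the shell prompt.
SAMPLE = """\
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 18:23 ? 00:00:00 /sbin/init
root 13 0 0 18:23 pts/0 00:00:00 bash
root 56 13 0 20:00 pts/0 00:00:00 ps -eaf
"""

if __name__ == "__main__":
    print(init_has_no_children(parse_ps(SAMPLE)))
```

On a healthy node, systemd would normally have started services as children of PID 1, so the check would return False.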

Signed-off-by: Aoxn <yaoyao.aoxn@hotmail.com>

@k8s-ci-robot
Contributor

Welcome @aoxn! It looks like this is your first PR to kubernetes-sigs/kind 🎉

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 14, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aoxn
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: bentheelder

If they are not already assigned, you can assign the PR to them by writing /assign @bentheelder in a comment when ready.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 14, 2019
@k8s-ci-robot
Contributor

@aoxn: The following tests failed, say /retest to rerun them all:

Test name                            Commit   Details  Rerun command
pull-kind-conformance-parallel-1-11  17c0296  link     /test pull-kind-conformance-parallel-1-11
pull-kind-conformance-parallel-1-12  17c0296  link     /test pull-kind-conformance-parallel-1-12
pull-kind-conformance-parallel-1-13  17c0296  link     /test pull-kind-conformance-parallel-1-13
pull-kind-conformance-parallel       17c0296  link     /test pull-kind-conformance-parallel

@BenTheElder
Member

Thanks for the PR but:

  • restarting the containers is not currently supported and needs more than this; see "Cluster doesn't restart when docker restarts" (#148). We would like to solve this, though!
  • this won't work in CI currently
  • making assumptions about the host cgroup path may be a bad idea
  • the host may not use systemd
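
To make the last two concerns concrete, here is a minimal sketch of how a tool could inspect the host instead of assuming its layout. It lists cgroup mount points from `/proc/self/mountinfo`-style text and checks for `/run/systemd/system`, which is the test that `sd_booted()` performs. The function names and the `root` parameter are hypothetical, and nothing here is taken from kind's code.

```python
import os
from typing import List


def cgroup_mount_points(mountinfo: str) -> List[str]:
    """Return mount points of cgroup/cgroup2 filesystems from mountinfo text."""
    points = []
    for line in mountinfo.splitlines():
        # Fields before " - " describe the mount itself;
        # the first field after it is the filesystem type.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        left_fields, right_fields = left.split(), right.split()
        if (len(left_fields) >= 5 and right_fields
                and right_fields[0] in ("cgroup", "cgroup2")):
            points.append(left_fields[4])  # fifth field is the mount point
    return points


def host_uses_systemd(root: str = "/") -> bool:
    """True if <root>/run/systemd/system is a directory (the sd_booted() check)."""
    return os.path.isdir(os.path.join(root, "run/systemd/system"))


if __name__ == "__main__":
    # Linux only: /proc/self/mountinfo does not exist on other systems.
    with open("/proc/self/mountinfo") as f:
        print(cgroup_mount_points(f.read()))
    print(host_uses_systemd())
```

Inside a nested container the results describe that container rather than the outermost host, so this check alone does not cover the CI case.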

Note we are aware of https://www.freedesktop.org/wiki/Software/systemd/ContainerInterface (see the base image contents), but:

  • that does not say you must mount the host cgroups; you can either do so or let the container mount them itself
  • we're also running a CRI inside the container, and in CI we nest this again

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2019
@neolit123
Member

neolit123 commented Feb 14, 2019

on a related note, we are adding a strong assumption to the CRI install docs that systemd is used as the init system:
https://github.com/kubernetes/website/pull/12638/files

@aoxn
Author

aoxn commented Feb 22, 2019

@BenTheElder Thanks, Looking forward to this new feature!

@aoxn aoxn closed this Feb 22, 2019
stg-0 pushed a commit to stg-0/kind that referenced this pull request Sep 15, 2023