Podman Restart Kind But SIGKILL may occur #3325

Closed
hangscer8 opened this issue Aug 9, 2023 · 6 comments
Assignees
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@hangscer8
Member

hangscer8 commented Aug 9, 2023

What happened:

I modified the config files as described in #3071 to avoid the effect of the internal IP changing, so the kind container in podman can restart successfully.

However, podman restart sometimes ends by sending SIGKILL. The default timeout of podman restart is 10 seconds. If SIGKILL occurs, the kind container does not work normally afterwards; for example, kubectl get pods returns an error.
image

If the restart of the kind container finishes in time (in less than 10 seconds), no SIGKILL is sent, as shown in the following figure, and the kind container works normally: kubectl get pods returns no error.
image

Finally, with the podman restart timeout set to 120 seconds, I found that whenever the restart did not complete within 10 seconds, it took about 90 seconds to finish.
image
image
image

What you expected to happen:

The restart action can complete in 10 seconds.

How to reproduce it (as minimally and precisely as possible):

Run kind create cluster, then run podman restart on the ID of the kind container.

Anything else we need to know?:

Environment:

  • kind version: (use kind version): kind version 0.17.0 kindest/node:v1.26.2
  • Runtime info: (use docker info or podman info):
Client:       Podman Engine
Version:      4.4.1
API Version:  4.4.1
Go Version:   go1.18.10
Built:        Thu Jan  1 08:00:00 1970
OS/Arch:      linux/amd64
  • OS (e.g. from /etc/os-release):
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.2", GitCommit:"fc04e732bb3e7198d2fa44efa5457c7c6f8c0f5b", GitTreeState:"clean", BuildDate:"2023-03-10T21:12:33Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.2", GitCommit:"fc04e732bb3e7198d2fa44efa5457c7c6f8c0f5b", GitTreeState:"clean", BuildDate:"2023-03-10T21:12:33Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
  • Any proxies or other special environment settings?: None
@hangscer8 hangscer8 added the kind/bug Categorizes issue or PR as related to a bug. label Aug 9, 2023
@hangscer8 hangscer8 changed the title from "Podman Restart Kinder" to "Podman Restart Kinder But SIGKILL may occur" Aug 9, 2023
@hangscer8 hangscer8 changed the title from "Podman Restart Kinder But SIGKILL may occur" to "Podman Restart Kind But SIGKILL may occur" Aug 9, 2023
@BenTheElder
Member

Why do you think this is a kind bug?

We not only don't advertise support for podman restart but we're also not responsible for the behavior of podman restart sending a SIGKILL despite the container having a configured exit signal.

@BenTheElder
Member

#2272 is the existing tracking issue for podman restart support.

@hangscer8
Member Author

hangscer8 commented Aug 10, 2023

My guess is that the graceful shutdown of some software component inside the kind container blocks the handling of the SIGTERM that podman sends to the container.

I noticed that whenever the restart does not finish within 10 seconds, it always takes about 90 seconds to restart the kind container. I am wondering why and how this happens. I have checked the source code of podman, containerd, kind and kubeadm, and I did not find a literal 90-second value in any of them.

@BenTheElder
Member

podman should not be sending SIGKILL; the exit signal is configured for the container and cannot be SIGKILL.

The process is systemd, which we can't control. It must be PID 1 and expects a different signal, which we configure in the image metadata.

@BenTheElder BenTheElder added kind/support Categorizes issue or PR as a support question. and removed kind/bug Categorizes issue or PR as related to a bug. labels Aug 10, 2023
@BenTheElder
Member

BenTheElder commented Aug 10, 2023

kind/images/base/Dockerfile

Lines 231 to 233 in 80a64d9

# systemd exits on SIGRTMIN+3, not SIGTERM (which re-executes it)
# https://bugzilla.redhat.com/show_bug.cgi?id=1201657
STOPSIGNAL SIGRTMIN+3

https://systemd.io/CONTAINER_INTERFACE/
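
As a side note on the signal number, here is a minimal Python sketch, assuming a Linux host, of how a STOPSIGNAL-style name such as SIGRTMIN+3 maps to a number. The helper name resolve_stop_signal is hypothetical and is not part of kind or podman. On glibc-based Linux systems SIGRTMIN is normally 34, so SIGRTMIN+3 comes out as 37, but the value can differ with other C libraries.

```python
import signal


def resolve_stop_signal(name: str) -> int:
    """Translate a name like 'SIGTERM' or 'SIGRTMIN+3' into a signal number.

    Hypothetical helper for illustration only. Real-time signal numbers are
    platform dependent, so the result is valid only on the host that runs it.
    """
    base, _, offset = name.partition("+")
    return int(getattr(signal, base)) + (int(offset) if offset else 0)


if __name__ == "__main__":
    for name in ("SIGTERM", "SIGRTMIN+3"):
        print(name, "->", resolve_stop_signal(name))
```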

@BenTheElder BenTheElder self-assigned this Aug 25, 2023
@ehdis

ehdis commented Mar 23, 2024

I am seeing the same delay, so let me rephrase the issue. The configured StopSignal is honored, so signal 37 is used to bring the container down, but the service in the container (which is in the scope of kind) is, for some reason, taking a long time to shut down.

[build@r9k8sdev ~]$ time podman stop --time 120  kind-control-plane 
kind-control-plane

real    1m30,563s
user    0m0,336s
sys     0m0,448s
[build@r9k8sdev ~]$ echo $?
0
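
To relate this measurement to the timeouts discussed in this thread, here is a small Python sketch that converts the "real" value printed by time, including the locale-specific decimal comma, into seconds and checks it against a stop timeout. The helper names parse_real_time and exceeds_timeout are hypothetical.

```python
import re


def parse_real_time(text: str) -> float:
    """Convert shell `time` output such as '1m30,563s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


def exceeds_timeout(duration: float, timeout: float) -> bool:
    """True if a stop taking `duration` seconds would outlast `timeout`."""
    return duration > timeout


if __name__ == "__main__":
    duration = parse_real_time("1m30,563s")
    print(f"{duration:.3f} s")
    print("exceeds 10 s:", exceeds_timeout(duration, 10))
    print("exceeds 120 s:", exceeds_timeout(duration, 120))
```

The stop above takes roughly 90.6 seconds, which exceeds the 10-second default mentioned earlier in the thread but fits within a 120-second timeout.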

When creating the kind-control-plane container with podman, kind could use "podman create --stop-timeout=120" to change the default stop timeout (stop process: send SIGTERM or the configured StopSignal -> wait for the timeout -> SIGKILL).

It would be better to find the root cause of the delay, but I'm not there yet ...
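
The stop sequence described above (stop signal, then a timeout, then SIGKILL) can be reproduced without podman. The following is a minimal Python sketch, assuming a POSIX system; it is not podman's implementation. stop_with_timeout, demo and the child script are hypothetical names made up for the illustration. The child ignores SIGTERM, standing in for a service that does not shut down within the timeout.

```python
import signal
import subprocess
import sys

# Child process that ignores SIGTERM, announces readiness, then idles.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_with_timeout(proc, stop_signal, timeout):
    """Send stop_signal, wait up to timeout seconds, then fall back to SIGKILL."""
    proc.send_signal(stop_signal)
    try:
        proc.wait(timeout=timeout)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()
        return "killed"


def demo(timeout=1.0):
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # wait until the child has installed its handler
    outcome = stop_with_timeout(proc, signal.SIGTERM, timeout)
    proc.stdout.close()
    return outcome, proc.returncode


if __name__ == "__main__":
    print(demo())
```

With a short timeout the child ends up killed, which mirrors the 10-second case in this issue. A process that reacts to its stop signal promptly would be reported as "exited" instead.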
