
Podman restart support #2272

Open
alvinsw opened this issue May 24, 2021 · 16 comments
Labels
area/provider/podman (Issues or PRs related to podman) · kind/external (upstream bugs) · kind/feature (Categorizes issue or PR as related to a new feature)

Comments


alvinsw commented May 24, 2021

What happened: The cluster does not work anymore after the podman container is restarted (e.g. after a host OS reboot). This issue was fixed for docker (#148). Is there a plan to support restart for podman in the near future?

What you expected to happen: The cluster should run again after restarting the podman container.

How to reproduce it (as minimally and precisely as possible):

```
kind create cluster
podman stop kind-control-plane
podman start kind-control-plane
```

Anything else we need to know?:

Environment:

  • kind version: (use kind version): 0.11.0
  • Kubernetes version: (use kubectl version): kindest/node:v1.21.1
  • Docker version: (use docker info): podman version 3.1.2
  • OS (e.g. from /etc/os-release): Latest ArchLinux
alvinsw added the kind/bug label (Categorizes issue or PR as related to a bug) on May 24, 2021
aojea (Contributor) commented May 24, 2021

podman doesn't handle restarts by design; it relies on systemd unit files to manage containers across restarts.

https://github.com/containers/podman/blob/master/docs/source/markdown/podman-generate-systemd.1.md
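
A minimal sketch of that approach, assuming rootful podman and the default node container name (whether the cluster actually comes back healthy afterwards is the open question in this issue):

```sh
# Generate a systemd unit for the existing node container and install it.
# By default podman names the unit container-<name>.service.
podman generate systemd --name kind-control-plane \
  > /etc/systemd/system/container-kind-control-plane.service

systemctl daemon-reload
systemctl enable --now container-kind-control-plane.service
```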

Bear in mind that KIND wraps these container technologies. If docker supports something out of the box and podman doesn't, it is not likely that KIND is going to work around it; that is far out of scope for the project. However, we work closely with and have a good relationship with both projects, collaborating and opening bugs when necessary.

Are you running podman as rootless?
If podman support is "experimental", rootless is even "more experimental", so all the "advanced" features may have bugs or simply not be supported at all...

aojea added the area/provider/podman, kind/feature, and kind/external labels and removed the kind/bug label on May 24, 2021
BenTheElder (Member) commented:

Podman also lacks a stable container network identifier, which makes managing Kubernetes nodes across restarts problematic.

I don't think anyone is planning to work on this feature or has a plan for how it might be possible.

alvinsw (Author) commented May 25, 2021

No, I am running podman as root; that is, kind create cluster is run by the root user.
Minikube supports podman and it can still start and stop clusters using podman.
What makes kind different in this case?
After executing podman start kind-control-plane, can we just manually run a script on the running kind-control-plane container to start everything all over again?
Or would it be easier to add a feature where all user data on the kind-control-plane container is persisted on the host machine? This means that if you delete and create the cluster again, the new cluster will still have all the k8s objects from the previous cluster.

BenTheElder (Member) commented:

Minikube supports podman and it can still start and stop clusters using podman.

Minikube supports podman and docker using a fork of the kind image, yes.

What makes kind different in this case?

We don't work on that project. I don't work on podman support either. I can't tell you.

But I can tell you that podman lacks automatic restart for containers and lacks sufficient networking features to design robust restart. Node addresses will be random and restart support will be a roll of the dice. Stop and start is not what we mean when we say docker has restart support; that has a different tracking issue, #1867, which nobody has contributed to investigating thus far.

After executing podman start kind-control-plane, can we just manually run a script on the running kind-control-plane container to start everything all over again?

You're welcome to try but we have no such script.

Or would it be easier to add a feature where all user data on the kind-control-plane container is persisted on the host machine? This means that if you delete and create the cluster again, the new cluster will still have all the k8s objects from the previous cluster.

Kubeadm doesn't support this AIUI. You can't just persist all data and then start a new cluster with it.

When stopping and starting (or, with docker, restarting), the data is already persisted on an anonymous volume. But not across clusters.

We are focused on making starting clusters cheap and quick so tests can be run from a clean state. We don't recommend keeping clusters permanently.

vugardzhamalov commented:

Thank you @BenTheElder for explaining things in the earlier post!

Do you think it will be (or maybe it already is) possible to declare the required parameters in the config YAML file? Say I want to restart a multi-node cluster running on podman: in addition to the number of nodes, I could declare static IP addresses per node, and so on. In other words, if podman doesn't provide this functionality, is there any way to let users make further configuration changes to compensate?
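
For reference, the current config format can already declare the number of nodes, though as far as I can tell not static per-node IPs; a minimal sketch (the fields shown are real v1alpha4 fields):

```sh
# Declare a multi-node topology in the config and pass it on stdin.
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
```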

secustor commented:

You have to use podman restart kind-control-plane.

podman start does not reattach the port forwarding.
Interestingly, after an implicit stop (like a reboot), you have to start it and then restart it to make it work.
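
A sketch of that sequence, assuming the default node container name:

```sh
# After an implicit stop (e.g. a host reboot), start the node first...
podman start kind-control-plane
# ...then restart it so the port forwarding is reattached.
podman restart kind-control-plane
```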


benoitf commented Dec 9, 2022

Hi @BenTheElder, could you explain "Node addresses will be random and restart support will be a roll of the dice"?

I created an issue in the Podman repository to be able to handle kind's requirements, but it's not clear what kind is expecting from the Podman side.
containers/podman#16797

BenTheElder (Member) commented:

Podman networking has changed a lot over the past few years, but historically container IPs are random on startup, and podman lacked an equivalent of docker's embedded DNS resolver with resolvable container names.
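
To illustrate the docker side (assuming a multi-node cluster with the default node names, which sit on docker's user-defined "kind" network):

```sh
# With docker, kind nodes resolve each other by container name via the
# embedded DNS resolver of the user-defined "kind" network.
docker exec kind-worker getent hosts kind-control-plane
```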

I don't think it's appropriate to file a bug against podman for kind unless there's a specific bug.

As you saw in #2998, the other reason we haven't had a restart policy for podman is that podman didn't support restart policies meaningfully. That has changed a bit.


tppalani commented Jan 9, 2024

Hi @alvinsw

I'm also facing the same error: after creating a kind cluster using podman and then stopping and starting podman, the kind cluster's target endpoint is not reachable. We have migrated from docker to podman on around 1000 developer machines, so this is high priority. Please let me know if you find any workaround for this.

This is my support ticket: #3473

BenTheElder (Member) commented:

We have migrated from docker to podman on around 1000 developer machines, so this is high priority.

Unfortunately podman and docker are NOT direct substitutes and we don't have the bandwidth to spend on this ourselves currently.

In your issue, the containers are failing to start outright, at which point no kind code is even running, only podman/crun.


We'll continue to review suggested approaches to improving the podman implementation in kind, and the subsequent PRs.

Related: I think podman has had optional support for resolving container names for a while now; we could consider making this a prerequisite and matching the docker behavior more closely.


ehdis commented May 31, 2024

I noticed that on a current podman setup, the stop command escalates to a SIGKILL of the container. systemd inside the control-plane container itself waits up to 1m30s for a process (in my case containerd) that does not stop, but the podman stop command above sends SIGKILL after 10s. It's obvious what that means.

The args when creating the cluster/container could change the default of 10s, for instance to 120s, with --stop-timeout=120. This would allow the cluster to shut down gracefully...
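
The one-off equivalent also exists on podman stop itself (-t/--time; --stop-timeout is the matching creation-time option on podman run/create):

```sh
# Give this particular stop a 120s grace period instead of the 10s default.
podman stop --time 120 kind-control-plane
```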

Better still would be to find the cause of containerd not returning promptly when stopped.

BenTheElder (Member) commented:

but the podman stop command above sends SIGKILL after 10s. It's obvious what that means.

That's not obvious to me; SIGKILL is not even the right signal to tell systemd to exit. https://systemd.io/CONTAINER_INTERFACE/

The args when creating the cluster/container could change the default of 10s, for instance to 120s, with --stop-timeout=120. This would allow the cluster to shut down gracefully...

We could do that; it seems like a behavioral gap versus docker, and we should investigate what the actual behavior difference is and try to align them.

Help would be welcome identifying what is happening with docker nodes that isn't happening with podman nodes (or perhaps you're running a workload that inhibits shutdown?)


ehdis commented May 31, 2024

Just to clarify: podman stop sends the signal that the container has configured (StopSignal), or SIGTERM by default. After the default timeout of 10s it sends SIGKILL.

You are right, systemd/init containers should receive a different signal (37/SIGRTMIN+3). Therefore the container creation (e.g. control-plane) should have a --stop-signal= argument. Looking into my control-plane container, it looks like kind (v0.23.0) does not set the right signal (--stop-signal=37) to stop systemd. But systemd performs the shutdown with the SIGTERM signal as well, so far; not sure if it would make a difference. A quick test with podman kill --signal=37 control-plane does not show one.
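
A quick way to check what is actually configured (container name assumes kind defaults):

```sh
# Show the stop signal configured on the node container.
podman inspect --format '{{.Config.StopSignal}}' kind-control-plane

# Send signal 37 (SIGRTMIN+3 on Linux) directly, as in the test above.
podman kill --signal=37 kind-control-plane
```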

My current problem is that the shutdown hangs here, and continues after the systemd internal timeout (1min 30s):

```
...
[  OK  ] Removed slice kubelet-kubepods-burstable-pod2954d591_64df_47ec_ac40_236a…ntainer kubelet-kubepods-burstable-pod2954d591_64df_47ec_ac40_236a244177b6.slice.
[  OK  ] Removed slice kubelet-kubepods-burstable-pod50dc3cdf_24ed_44a0_9d5d_9881…ntainer kubelet-kubepods-burstable-pod50dc3cdf_24ed_44a0_9d5d_988129d2591e.slice.
[ ***  ] (2 of 2) Job cri-containerd-3f1ea75a93823c1ffaece11518a124ec8950fcbc7cf9cdaac6fd00c2a415e8dd.scope/stop running (47s / 1min 30s)
```

And this is just a kind test cluster (single node) with a deployment of httpd:latest (replicas: 2), that's all.

To sum up: --stop-timeout wouldn't hurt and would provide a better experience from the user's point of view. --stop-signal=37 would help comply with systemd. The missing part is the cause of the shutdown delay...

BenTheElder (Member) commented:

Therefore the container creation (e.g. control-plane) should have a --stop-signal= argument.

We set this in the image.
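
You can confirm it from any node image (the tag here is the one from the environment section above):

```sh
# The stop signal is baked into the node image; inspect it directly.
podman image inspect --format '{{.Config.StopSignal}}' kindest/node:v1.21.1
```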

BenTheElder (Member) commented:

To sum up: --stop-timeout wouldn't hurt and would provide a better experience from the user's point of view. --stop-signal=37 would help comply with systemd. The missing part is the cause of the shutdown delay...

It might not, but we should not set different flags in podman versus docker without understanding whether we're working around a difference in functionality. On the surface they're supposed to be compatible, and kind is less useful as a test tool when the behavior isn't consistent.

So before doing that, we want to understand if this is an expected difference in behavior, or if we're only working around a podman bug, or if it affects both and we're only mitigating podman but not docker.

So far, I have not seen clusters fail to terminate, which suggests a difference in behavior that is possibly a bug OR it's because of something you're running in the cluster (or something different with your host).

Ideally we'd reproduce and isolate which aspect (your config, your host, your workload, docker vs podman) is causing the nodes to not exit and deal with the root issue instead of changing the behavior of kind podman nodes to work around an issue we don't understand and haven't seen before.


ehdis commented Jun 1, 2024

Therefore the container creation (e.g. control-plane) should have a --stop-signal= argument.

We set this in the image.

Ooh, I don't know where I looked, definitely not at the right place... it's set :-)
