podman run -d: hangs when $NOTIFY_SOCKET is set #7316
Comments
Oooh! Try running |
Is this the same issue as #6688?
Yes, it looks like a different manifestation of the same problem. I'd say "feel free to close", but given how badly #6688 has been neglected, I'm going to leave it open as a reminder that this really is an unpleasant and unacceptable bug.
Oops. PR containers#6693 (sdnotify) added tests, but they were disabled due to broken crun on f31. I tried for three weeks to get a magic CI:IMG PR to update crun on the CI VMs ... but in that time I forgot to actually enable those new tests.

This PR removes a 'skip', replacing it with a check that systemd is running plus one more to make sure our runtime is crun. It looks like sdnotify just doesn't work on Ubuntu (it hangs), and my guess is that it's a crun/runc issue.

I also changed the test image from fedora:latest to :31, because, sigh, fedora:latest removed the systemd-notify tool.

WARNING WARNING WARNING: the symptom of a missing systemd-notify is that podman will hang forever, not even stopped by the timeout command in podman_run! (Filed: containers#7316.) This means that if the sdnotify-in-container test ever fails, the symptom will be that Cirrus itself will time out (2 hours?). This is horrible. I don't know what to do about it other than push for a fix for 7316.

Signed-off-by: Ed Santiago <santiago@redhat.com>
I don't think it is a bug.
I'm fine with the container hanging. I'm not fine with |
Yes, I think we need a way to work around the lock.
We have a way forward via containers/conmon#182 if we can get it landed.
I am fine with the solution proposed there, but we would still have the locking issue when |
Some CI tests are hanging, timing out in 60 or 120 minutes. I wonder if it's containers#7316, the bug where all podman commands hang forever if NOTIFY_SOCKET is set?

Signed-off-by: Ed Santiago <santiago@redhat.com>
Looks like containers/conmon#182 is ready to go in, but was not looked at for 20 days.
A friendly reminder that this issue had no activity for 30 days.
- run --userns=keep-id: confirm that $HOME gets set (containers#8013)
- inspect: confirm that JSON output is a sane number of lines (10 or more), not an unreadable one-liner (containers#8011 and containers#8021). Do so with image, pod, network, volume because the code paths might be different.
- cgroups: confirm that 'run' preserves cgroup manager (containers#7970)
- sdnotify: reenable tests, and hope CI doesn't hang. This test was disabled on August 18 because CI jobs were hanging and timing out. My suspicion was that it was containers#7316, which in turn seems to have hinged on conmon containers#182. The latter was merged on Sep 16, so let's cross our fingers and see what happens.

Also: remove inaccurate warning from a networking test. And, wow, fix is_cgroupsv2(), it has never actually worked.

Signed-off-by: Ed Santiago <santiago@redhat.com>
I am continuing to try to get this to pass CI in #8508.
Issue still present, fc02d16.
Still present in 3bdbe3c.
IMHO, the only way to prevent hanging is to change the default of |
Or well, once #11246 is merged.
Confirmed: I no longer see this bug when I apply #11246.
This leverages conmon's ability to proxy the SD-NOTIFY socket. This prevents locking caused by the OCI runtime blocking, waiting for SD-NOTIFY messages, and instead passes the messages directly up to the host.

NOTE: Also re-enable the auto-update tests which had been disabled due to flakiness.

With this change, Podman properly integrates into systemd.

Fixes: containers#7316

Signed-off-by: Joseph Gooch <mrwizard@dok.org>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
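The proxying idea in that commit can be sketched in a few lines. This is a hypothetical Python illustration of the concept behind containers/conmon#182 (relay SD-NOTIFY datagrams from a per-container socket up to the host's real NOTIFY_SOCKET, so the OCI runtime never blocks waiting on the host socket), not conmon's actual C implementation; all socket paths and names here are made up for the demo.

```python
import os
import socket
import tempfile
import threading

def start_notify_proxy(host_socket_path, proxy_socket_path):
    """Relay SD-NOTIFY datagrams from the proxy socket to the host socket.

    Hypothetical sketch: the container writes to the proxy socket; we
    forward each datagram to the real NOTIFY_SOCKET on the host.
    """
    proxy = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    proxy.bind(proxy_socket_path)
    host = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)

    def relay():
        while True:
            msg, _ = proxy.recvfrom(4096)
            host.sendto(msg, host_socket_path)
            if b"READY=1" in msg:
                break  # container reported ready; stop relaying

    thread = threading.Thread(target=relay, daemon=True)
    thread.start()
    return thread

# Demo: the "host" socket stands in for systemd's notify socket.
tmp = tempfile.mkdtemp()
host_path = os.path.join(tmp, "host.sock")
proxy_path = os.path.join(tmp, "proxy.sock")

host_sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
host_sock.bind(host_path)

relay_thread = start_notify_proxy(host_path, proxy_path)

# The "container" side sends READY=1 to the proxy, not to the host.
container = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
container.sendto(b"READY=1", proxy_path)

received = host_sock.recvfrom(4096)[0]
relay_thread.join(timeout=2)
print(received.decode())  # READY=1
```

The key point is that the datagram reaches the host asynchronously, so nothing on the podman/runtime side has to sit holding a lock while it waits for the container to become ready.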
Note the ten-second difference in the `socat` timestamps; that is because the container is `sleep 10`. Change it to `sleep 30`, you get a 30-second delay.

What I expected: since this is `run -d` (detached), I expected podman to detach immediately and let the container deal with sdnotify.

(How I found this: trying to run `systemd-notify` in a `fedora:latest` container. Ha ha, silly me, they removed systemd-notify from that image.)

Ergo, I think this is counterintuitive behavior if a user has a container that never makes it to `sdnotify`. I really don't expect `podman run -d` to hang forever.
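For readers unfamiliar with the protocol the thread keeps referring to: the sd-notify handshake podman was blocking on is nothing more than a datagram written to the unix socket named in `$NOTIFY_SOCKET`. A minimal Python sketch of what `systemd-notify` does (the temporary socket path is made up for the demo; normally systemd creates the socket and exports its path):

```python
import os
import socket
import tempfile

# Stand-in for systemd's notify socket. In a real service, systemd
# binds this socket itself and sets $NOTIFY_SOCKET to its path.
notify_path = os.path.join(tempfile.mkdtemp(), "notify.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
server.bind(notify_path)

# What systemd-notify does inside the service/container: send a
# single datagram announcing readiness to $NOTIFY_SOCKET.
client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
client.sendto(b"READY=1", notify_path)

message, _ = server.recvfrom(4096)
print(message.decode())  # READY=1
```

If the process in the container never sends that `READY=1` datagram (for example because `systemd-notify` was removed from the image), whatever is blocking on the receive side waits forever, which is the hang described in this issue.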