On server reboot, container exits with code 128, won't retry #293
I'm seeing the same issue, also with a Plex container, and I'm also bind-mounting a network share. There are a few differences in my situation: I'm using the official Plex Docker image, a macvlan network, and docker-compose. I'm seeing exactly the same symptoms though: there are no application logs at all inside the container, and no entries in the container logs either. The container starts normally if I start it manually. If I remove the bind-mounted network share, the container starts normally at boot, so it seems the issue is that the container tries to start before the network share has been mounted. Therefore I'm not sure whether this constitutes a Docker bug, to be honest.
Error from docker inspect is the same as above:
I'm on a slightly later Docker version, and I'm on Ubuntu 18.04 LTS.
This issue can be reproduced with this basic container which bind-mounts a network share:
It gives the same behaviour and same error after reboot.
If I remove the network bind-mount, then it works and starts correctly after reboot:
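For reference, a hypothetical minimal repro along those lines (the image, container name, and mount path are placeholders of my choosing, not taken from the report):

```shell
# Assumes a network share is mounted at /mnt/share via /etc/fstab.
docker run -d --name bindtest \
  --restart always \
  -v /mnt/share:/data \
  busybox sleep infinity

# After a reboot, inspect whether Docker ever retried the failed start:
docker inspect -f '{{.State.Status}} {{.State.ExitCode}} {{.RestartCount}}' bindtest
```

With the bind mount pointing at a network share, the post-reboot inspect should show the exited state with exit code 128 and a restart count of 0, matching the thread.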
Therefore the issue is simply that Docker is attempting to start the container before the mount has completed. I don't think this can be considered a Docker bug; how is the Docker daemon supposed to know to wait for the network mount? I suspect the fix, on a case-by-case basis, is to add an appropriate automount option to the share's fstab entry.
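A sketch of what such an `/etc/fstab` entry might look like; the server, share, mount point, and credentials file are all placeholders, and `x-systemd.automount` is my assumption about the option being alluded to (note the follow-up comment in this thread reports that this alone did not work for the author):

```shell
# /etc/fstab — hypothetical CIFS entry.
# _netdev marks the filesystem as network-dependent;
# x-systemd.automount defers the actual mount until first access.
//nas.local/media  /mnt/media  cifs  credentials=/etc/cifs-creds,_netdev,x-systemd.automount,noauto  0  0
```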
~~Just for info, my full working entry from `/etc/fstab` is:~~
I spoke too soon. The above does allow the container to start, but the share isn't actually mounted, so it should not be used. A working fix is to modify the Docker systemd service to declare a `RequiresMountsFor=` dependency on the mount point. See https://www.freedesktop.org/software/systemd/man/systemd.unit.html#RequiresMountsFor= This is not ideal, since it requires modifying the Docker service each time a container needs a mount, but it does the job.
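For concreteness, a sketch of that change done as a systemd drop-in rather than editing the packaged unit file (the `/mnt/media` mount point is a placeholder, not from the comment above):

```shell
# Create a drop-in override for docker.service.
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/mounts.conf <<'EOF'
[Unit]
# Do not start dockerd until this path is mounted.
RequiresMountsFor=/mnt/media
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker.service
```

A drop-in survives package upgrades of docker.service, which direct edits to the unit file may not.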
It seems this is actually a recurrence of a previous issue: moby/moby#17485. The repro steps are nearly identical, apart from the different mount type.
I'm encountering the same issue. Even though the restart policy of my containers is set to unless-stopped, they don't come up if one of the prerequisite mount points is not available at the time Docker attempts to start them. The retry logic (which otherwise works fine) is not executed. The status is:
Yep, struggling with this as well at the moment. The NFS mount is not set up before Docker starts, so the container doesn't work as expected.
Hello, why is Docker not trying to restart this container?
Same issue here. If the CIFS share is not mounted, the container exits and does not attempt to restart. The container will start fine when started manually once the network share is available.
Something similar happens in my case. I've got an encrypted folder on a Synology NAS, with automount enabled. Since it's not mounted yet when the Docker service starts, the container doesn't start until I start it manually. The result is:

This is really annoying, since I only use my Synology NAS several hours a day, and I need some Docker services to start automatically.
I see the same issue. My Docker paths are mapped directly to the filesystem of locally attached SSDs. When I check the container, I see the same behaviour. Is there a way to force Docker to restart the services in this case?
How is this not fixed??? This is extremely annoying, isn't it?
I gathered extra information for my case:
The output of docker inspect:

And my /lib/systemd/system/docker.service:

Is there a way to wait for the nvidia driver to be properly loaded, other than with "RequiresMountsFor"?
Same issue occurred today. In my case the container is from the official image for Traefik and has restart set to always:

version: '2'
services:
  traefik:
    image: traefik:1.7
    restart: always
    ports:
      - "80:80"
      - "443:443"
    networks:
      - traefik
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/traefik/traefik.toml:/traefik.toml
      - /opt/traefik/acme.json:/acme.json
    container_name: traefik
networks:
  traefik:
    external: true
Can anyone from Docker comment on this issue? Maybe @andrewhsu, @tiborvass, @thaJeztah or @duglin can help point this issue to someone who can give a hand here.
I had this exact situation. I start my containers using
After another reboot everything was fixed. Any ideas on why this happened, or where I should look to debug the situation?
From a quick glance at the errors mentioned, it looks like all cases are trying to bind-mount an extra disk that is not yet available at the moment Docker starts, as commented above as well: #293 (comment)
I think the reason the daemon might not continue trying is that it requires the container to start successfully once before it will start monitoring the container (to handle restarting it once it exits). I seem to recall this was done to prevent situations where (e.g., similar to what's discussed here) a "broken" container configuration causes a DoS of the whole daemon. Perhaps the best solution is to create a systemd drop-in file to delay starting the Docker service until after the required mounts are present; this thread on reddit https://www.reddit.com/r/linuxadmin/comments/5z819x/how_to_have_a_systemd_service_wait_for_a_network/ mentions similar approaches.
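A hedged sketch of such a drop-in, assuming the required mounts are remote filesystems (the drop-in filename and the choice of targets are my assumptions):

```shell
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/wait-for-mounts.conf <<'EOF'
[Unit]
# Delay dockerd until the network is really up and remote
# filesystems (NFS/CIFS from /etc/fstab) are mounted.
Wants=network-online.target
After=network-online.target remote-fs.target
EOF
sudo systemctl daemon-reload
```

For local disks that appear late (rather than network shares), `RequiresMountsFor=` on the specific path, as mentioned earlier in the thread, is the more targeted dependency.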
My "solution" so far is to create a cron entry that runs a PowerShell script at boot:

SHELL=/snap/bin/pwsh
@reboot root <path>/autorestart.ps1

and copy this script to <path>/autorestart.ps1:

# Keep trying to start the container until it reports Running=true.
$isRunning = (docker inspect -f '{{.State.Running}}' <mycontainer>) | Out-String
while ($isRunning.TrimEnd() -ne "true")
{
    "Container is not running. Starting container ..."
    docker container start <mycontainer>
    Start-Sleep -Seconds 10
    $isRunning = (docker inspect -f '{{.State.Running}}' <mycontainer>) | Out-String
}
"Done."
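A rough bash equivalent of that PowerShell loop, for hosts without PowerShell; the function name, container name, and retry interval are placeholders of my choosing:

```shell
# Retry starting a container until `docker inspect` reports Running=true.
wait_for_container() {
  name="$1"
  interval="${2:-10}"   # seconds between retries
  until [ "$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)" = "true" ]; do
    echo "Container $name is not running. Starting container ..."
    docker container start "$name" >/dev/null 2>&1 || true
    sleep "$interval"
  done
  echo "Done."
}

# Usage (e.g. from a @reboot cron entry):
#   wait_for_container mycontainer 10
```

Like the PowerShell version, this papers over the symptom rather than fixing the ordering, but it is simple and distro-agnostic.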
I am experiencing this same issue on Ubuntu 20.04 (and just upgraded to 21; same issue) using systemd. I have tried the RequiresMountsFor directive, but it does not resolve the issue.
I had the same trouble with a simple docker compose file for Loki, without any remote folders. It seemed to fail just from mounting a local file, quoting something about mounting through proc. I therefore created my own systemd startup file for Docker, which now seems to run even after I've rebooted. I changed/added these two lines:

The full file for reference is here:
Did you resolve this issue? Please comment.
I don't remember the details, as I no longer use VirtualBox, but I solved this by changing the systemd priorities. I think I held Docker back until the auto-mount was complete, or I put a sleep in a startup script. I'm sorry I can't remember the details, but the solution lies in systemd.
I have this issue on a local bind mount, not a network share, so it's definitely not just that situation. Only one container does this, and I'm not sure why. I have restart=always on it; it still doesn't retry.
I'm experiencing the same problem.
Having the same issue on Debian 12 with Vaultwarden, local binds only. Unfortunately, the fix suggested by @kkretsch did not work. Oddly, I have both Vaultwarden and vaultwarden-backup in the same compose file, binding the same local directory (vaultwarden-backup has two additional, unrelated binds), yet only Vaultwarden 128s every reboot; the other container starts up just fine. On a separate host (Debian 11), I'm having the same issue with Traefik (sporadically, by contrast). In this case as well, multiple additional containers share a common local bind. However, testing without multiple containers binding a common directory yields inconsistent results for me.
Ubuntu 22.04, Docker 25.0.0, build e758fe5: this is still an issue. For me it happens with any container that has restart=always.
I have the same issue with Ubuntu 22.04.1 Docker Version 24.0.5. Any solution? |
Just going to throw my "I have the same issue" out there. This is incredibly frustrating... I've also tried other ways of mounting the drive. The only workaround I have found is to delay Docker from starting, by adding a delay to the Docker service. This isn't a foolproof fix though; there is definitely still a chance things will fail to load properly.
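One crude way to implement such a delay via a systemd drop-in; the 30-second value and the drop-in filename are arbitrary choices of mine, and as noted above this races against the mount rather than depending on it:

```shell
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/delay.conf <<'EOF'
[Service]
# Sleep before dockerd starts, hoping mounts complete first.
ExecStartPre=/bin/sleep 30
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker.service
```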
That's how I solved it: https://gitlab.com/-/snippets/3715249
Truly wonderful fix. I was distro-hopping and ended up on openSUSE. It uses NetworkManager by default, and I assume it has a delay with DHCP or something else, causing Pi-hole to exit with code 128, because I bind port 53 to the host IP. The error message I got was:

I used your code, with a delay of only 10 seconds, and it worked flawlessly.
Thank you |
Actual behavior
After rebooting the server, the container does not start back up. The container tries to start but exits with code 128. This looks like it's due to the network volume not being available at the time of startup; it takes a few seconds before the volume is ready. The message "no such device" appears in the error log. Manually starting the container works, because by then the network volume is available.

The container is set to restart=always, but Docker does not attempt to restart the container. RestartCount is 0.
Here is the docker command:
Here is the error message from docker inspect:
Output of `docker version`:

Output of `docker info`: