Nomad does not see stopping jobs with podman 4.1.1 (rest-api response change) #182

Closed
DemonicTutor opened this issue Jul 21, 2022 · 0 comments · Fixed by #183

DemonicTutor commented Jul 21, 2022

nomad 1.3.2
nomad-podman-driver 0.4.0
podman 4.1.1 (upgraded from 4.1.0)
CoreOS 36.20220703.3.1 (https://getfedora.org/en/coreos?stream=stable#) Release Date: Jul 18, 2022
aarch64

When a job is stopped via Nomad, the containers are stopped but not removed (gc=true is configured), and the job status in Nomad never changes to stopped.
When a job is restarted, the new allocation stays in starting and the previous one stays in running (although the first container does get stopped, which matches our configs).

I think this commit in podman changed the REST API stats response for a stopped container from HTTP 404 Not Found to 200 OK. This seems to break the driver code in
nomad-driver-podman/container_stats.go at 50007e0c40702d24c9710a8030a72a875826e743 · hashicorp/nomad-driver-podman, which expects a 404 in order to treat the container as gone.

podman via console:

[markus@mephistopheles ~]$ ssh 0.158.3.76
Fedora CoreOS 36.20220703.3.1
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos
[xxx@ip-10-158-3-76 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                 COMMAND               CREATED      STATUS                       PORTS       NAMES
94de9d4d1bc7  [xxxx)       -config.file /dat...  3 hours ago  Up 3 hours ago                           promtail
97b88327af7d  [xxxx)  --path.procfs /ho...  3 hours ago  Up 3 hours ago                           prom-ne
7438328ead17  [xxxx)         agent -retry-join...  3 hours ago  Up 3 hours ago                           consul
97992073a31a  [xxxx)          agent -config=/no...  3 hours ago  Up 3 hours ago                           nomad
96343dff92bf  [xxxx)        traefik               3 hours ago  Up 3 hours ago                           traefik
2104ed346090  [xxxx)                             3 hours ago  Exited (143) 57 minutes ago              rng-9d62b66a-8b38-b1d9-5181-9fc4db8b5a27
[xxxx@ip-10-158-3-76 ~]$  sudo curl -i --unix-socket /run/podman/podman.sock http://u/v1.0.0/libpod/containers/rng-9d62b66a-8b38-b1d9-5181-9fc4db8b5a27/stats?stream=false
HTTP/1.1 200 OK
Api-Version: 1.40
Libpod-Api-Version: 4.1.1
Server: Libpod/4.1.1 (linux)
X-Reference-Id: 0xc0010e4860
Date: Wed, 20 Jul 2022 16:31:37 GMT
Transfer-Encoding: chunked

I also see these log messages from podman, but I have not yet figured out where they come from. They continue until Nomad is restarted, so I assume they are related.

2022-07-21 10:22:42 | time="2022-07-21T08:22:42Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:22:27 | time="2022-07-21T08:22:27Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:22:12 | time="2022-07-21T08:22:12Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:21:57 | time="2022-07-21T08:21:57Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:21:42 | time="2022-07-21T08:21:42Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:21:27 | time="2022-07-21T08:21:27Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:21:12 | time="2022-07-21T08:21:12Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:20:57 | time="2022-07-21T08:20:57Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:20:42 | time="2022-07-21T08:20:42Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:20:27 | time="2022-07-21T08:20:27Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"
2022-07-21 10:20:12 | time="2022-07-21T08:20:12Z" level=error msg="Unable to get cgroup path of container: cannot get cgroup path unless container 8b70e11098b344d1cb065761f85b7313716ad9207619d0ebf38fe00323a104df is running: container is stopped"