Daemon errors with (HTTP code 404) -- no such container: sandbox #261

Open · cywang117 opened this issue Jul 12, 2021 · 17 comments

@cywang117

cywang117 commented Jul 12, 2021

NOTE for users and support agents arriving here in the future: since it's not clear how to reproduce this issue, please gather more information about the conditions on the device. Some good starting questions and things to check:

  • Did this error appear after a release update?
  • Are deltas enabled?
  • Does the release build use intermediate containers? (If unsure, the Dockerfile(s) of the services will tell you.)
  • Any other questions you think might be relevant.

If the user is okay with it, asking them to leave the device in this invalid state for engineers to investigate would also help.

Description

The balenaEngine daemon errors with (HTTP code 404) -- no such container: sandbox. However, there is no sandbox container on the device. The error is surfaced by the device Supervisor in the journal logs as:

Device state apply error Error: Failed to apply state transition steps. (HTTP code 404) no such container - sandbox 915c9f1f78712e9db8bb1edf3d94fd669a917c608270f4c95e3a8c72de142b15 not found Steps:["updateMetadata"]

Per https://github.com/balena-io/balena-io/issues/1684, this might be due to bad internal state for one of the containers on the device. The issue is fixed by restarting balenaEngine with systemctl restart balena, OR by running systemctl stop balena-supervisor && balena stop $(balena ps -a -q) && balena rm $(balena ps -a -q) && systemctl start balena-supervisor; however, neither is ideal, as the containers experience a few minutes of downtime.

It's unclear how to reproduce this issue.
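
For context on what actually fails: per the Supervisor commit referenced further down in this thread, the "updateMetadata" step renames the container to match the target release. Below is a minimal sketch of that call, assuming dockerode and an illustrative socket path and helper name; it is not the actual Supervisor code.

import Docker from 'dockerode';

// Assumed engine socket path on balenaOS; adjust as needed for your device.
const docker = new Docker({ socketPath: '/var/run/balena-engine.sock' });

// Illustrative helper: rename a service container to its target-release name.
// This is the call that surfaces "(HTTP code 404) no such container - sandbox ...".
async function renameForTargetRelease(containerId: string, targetName: string) {
  const container = docker.getContainer(containerId);
  await container.rename({ name: targetName });
}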

Additional information you deem important (e.g. issue happens only occasionally):

The issue happens when a new update is downloaded by the device. It has sometimes appeared in combination with #1579, making the cause unclear.

Additional environment details (device type, OS, etc.):

Device Type: Raspberry Pi 4 64bit, 2GB RAM
OS: balenaOS 2.80.3+rev1.prod

@jellyfish-bot

[cywang117] This issue has attached support thread https://jel.ly.fish/72633746-3415-449a-9617-e123cba1e954

@jellyfish-bot

[cywang117] This issue has attached support thread https://jel.ly.fish/e7428359-c335-4d00-81db-dfb4293d1423

@cywang117
Author

The fact that stopping the Supervisor, removing the containers, and starting the Supervisor fixes the issue seems to indicate that this is a Supervisor issue and not a balenaEngine issue. I'll move this to the Supervisor repo.

@cywang117 cywang117 transferred this issue from balena-os/balena-engine Jul 12, 2021
@cywang117
Author

cywang117 commented Jul 12, 2021

So it seems that just restarting the Supervisor, without removing containers, does not fix this issue; restarting balenaEngine does. It's now unclear whether this is Supervisor-related or balenaEngine-related. I'm leaning towards balenaEngine holding bad state for one of the containers on the device, since a Supervisor restart didn't change anything.

@cywang117 cywang117 transferred this issue from balena-os/balena-supervisor Jul 12, 2021
@jellyfish-bot

[cywang117] This issue has attached support thread https://jel.ly.fish/661c8c96-8357-4bfc-9380-308a65fff910

@jellyfish-bot

[danthegoodman1] This issue has attached support thread https://jel.ly.fish/a4f6be4b-50dc-454d-9c5c-dbcf168119db

@cywang117
Author

@lmbarros @robertgzr Drawing your attention to some edits I made to this GitHub issue:

NOTE for users and support agents arriving here in the future: since it's not clear how to reproduce this issue, please gather more information about the conditions on the device. Some good starting questions and things to check:

  • Did this error appear after a release update?
  • Are deltas enabled?
  • Does the release build use intermediate containers? (If unsure, the Dockerfile(s) of the services will tell you.)
  • Any other questions you think might be relevant.

If the user is okay with it, asking them to leave the device in this invalid state for engineers to investigate would also help.

Are there any other questions you think would be useful for investigating the causes of this issue? Could this kind of problem be unavoidable given current implementation limitations in upstream dependencies (Moby)?

@jellyfish-bot

[pipex] This issue has attached support thread https://jel.ly.fish/dc8d2638-ebb4-4ba8-8ae6-edae48602850

@jellyfish-bot

[pipex] This issue has attached support thread https://jel.ly.fish/e82fe388-3955-4252-97c4-6c837151cce2

@jellyfish-bot

[pipex] This issue has attached support thread https://jel.ly.fish/b7fa70df-ad99-4deb-8f6a-2b78d2f47a44

@pipex

pipex commented Dec 9, 2021

Some extra information for this ticket: this has been reported to happen more often with containers that don't get updated as frequently as others. So a container that has been renamed a few times, while other containers have been recreated, may sometimes get into this state.

For instance, on one affected device, the failing container's eth0 has a very low interface index (15, paired with host-side index 16):

root@4cd008d3ffa1:/opt# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
15: eth0@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:11:00:05 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.5/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

Meanwhile, the other veth interfaces on the host have much higher indices, confirming that this is an old network attachment.

root@c73b31f:~# ip a | grep veth
1291: veth6f7ff99@if1290: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue master br-2f18a4b13b86 
16: veth367da35@if15: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue master br-2f18a4b13b86 
1380: veth72261a3@if1379: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue master br-2f18a4b13b86 
1180: vethe52f1a4@if1179: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue master br-2f18a4b13b86

Could this issue be an unintended side effect of some cleanup process?
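
As a rough diagnostic (a sketch using dockerode; the socket path and output format are illustrative), one could list every container together with its creation time and network sandbox ID, so that long-lived, frequently renamed containers that might be carrying stale sandbox references stand out:

import Docker from 'dockerode';

const docker = new Docker({ socketPath: '/var/run/balena-engine.sock' });

// Print each container's name, creation time, and network sandbox ID/key.
async function listSandboxes(): Promise<void> {
  const containers = await docker.listContainers({ all: true });
  for (const summary of containers) {
    const info = await docker.getContainer(summary.Id).inspect();
    console.log(
      info.Name,
      'created:', info.Created,
      'sandbox:', info.NetworkSettings.SandboxID,
      'key:', info.NetworkSettings.SandboxKey,
    );
  }
}

listSandboxes().catch(console.error);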

@jellyfish-bot

[gantonayde] This issue has attached support thread https://jel.ly.fish/1b57a2f7-e2b2-4658-94ef-0a35bef04f4b

@jellyfish-bot

[pipex] This issue has attached support thread https://jel.ly.fish/bf30fa84-cc92-4cf8-aefd-4c2f14c4a944

@jellyfish-bot

[nitish] This issue has attached support thread https://jel.ly.fish/9f4bc524-e6d5-4480-98a5-4d2cefba84f3

@vipulgupta2048
Member

vipulgupta2048 commented Jun 2, 2022

Did this error appear after a release update? Yep
Are deltas enabled? Yes
Does the release build use intermediate containers? Indeed, 2 stages

This happened on a new device with just the second release I pushed to it, running a minimal server application (200 MB image, two-stage build). The error is below:

Jun 02 20:25:35 a01a838 balena-supervisor[2376]: [info]    Applying target state
Jun 02 20:25:36 a01a838 balena-supervisor[2376]: [error]   Scheduling another update attempt in 1000ms due to failure:  Error: Failed to appl>
Jun 02 20:25:36 a01a838 balena-supervisor[2376]: [error]         at fn (/usr/src/app/dist/app.js:6:8690)
Jun 02 20:25:36 a01a838 balena-supervisor[2376]: [error]   Device state apply error Error: Failed to apply state transition steps. (HTTP code>
Jun 02 20:25:36 a01a838 balena-supervisor[2376]: [error]         at fn (/usr/src/app/dist/app.js:6:8690)
Jun 02 20:25:37 a01a838 balena-supervisor[2376]: [info]    Applying target state
Jun 02 20:25:38 a01a838 balena-supervisor[2376]: [error]   Scheduling another update attempt in 2000ms due to failure:  Error: Failed to appl>
Jun 02 20:25:38 a01a838 balena-supervisor[2376]: [error]         at fn (/usr/src/app/dist/app.js:6:8690)
Jun 02 20:25:38 a01a838 balena-supervisor[2376]: [error]   Device state apply error Error: Failed to apply state transition steps. (HTTP code>
Jun 02 20:25:38 a01a838 balena-supervisor[2376]: [error]         at fn (/usr/src/app/dist/app.js:6:8690)
Jun 02 20:25:40 a01a838 balena-supervisor[2376]: [info]    Applying target state

Attaching diagnostics file: a01a83846e174aa51dc2b33fbf0a17e7_diagnostics_2022.06.02_20.56.19+0000.txt

Adding the output of balena info and balena version:

root@a01a838:~# balena info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host null
  Log: journald json-file local
 Swarm: 
  NodeID: 
  Is Manager: false
  Node Address: 
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: balena-engine-init
 containerd version: 
 runc version: 
 init version: 949e6fa-dirty (expected: de40ad007797e)
 Kernel Version: 5.10.83-v8
 Operating System: balenaOS 2.94.4
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 960MiB
 Name: a01a838
 ID: V47H:PCFQ:GMDT:PV3S:OW2J:FRXS:MRZ7:V737:5HEQ:BFCP:GBUS:SJOJ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
root@a01a838:~# balena version
Client:
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.2
 Git commit:        73c78258302d94f9652da995af6f65a621fac918
 Built:             Wed Mar  2 10:28:01 2022
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.2
  Git commit:       73c78258302d94f9652da995af6f65a621fac918
  Built:            Wed Mar  2 10:28:01 2022
  OS/Arch:          linux/arm64
  Experimental:     true
 containerd:
  Version:          1.4.0+unknown
  GitCommit:        
 runc:
  Version:          spec: 1.0.2-dev
  GitCommit:        
 balena-engine-init:
  Version:          0.13.0
  GitCommit:        949e6fa-dirty

FD: https://www.flowdock.com/app/rulemotion/r-supervisor/threads/FQqETXXQaGFg1oLyWz7ccNbPgAx

@jellyfish-bot

[lmbarros] This has attached https://jel.ly.fish/88b86997-9411-40b9-ae2f-8f3505febb93

@jellyfish-bot

[pipex] This has attached https://jel.ly.fish/c09369f0-c870-4f93-9133-0ec8b995fda9

pipex added a commit to balena-os/balena-supervisor that referenced this issue Nov 15, 2023
The `updateMetadata` step renames the container to match the target
release when the service doesn't change between releases. We have seen
this step fail because of an engine bug that seems to relate to the
engine keeping stale references after container restarts. The only way
around this issue is to remove the old container and create it again.
This implements that workaround during the updateMetadata step to deal
with that issue.

Change-type: minor
Relates-to: balena-os/balena-engine#261
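
For illustration, here is a minimal sketch of the fallback described in that commit message (using dockerode; the 404 check and the recreate step are simplified and illustrative, not the actual Supervisor implementation):

import Docker from 'dockerode';

const docker = new Docker({ socketPath: '/var/run/balena-engine.sock' });

// Try the rename; on the spurious 404, remove the container and recreate it
// under the target name (simplified: a real implementation would carry over
// the full host config, networks, restart policy, etc.).
async function renameOrRecreate(containerId: string, targetName: string) {
  const container = docker.getContainer(containerId);
  try {
    await container.rename({ name: targetName });
  } catch (e: any) {
    if (e.statusCode !== 404) {
      throw e;
    }
    const info = await container.inspect();
    await container.remove({ force: true });
    const fresh = await docker.createContainer({
      name: targetName,
      Image: info.Config.Image,
      Cmd: info.Config.Cmd,
      Env: info.Config.Env,
      Labels: info.Config.Labels,
      HostConfig: info.HostConfig,
    });
    await fresh.start();
  }
}
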
pipex added a commit to balena-os/balena-supervisor that referenced this issue Nov 22, 2023
pipex added a commit to balena-os/balena-supervisor that referenced this issue Nov 22, 2023