Reboot has a probability that docker can't start #2382

Closed
tieguoevan opened this issue Dec 14, 2018 · 9 comments

@tieguoevan

Hi,
I found that after running reboot from the Linux shell, docker sometimes fails to start. It seems to be related to the filesystem, because I can find some errors in dmesg:

root@sonic:/home/admin# dmesg | tail -n 10
[   13.576222] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   13.576224] 8021q: adding VLAN 0 to HW filter on device eth0
[   13.576253] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   14.922309] audit: type=1400 audit(1478292796.539:9): apparmor="DENIED" operation="open" info="Failed name lookup - disconnected path" error=-13 profile="/usr/sbin/ntpd" name="sbin" pid=2287 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[   14.922319] audit: type=1400 audit(1478292796.539:10): apparmor="DENIED" operation="open" info="Failed name lookup - disconnected path" error=-13 profile="/usr/sbin/ntpd" name="bin" pid=2287 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[   14.922327] audit: type=1400 audit(1478292796.539:11): apparmor="DENIED" operation="open" info="Failed name lookup - disconnected path" error=-13 profile="/usr/sbin/ntpd" name="usr/sbin" pid=2287 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[   14.922334] audit: type=1400 audit(1478292796.539:12): apparmor="DENIED" operation="open" info="Failed name lookup - disconnected path" error=-13 profile="/usr/sbin/ntpd" name="usr/bin" pid=2287 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[  316.408867] EXT4-fs (loop1): error count since last fsck: 1
[  316.408896] EXT4-fs (loop1): initial error at time 949787565: ext4_ext_map_blocks:4301: inode 19
[  316.408906] EXT4-fs (loop1): last error at time 949787565: ext4_ext_map_blocks:4301: inode 19
root@sonic:/home/admin#
root@sonic:/home/admin# sudo journalctl -u docker.service
-- Logs begin at Sat 2016-11-05 04:53:05 CST, end at Sat 2016-11-05 07:30:49 CST. --
Nov 05 04:53:06 sonic systemd[1]: Starting Docker Application Container Engine...
Nov 05 04:53:06 sonic docker[674]: time="2016-11-05T04:53:06.825701426+08:00" level=info msg="New containerd process, pid: 686\n"
Nov 05 04:53:10 sonic docker[674]: time="2016-11-05T04:53:07.938119747+08:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Nov 05 04:53:10 sonic docker[674]: time="2016-11-05T04:53:07.953930518+08:00" level=info msg="Firewalld running: false"
Nov 05 04:53:10 sonic docker[674]: time="2016-11-05T04:53:07.983691273+08:00" level=warning msg="Your kernel does not support swap memory limit."
Nov 05 04:53:10 sonic docker[674]: time="2016-11-05T04:53:07.985328337+08:00" level=info msg="Loading containers: start."
Nov 05 04:53:10 sonic docker[674]: ...........
Nov 05 04:53:10 sonic docker[674]: time="2016-11-05T04:53:08.001555607+08:00" level=info msg="Loading containers: done."
Nov 05 04:53:10 sonic docker[674]: time="2016-11-05T04:53:08.001588040+08:00" level=info msg="Daemon has completed initialization"
Nov 05 04:53:10 sonic docker[674]: time="2016-11-05T04:53:08.001609489+08:00" level=info msg="Docker daemon" commit=5604cbe graphdriver=overlay version=1.11.1
Nov 05 04:53:10 sonic docker[674]: time="2016-11-05T04:53:08.015608154+08:00" level=info msg="API listen on /var/run/docker.sock"
Nov 05 04:53:08 sonic systemd[1]: Started Docker Application Container Engine.
Nov 05 07:16:18 sonic docker[674]: time="2016-11-05T07:16:18.193778467+08:00" level=error msg="Error setting up exec command in container syncd: Container 646ec592eeaa9540e6d2e78045528c6141f222374048f5303bb6c028591ecf07 is not running"
Nov 05 07:16:18 sonic docker[674]: time="2016-11-05T07:16:18.193843803+08:00" level=error msg="Handler for POST /v1.23/containers/syncd/exec returned error: Container 646ec592eeaa9540e6d2e78045528c6141f222374048f5303bb6c028591ecf07 is not running"

I don't know whether it's related to debian:stretch, but we didn't hit the problem when using debian:jessie.

Any ideas about this problem? Thanks!

Steps to reproduce the issue:

  1. reboot several times
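
For anyone triaging a similar boot, the EXT4-fs lines in the dmesg output above can be isolated with a small filter. This is only a sketch: the function names `ext4_errors` and `ext4_error_devices` are hypothetical helpers, not part of SONiC, and the filter only reports what the kernel logged; it does not repair anything.

```shell
#!/bin/sh
# Print only the EXT4-fs error lines from kernel log text read on stdin.
# Usage on a live system (may need root to read the kernel log):
#   dmesg | ext4_errors
ext4_errors() {
    grep -E 'EXT4-fs \([^)]*\): .*error'
}

# List the distinct devices named in those lines, e.g. "loop1".
#   dmesg | ext4_error_devices
ext4_error_devices() {
    ext4_errors | sed -E 's/.*EXT4-fs \(([^)]*)\).*/\1/' | sort -u
}
```

Fed the three EXT4-fs lines from the report, `ext4_error_devices` names a single device, loop1, which narrows the problem to one loop-mounted filesystem.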
@dawnbeauty

We encountered a similar issue on branch 201811. Some more information:

  • Steps
    Reboot several times.
  • What we found
    swss or syncd does not start its docker container, because /usr/local/bin/swss.sh or syncd.sh hangs while executing docker exec database redis-cli commands.
    After that, all docker commands (e.g. inspect, logs, exec, ps) for the database container hang without any output. But the redis server is OK, since sonic-cfggen -d --print-data works as normal.
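
While debugging a hang like the one described above, it can help to bound each docker call so that a script reports the hang instead of blocking forever. The sketch below assumes GNU coreutils `timeout` is available; `run_with_limit` is a hypothetical helper name and is not something swss.sh or syncd.sh is known to contain. It does not fix the daemon, it only makes the symptom visible.

```shell
#!/bin/sh
# Run a command with a time limit so a hung call cannot block the caller.
# $1 is the limit in seconds; the remaining arguments are the command.
# Returns the command's own exit status, or 124 if the limit was reached
# (124 is the status GNU coreutils `timeout` uses for that case).
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
}

# Hypothetical usage in a start script: report a hung database
# container instead of waiting on it indefinitely.
#   if ! run_with_limit 10 docker exec database redis-cli PING; then
#       echo "database container did not answer within 10s" >&2
#   fi
```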

@jipanyang
Collaborator

Which reboot command was used during the test: /usr/bin/reboot, /sbin/reboot, or /sbin/shutdown -r?

@dawnbeauty

sudo reboot -> /usr/bin/reboot
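
One way to confirm which file a bare `reboot` actually runs is to resolve the name through PATH and follow any symlinks. This is a sketch only: `resolve_cmd` is a hypothetical helper, and it assumes `readlink -f` (GNU coreutils) is present. Run it under sudo as well if root's PATH may differ from the user's.

```shell
#!/bin/sh
# Print the file a command name resolves to, following symlinks.
# Intended for names that are files on PATH, such as reboot.
resolve_cmd() {
    path=$(command -v "$1") || return 1
    readlink -f "$path"
}

# Usage:
#   resolve_cmd reboot
```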

@lguohan
Collaborator

lguohan commented Feb 7, 2019

We have updated the Docker engine to 18.09 and it seems to solve this issue. Can you check whether you still see this problem in the latest image?

@jipanyang
Collaborator

@dawnbeauty with the recent docker engine update and the containerd systemd service change, I no longer see the docker start issue on reboot. Has that fixed your issue too?

@dawnbeauty

@jipanyang, thanks for the reply.
I'll build and check again.

@dawnbeauty

@jipanyang The change seems to have fixed the issue. I've rebooted 100+ times and didn't encounter any docker start issue.

SONiC Software Version: SONiC.201811.0-dirty-20190216.015227
Distribution: Debian 9.7
Kernel: 4.9.0-8-amd64
Build commit: 4faa5f2
Build date: Sat Feb 16 10:57:27 UTC 2019
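
A long reboot test like this is easier to audit if each boot leaves a record. The sketch below is one possible way to do that, not the procedure used in this thread: `record_boot_check`, `summarize_boot_checks`, and the log path are hypothetical, and the check command is passed in so that any health probe can be used.

```shell
#!/bin/sh
# Append one line per boot: a UTC timestamp plus "ok" or "fail".
# $1 is the log file; the remaining arguments are the check command.
record_boot_check() {
    log="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        result=ok
    else
        result=fail
    fi
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$result" >> "$log"
}

# Print "<failures>/<total>" for the log file in $1.
summarize_boot_checks() {
    awk '{ total++ } $2 == "fail" { fail++ }
         END { printf "%d/%d\n", fail, total }' "$1"
}

# Hypothetical usage, run once after each boot:
#   record_boot_check /var/log/boot-check.log systemctl is-active --quiet docker
#   summarize_boot_checks /var/log/boot-check.log
```

Since the original report shows docker.service starting while the syncd container was not running, a probe that also checks the containers may be more telling than the service state alone.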

@jipanyang
Collaborator

@dawnbeauty thanks for the update! I think we may close this issue.

@yxieca
Contributor

yxieca commented Sep 19, 2019

This issue has been addressed with docker engine upgrade.

@yxieca closed this as completed Sep 19, 2019