monit errors in the logs "ERR /memory_checker: Failed to retrieve the running container list from docker daemon" #11472

Closed
liorghub opened this issue Jul 18, 2022 · 0 comments

liorghub commented Jul 18, 2022

Description

The following error appears in the log during the deinit flow:
ERR /memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'

Steps to reproduce the issue:

This issue is not deterministic. It happens when the Monit plugin memory_checker, which runs periodically, executes after a reboot has been triggered. The plugin tries to fetch the running containers, but the Docker engine has already been stopped by systemd.
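The `FileNotFoundError(2, 'No such file or directory')` in the message is consistent with a client trying to connect to a Unix domain socket whose file no longer exists. The sketch below reproduces only that low-level failure with the Python standard library; the socket path is a hypothetical placeholder, and nothing here is taken from the memory_checker source.

```python
import errno
import socket


def try_connect_unix_socket(path):
    """Try to connect to a Unix domain socket.

    Returns None on success, or the OSError that was raised.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return None
    except OSError as exc:
        return exc
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical path standing in for a daemon socket that was removed.
    err = try_connect_unix_socket("/nonexistent-dir/docker.sock")
    if err is not None:
        print(type(err).__name__, err.errno == errno.ENOENT)
```

A client library that wraps this failure would surface it as a connection error, which is roughly the shape of the message in the log.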

Describe the results you received:

The following error appears in the log:
ERR /memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'

Describe the results you expected:

If the Docker engine is not running, do not try to fetch the running containers; return an empty list instead.
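A minimal sketch of the expected behavior, assuming the check and the fetch are passed in as callables. The names `get_running_containers`, `docker_is_active`, and `list_containers` are hypothetical and are not the plugin's actual functions.

```python
def get_running_containers(docker_is_active, list_containers):
    """Return the running containers, or an empty list if Docker is down.

    docker_is_active: callable returning True when the Docker engine runs.
    list_containers:  callable that queries the daemon for container names.
    """
    if not docker_is_active():
        # Docker is stopped (for example during deinit): do not query it.
        return []
    return list_containers()


if __name__ == "__main__":
    # Stand-in callables for illustration only.
    print(get_running_containers(lambda: False, lambda: ["swss", "syncd"]))
    print(get_running_containers(lambda: True, lambda: ["swss", "syncd"]))
```

Guarding the fetch this way means the daemon is never contacted when it is known to be down, so no connection error is raised or logged.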

Output of show version:

root@r-lionfish-14:/home/admin# show version

SONiC Software Version: SONiC.202012.121748-81f200fde
Distribution: Debian 10.12
Kernel: 4.19.0-12-2-amd64
Build commit: 81f200fde
Build date: Wed Jul 13 13:56:52 UTC 2022
Built by: AzDevOps@sonic-build-workers-001RQ0

Platform: x86_64-mlnx_msn3420-r0
HwSKU: ACS-MSN3420
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2019X13879
Uptime: 09:01:14 up 0 min,  1 user,  load average: 2.14, 0.55, 0.19

Docker images:
REPOSITORY                    TAG                       IMAGE ID            SIZE
docker-syncd-mlnx             202012.121748-81f200fde   4ff0a131809c        859MB
docker-syncd-mlnx             latest                    4ff0a131809c        859MB
docker-sonic-mgmt-framework   202012.121748-81f200fde   ce220f832981        687MB
docker-sonic-mgmt-framework   latest                    ce220f832981        687MB
docker-sonic-telemetry        202012.121748-81f200fde   4fa7aae2af85        452MB
docker-sonic-telemetry        latest                    4fa7aae2af85        452MB
docker-teamd                  202012.121748-81f200fde   84db114155f5        373MB
docker-teamd                  latest                    84db114155f5        373MB
docker-orchagent              202012.121748-81f200fde   63355fc651e0        390MB
docker-orchagent              latest                    63355fc651e0        390MB
docker-nat                    202012.121748-81f200fde   7630cadd9ab8        376MB
docker-nat                    latest                    7630cadd9ab8        376MB
docker-sflow                  202012.121748-81f200fde   472b17674271        374MB
docker-sflow                  latest                    472b17674271        374MB
docker-fpm-frr                202012.121748-81f200fde   b2913e29b1c9        392MB
docker-fpm-frr                latest                    b2913e29b1c9        392MB
docker-platform-monitor       202012.121748-81f200fde   a528e89700c6        670MB
docker-platform-monitor       latest                    a528e89700c6        670MB
docker-snmp                   202012.121748-81f200fde   6571c977d276        405MB
docker-snmp                   latest                    6571c977d276        405MB
docker-router-advertiser      202012.121748-81f200fde   5bb86250436e        362MB
docker-router-advertiser      latest                    5bb86250436e        362MB
docker-lldp                   202012.121748-81f200fde   28155abdd11d        402MB
docker-lldp                   latest                    28155abdd11d        402MB
docker-database               202012.121748-81f200fde   dad90c7e6b74        362MB
docker-database               latest                    dad90c7e6b74        362MB
docker-mux                    202012.121748-81f200fde   b714d3f1d07f        414MB
docker-mux                    latest                    b714d3f1d07f        414MB
docker-dhcp-relay             202012.121748-81f200fde   87da71316a82        375MB
docker-dhcp-relay             latest                    87da71316a82        375MB

root@r-lionfish-14:/home/admin# 

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

yozhao101 pushed a commit that referenced this issue Jul 27, 2022
…emon is not running (#11476)

Fix in Monit memory_checker plugin. Skip fetching running containers if docker engine is down (can happen in deinit).
This PR fixes issue #11472.

Signed-off-by: liora liora@nvidia.com

Why I did it
When Monit runs during the deinit flow, the memory_checker plugin fetches the running containers without checking whether the Docker service is still running. I added this check.

How I did it
Use systemctl is-active to check if Docker engine is still running.
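One way such a check could look in Python is sketched below. It assumes the unit is named `docker` and that `systemctl` is on the PATH; the helper name `is_service_active` and its injectable `run` parameter are illustrative and may differ from what the PR actually does.

```python
import subprocess


def is_service_active(service_name, run=subprocess.run):
    """Return True if `systemctl is-active` reports the unit as active."""
    try:
        result = run(
            ["systemctl", "is-active", service_name],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # systemctl is missing or cannot be executed: treat as not active.
        return False
    # `systemctl is-active` exits with 0 and prints "active" for a running unit.
    return result.returncode == 0 and result.stdout.strip() == "active"


if __name__ == "__main__":
    # "docker" as the unit name is an assumption for this sketch.
    print(is_service_active("docker"))
```

The `run` parameter exists only so the helper can be exercised without a real systemd; in normal use it defaults to `subprocess.run`.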

How to verify it
Use systemctl to stop the Docker engine and reload Monit. No errors should appear in the log, and the relevant message should be printed to the log instead.
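To make the "no errors in log" part of this verification less manual, a small helper could scan log lines for the error text quoted in this issue. This is only a sketch: the function name is hypothetical, and which log file to read (for example the system log) is an assumption left to the caller.

```python
ERROR_MARKER = "Failed to retrieve the running container list from docker daemon"


def find_memory_checker_errors(log_lines):
    """Return the log lines that contain the memory_checker error."""
    return [
        line
        for line in log_lines
        if "memory_checker" in line and ERROR_MARKER in line
    ]


if __name__ == "__main__":
    # Sample lines for illustration; the first uses the text from this issue.
    sample = [
        "ERR /memory_checker: Failed to retrieve the running container list "
        "from docker daemon! Error message is: '...'",
        "INFO some unrelated line",
    ]
    print(len(find_memory_checker_errors(sample)))
```

After stopping Docker and reloading Monit, an empty result over the newly written log lines would match the expected outcome described above.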

Which release branch to backport (provide reason below if selected)
The fix is required in 202205 and 202012, since the PR that introduced the issue (#11129) was cherry-picked to those branches.
qiluo-msft pushed a commit that referenced this issue Jul 27, 2022
…emon is not running (#11476)

yxieca pushed a commit that referenced this issue Jul 28, 2022
…emon is not running (#11476)

@liorghub liorghub closed this as completed Aug 4, 2022
skbarista pushed a commit to skbarista/sonic-buildimage that referenced this issue Aug 17, 2022
…emon is not running (sonic-net#11476)
