PMON container exits immediately after "docker start pmon" with latest master image #2839

Closed
keboliu opened this issue Apr 30, 2019 · 2 comments

keboliu commented Apr 30, 2019

Description

The PMON container exits immediately after "docker start pmon" is run in pmon.sh:

Apr 30 07:05:58.477598 mtbc-sonic-01-2410 INFO pmon.sh[10704]: Starting existing pmon container with HWSKU ACS-MSN2410
Apr 30 07:05:58.685721 mtbc-sonic-01-2410 INFO containerd[378]: time="2019-04-30T07:05:58.685295060Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/158ebc25513c2550a472e06a2a3e75a86a6bac80e523799ab6075b23bef1bc4f/shim.sock" debug=false pid=11667
Apr 30 07:05:59.295153 mtbc-sonic-01-2410 INFO pmon.sh[10704]: pmon
Apr 30 07:05:59.310058 mtbc-sonic-01-2410 INFO systemd[1]: Started Platform monitor container.
Apr 30 07:05:59.407225 mtbc-sonic-01-2410 INFO swss#supervisord: start.sh enable_counters: started
Apr 30 07:06:01.662867 mtbc-sonic-01-2410 NOTICE swss#nbrmgrd: :- main: --- Starting nbrmgrd ---
Apr 30 07:06:01.662867 mtbc-sonic-01-2410 NOTICE swss#nbrmgrd: :- loadRedisScript: lua script loaded, sha: 88270a7c5c90583e56425aca8af8a4b8c39fe757
Apr 30 07:06:01.662867 mtbc-sonic-01-2410 NOTICE swss#nbrmgrd: :- main: starting main loop
Apr 30 07:06:02.543252 mtbc-sonic-01-2410 INFO syncd.sh[4801]: Starting existing syncd container with HWSKU ACS-MSN2410
Apr 30 07:06:02.600995 mtbc-sonic-01-2410 INFO swss#supervisord: start.sh nbrmgrd: started
Apr 30 07:06:02.942199 mtbc-sonic-01-2410 INFO containerd[378]: time="2019-04-30T07:06:02.941962628Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/41282de755b7a2e29002162b120dd59614f10c56db298e1cc4cb78c097268d65/shim.sock" debug=false pid=12724
Apr 30 07:06:02.972090 mtbc-sonic-01-2410 INFO containerd[378]: time="2019-04-30T07:06:02.966316443Z" level=info msg="shim reaped" id=158ebc25513c2550a472e06a2a3e75a86a6bac80e523799ab6075b23bef1bc4f
Apr 30 07:06:02.986537 mtbc-sonic-01-2410 INFO dockerd[379]: time="2019-04-30T07:06:02.986346651Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 30 07:06:03.069359 mtbc-sonic-01-2410 INFO pmon.sh[11933]: 0
Apr 30 07:06:03.393095 mtbc-sonic-01-2410 INFO pmon.sh[12750]: pmon
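
The containerd entries in the log above can be paired by container ID to measure how long a container stayed up. What follows is a minimal Python 3 sketch, assuming the syslog keeps the `shim containerd-shim started` and `shim reaped` message formats seen in this excerpt. The names `shim_lifetimes`, `parse_ts`, and both regular expressions are illustrative and not part of SONiC. That the reaped ID belongs to pmon is an inference from the log ordering, not something the log states.

```python
import re
from datetime import datetime

# Patterns modelled on the containerd lines in this issue (an assumption, not a spec).
SHIM_STARTED = re.compile(
    r'time="(?P<ts>[^"]+)".*msg="shim containerd-shim started".*?/moby/(?P<id>[0-9a-f]+)/'
)
SHIM_REAPED = re.compile(
    r'time="(?P<ts>[^"]+)".*msg="shim reaped" id=(?P<id>[0-9a-f]+)'
)

# Three containerd lines copied from the syslog excerpt above.
SAMPLE = [
    'Apr 30 07:05:58.685721 mtbc-sonic-01-2410 INFO containerd[378]: time="2019-04-30T07:05:58.685295060Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/158ebc25513c2550a472e06a2a3e75a86a6bac80e523799ab6075b23bef1bc4f/shim.sock" debug=false pid=11667',
    'Apr 30 07:06:02.942199 mtbc-sonic-01-2410 INFO containerd[378]: time="2019-04-30T07:06:02.941962628Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/41282de755b7a2e29002162b120dd59614f10c56db298e1cc4cb78c097268d65/shim.sock" debug=false pid=12724',
    'Apr 30 07:06:02.972090 mtbc-sonic-01-2410 INFO containerd[378]: time="2019-04-30T07:06:02.966316443Z" level=info msg="shim reaped" id=158ebc25513c2550a472e06a2a3e75a86a6bac80e523799ab6075b23bef1bc4f',
]


def parse_ts(ts):
    """Parse a containerd timestamp, truncating nanoseconds to microseconds."""
    return datetime.strptime(ts[:26], "%Y-%m-%dT%H:%M:%S.%f")


def shim_lifetimes(lines):
    """Map container ID -> seconds between 'shim started' and 'shim reaped'."""
    started = {}
    lifetimes = {}
    for line in lines:
        match = SHIM_STARTED.search(line)
        if match:
            started[match.group("id")] = parse_ts(match.group("ts"))
            continue
        match = SHIM_REAPED.search(line)
        if match and match.group("id") in started:
            delta = parse_ts(match.group("ts")) - started[match.group("id")]
            lifetimes[match.group("id")] = delta.total_seconds()
    return lifetimes


for container_id, seconds in shim_lifetimes(SAMPLE).items():
    print(container_id[:12], round(seconds, 3))
```

On the three sample lines, only the container started at 07:05:58 is reported, with a lifetime of roughly 4.3 seconds. The second container has no `shim reaped` entry in the excerpt, so it is left out.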

Steps to reproduce the issue:
This issue reproduces consistently with master images starting from build 953.

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

syslog attached.

syslog.zip

SONiC Software Version: SONiC.HEAD.956-ad2c1b2
Distribution: Debian 9.9
Kernel: 4.9.0-8-2-amd64
Build commit: ad2c1b2
Build date: Sun Apr 28 08:38:17 UTC 2019
Built by: johnar@jenkins-worker-3

Platform: x86_64-mlnx_msn2410-r0
HwSKU: ACS-MSN2410
ASIC: mellanox
Serial Number: MT1848K10623
Uptime: 10:34:23 up  3:29,  1 user,  load average: 2.73, 2.55, 2.47

Docker images:
REPOSITORY                 TAG                 IMAGE ID            SIZE
docker-dhcp-relay          HEAD.956-ad2c1b2    4e8db672e338        256MB
docker-dhcp-relay          latest              4e8db672e338        256MB
docker-fpm-quagga          HEAD.956-ad2c1b2    b5d608c9c504        281MB
docker-fpm-quagga          latest              b5d608c9c504        281MB
docker-syncd-mlnx-rpc      HEAD.956-ad2c1b2    e59e740d124e        617MB
docker-syncd-mlnx-rpc      latest              e59e740d124e        617MB
docker-teamd               HEAD.956-ad2c1b2    184617224403        300MB
docker-teamd               latest              184617224403        300MB
docker-sonic-telemetry     HEAD.956-ad2c1b2    0b0b9f93bfd5        300MB
docker-sonic-telemetry     latest              0b0b9f93bfd5        300MB
docker-snmp-sv2            HEAD.956-ad2c1b2    dd7f19154668        317MB
docker-snmp-sv2            latest              dd7f19154668        317MB
docker-router-advertiser   HEAD.956-ad2c1b2    1e0782909d55        279MB
docker-router-advertiser   latest              1e0782909d55        279MB
docker-platform-monitor    HEAD.956-ad2c1b2    98dffb8c7efa        324MB
docker-platform-monitor    latest              98dffb8c7efa        324MB
docker-orchagent           HEAD.956-ad2c1b2    6f752e1afab4        319MB
docker-orchagent           latest              6f752e1afab4        319MB
docker-lldp-sv2            HEAD.956-ad2c1b2    199b4184f107        298MB
docker-lldp-sv2            latest              199b4184f107        298MB
docker-database            HEAD.956-ad2c1b2    6ff3d687146f        280MB
docker-database            latest              6ff3d687146f        280MB

nazariig commented May 2, 2019

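@keboliu the root cause is that the new platform API package, `sonic_platform`, has the same name as the `sonic_platform` module that sonic-cfggen (from sonic-config-engine) imports. The package directory shadows the module, so the import of `get_machine_info` fails.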

root@sonic:/usr/local/lib/python2.7/dist-packages# sonic-cfggen -h
Traceback (most recent call last):
  File "/usr/local/bin/sonic-cfggen", line 31, in <module>
    from sonic_platform import get_machine_info
ImportError: cannot import name get_machine_info

root@sonic:/usr/local/lib/python2.7/dist-packages# ls sonic_platform -la
total 64
drwxr-sr-x 2 root staff 4096 May 2 13:40 .
drwxrwsr-x 1 root staff 4096 May 2 14:42 ..
-rw-r--r-- 1 root staff 0 May 2 10:35 __init__.py
-rw-r--r-- 1 root staff 155 May 2 10:35 __init__.pyc
-rw-r--r-- 1 root staff 2234 May 2 10:35 chassis.py
-rw-r--r-- 1 root staff 2226 May 2 10:35 chassis.pyc
-rw-r--r-- 1 root staff 8085 May 2 10:35 fan.py
-rw-r--r-- 1 root staff 7627 May 2 10:35 fan.pyc
-rw-r--r-- 1 root staff 2343 May 2 10:35 psu.py
-rw-r--r-- 1 root staff 2564 May 2 10:35 psu.pyc
-rw-r--r-- 1 root staff 6618 May 2 10:35 watchdog.py
-rw-r--r-- 1 root staff 8439 May 2 10:35 watchdog.pyc

root@sonic:/usr/local/lib/python2.7/dist-packages# ls -la | grep sonic_platform
drwxr-sr-x 2 root staff 4096 May 2 13:40 sonic_platform
-rw-r--r-- 1 root staff 2631 May 2 09:14 sonic_platform.py
-rw-r--r-- 1 root staff 2877 May 2 09:14 sonic_platform.pyc
drwxr-sr-x 2 root staff 4096 May 2 13:40 sonic_platform_base
drwxr-sr-x 2 root staff 4096 May 2 13:40 sonic_platform_common-1.0.dist-info
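
The listing above shows a `sonic_platform` directory next to a `sonic_platform.py` file. A check for this kind of collision can be sketched in Python 3 as below. The function name `find_shadowed_modules` is hypothetical, and the temporary directory only imitates the layout from the listing so the example runs anywhere.

```python
import os
import tempfile


def find_shadowed_modules(directory):
    """Return names present both as a package (dir with __init__.py) and as name.py."""
    entries = set(os.listdir(directory))
    collisions = []
    for entry in sorted(entries):
        path = os.path.join(directory, entry)
        is_package = os.path.isfile(os.path.join(path, "__init__.py"))
        if is_package and entry + ".py" in entries:
            collisions.append(entry)
    return collisions


# Build a throwaway tree that mirrors the relevant part of the listing.
with tempfile.TemporaryDirectory() as root:
    for package in ("sonic_platform", "sonic_platform_base"):
        os.mkdir(os.path.join(root, package))
        open(os.path.join(root, package, "__init__.py"), "w").close()
    open(os.path.join(root, "sonic_platform.py"), "w").close()
    collisions = find_shadowed_modules(root)

print(collisions)
```

On a switch, the same function could be pointed at `/usr/local/lib/python2.7/dist-packages` instead of the temporary tree.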

root@sonic:/usr/local/lib/python2.7/dist-packages# mv sonic_platform sonic_platform_
root@sonic:/usr/local/lib/python2.7/dist-packages# sonic-cfggen -h
usage: sonic-cfggen [-h] [-m [MINIGRAPH] | -M DEVICE_DESCRIPTION | -k HWSKU]
[-p [PORT_CONFIG]] [-y YAML] [-j JSON]
[-a ADDITIONAL_DATA] [-d] [-H] [-s REDIS_UNIX_SOCK_FILE]
[-t TEMPLATE | -v VAR | --var-json VAR_JSON | --write-to-db | --print-data | --preset {l2,empty,t1}]

Render configuration file from minigraph data and jinja2 template.

optional arguments:
-h, --help show this help message and exit
-m [MINIGRAPH], --minigraph [MINIGRAPH]
minigraph xml file
-M DEVICE_DESCRIPTION, --device-description DEVICE_DESCRIPTION
device description xml file
-k HWSKU, --hwsku HWSKU
HwSKU
-p [PORT_CONFIG], --port-config [PORT_CONFIG]
port config file, used with -m or -k
-y YAML, --yaml YAML yaml file that contains additional variables
-j JSON, --json JSON json file that contains additional variables
-a ADDITIONAL_DATA, --additional-data ADDITIONAL_DATA
addition data, in json string
-d, --from-db read config from configdb
-H, --platform-info read platform and hardware info
-s REDIS_UNIX_SOCK_FILE, --redis-unix-sock-file REDIS_UNIX_SOCK_FILE
unix sock file for redis connection
-t TEMPLATE, --template TEMPLATE
render the data with the template file
-v VAR, --var VAR print the value of a variable, support jinja2
expression
--var-json VAR_JSON print the value of a variable, in json format
--write-to-db write config into configdb
--print-data print all data
--preset {l2,empty,t1}
generate sample configuration from a preset template
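
The shadowing and the rename workaround shown above can be reproduced outside the switch. This is a small Python 3 sketch under stated assumptions: `shadow_demo` is a made-up name standing in for `sonic_platform`, and the files are created in a temporary directory. The device in this issue runs Python 2.7, where the traceback suggests the same precedence of a package over a same-named module.

```python
import importlib
import os
import sys
import tempfile

NAME = "shadow_demo"  # hypothetical stand-in for sonic_platform


def lookup(name):
    """Import `name` afresh and return its get_machine_info attribute, or None."""
    importlib.invalidate_caches()  # drop cached directory listings
    sys.modules.pop(name, None)    # forget any previously imported copy
    module = importlib.import_module(name)
    return getattr(module, "get_machine_info", None)


def demo():
    with tempfile.TemporaryDirectory() as root:
        # The module that provides the function (like sonic_platform.py).
        with open(os.path.join(root, NAME + ".py"), "w") as handle:
            handle.write("def get_machine_info():\n    return 'from module'\n")
        # A package of the same name with an empty __init__.py (like the new API package).
        package_dir = os.path.join(root, NAME)
        os.mkdir(package_dir)
        open(os.path.join(package_dir, "__init__.py"), "w").close()

        sys.path.insert(0, root)
        try:
            shadowed = lookup(NAME)                   # the package is found first
            os.rename(package_dir, package_dir + "_")  # same idea as the mv above
            restored = lookup(NAME)                   # now the module is found
        finally:
            sys.path.remove(root)
            sys.modules.pop(NAME, None)
    return shadowed, restored


shadowed, restored = demo()
print("with package present:", shadowed)
print("after rename:", restored() if restored else None)
```

While the package directory exists, the import resolves to it and the function is missing. Once the directory is renamed, the module becomes visible again, which matches the behaviour of `sonic-cfggen -h` before and after the `mv`.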

keboliu closed this as completed Jun 18, 2019

keboliu commented Aug 28, 2019

fixed by sonic-net/sonic-utilities#528

dprital added a commit to dprital/sonic-buildimage that referenced this issue Jun 20, 2023
Update sonic-utilities submodule pointer to include the following:
* 0b629ba1 Revert [chassis][voq] Clear fabric counters queue/port (2789) ([sonic-net#2882](sonic-net/sonic-utilities#2882))
* 3ba8241a [db_migtrator] Add migration of FLEX_COUNTER_DELAY_STATUS during 1911->master upgrade + fast-reboot. Add UT. ([sonic-net#2839](sonic-net/sonic-utilities#2839))
* fceef2ed [chassis][voq] Clear fabric counters queue/port ([sonic-net#2789](sonic-net/sonic-utilities#2789))
* 659ba24b [syslog] Adjust runningconfiguration syslog command ([sonic-net#2843](sonic-net/sonic-utilities#2843))
* 46fba26f [db_migrator] add required protocol field in ROUTE_TABLE ([sonic-net#2766](sonic-net/sonic-utilities#2766))
* f186376e Fix issue: show interfaces transceiver eeprom -d should display same entry for CMIS cable ([sonic-net#2864](sonic-net/sonic-utilities#2864))
* de491798 fix precedence in portstat CLI ([sonic-net#2874](sonic-net/sonic-utilities#2874))

Signed-off-by: dprital <drorp@nvidia.com>
dgsudharsan added a commit to dgsudharsan/sonic-buildimage that referenced this issue Jul 11, 2023
Update sonic-utilities submodule pointer to include the following:
* ff380e04 [hash]: Implement GH frontend ([sonic-net#2580](sonic-net/sonic-utilities#2580))
* 61bad064 [db_migrator] Set correct CURRENT_VERSION, extend UT ([sonic-net#2895](sonic-net/sonic-utilities#2895))
* 6b8ee47c [CLI][Show][BGP] Show BGP Change for no neighbor scenario ([sonic-net#2885](sonic-net/sonic-utilities#2885))
* 73d8d633 [doc] Update Command-Reference.md, change show bgp peer command to show bfd peer ([sonic-net#2750](sonic-net/sonic-utilities#2750))
* 7bc08c28 [db_migrator] Remove hardcoded config and migrate config from minigraph ([sonic-net#2887](sonic-net/sonic-utilities#2887))
* b1aa9426 [generate_dump]: Enhance show techsupport for Marvell platform ([sonic-net#2676](sonic-net/sonic-utilities#2676))
* 316b14c0 Add support for secure upgrade ([sonic-net#2698](sonic-net/sonic-utilities#2698))
* dc2945bc [dns] Implement config and show commands for static DNS. ([sonic-net#2737](sonic-net/sonic-utilities#2737))
* 8414a709 [chassis][multi asic] change acl_loader to use tcp socket for db communication ([sonic-net#2525](sonic-net/sonic-utilities#2525))
* 0b629ba1 Revert [chassis][voq] Clear fabric counters queue/port (2789) ([sonic-net#2882](sonic-net/sonic-utilities#2882))
* 3ba8241a [db_migtrator] Add migration of FLEX_COUNTER_DELAY_STATUS during 1911->master upgrade + fast-reboot. Add UT. ([sonic-net#2839](sonic-net/sonic-utilities#2839))
* fceef2ed [chassis][voq] Clear fabric counters queue/port ([sonic-net#2789](sonic-net/sonic-utilities#2789))

Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
liat-grozovik pushed a commit that referenced this issue Jul 11, 2023
Update sonic-utilities submodule pointer to include the following:
* ff380e04 [hash]: Implement GH frontend ([#2580](sonic-net/sonic-utilities#2580))
* 61bad064 [db_migrator] Set correct CURRENT_VERSION, extend UT ([#2895](sonic-net/sonic-utilities#2895))
* 6b8ee47c [CLI][Show][BGP] Show BGP Change for no neighbor scenario ([#2885](sonic-net/sonic-utilities#2885))
* 73d8d633 [doc] Update Command-Reference.md, change show bgp peer command to show bfd peer ([#2750](sonic-net/sonic-utilities#2750))
* 7bc08c28 [db_migrator] Remove hardcoded config and migrate config from minigraph ([#2887](sonic-net/sonic-utilities#2887))
* b1aa9426 [generate_dump]: Enhance show techsupport for Marvell platform ([#2676](sonic-net/sonic-utilities#2676))
* 316b14c0 Add support for secure upgrade ([#2698](sonic-net/sonic-utilities#2698))
* dc2945bc [dns] Implement config and show commands for static DNS. ([#2737](sonic-net/sonic-utilities#2737))
* 8414a709 [chassis][multi asic] change acl_loader to use tcp socket for db communication ([#2525](sonic-net/sonic-utilities#2525))
* 0b629ba1 Revert [chassis][voq] Clear fabric counters queue/port (2789) ([#2882](sonic-net/sonic-utilities#2882))
* 3ba8241a [db_migtrator] Add migration of FLEX_COUNTER_DELAY_STATUS during 1911->master upgrade + fast-reboot. Add UT. ([#2839](sonic-net/sonic-utilities#2839))
* fceef2ed [chassis][voq] Clear fabric counters queue/port ([#2789](sonic-net/sonic-utilities#2789))

Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
mssonicbld added a commit that referenced this issue Jul 11, 2023
…atically (#15456)

#### Why I did it
src/sonic-utilities
```
* ff380e04 - (HEAD -> master, origin/master, origin/HEAD) [hash]: Implement GH frontend (#2580) (13 hours ago) [Nazarii Hnydyn]
* 61bad064 - [db_migrator] Set correct CURRENT_VERSION, extend UT (#2895) (4 days ago) [Vadym Hlushko]
* 6b8ee47c - [CLI][Show][BGP] Show BGP Change for no neighbor scenario (#2885) (6 days ago) [Dev Ojha]
* 73d8d633 - [doc] Update Command-Reference.md, change "show bgp peer" command to "show bfd peer" (#2750) (11 days ago) [PinghaoQu]
* 7bc08c28 - [db_migrator] Remove hardcoded config and migrate config from minigraph (#2887) (11 days ago) [Vaibhav Hemant Dixit]
* b1aa9426 - [generate_dump]: Enhance show techsupport for Marvell platform (#2676) (11 days ago) [pavannaregundi]
* 316b14c0 - Add support for secure upgrade (#2698) (2 weeks ago) [ycoheNvidia]
* dc2945bc - [dns] Implement config and show commands for static DNS. (#2737) (2 weeks ago) [Oleksandr Ivantsiv]
* 8414a709 - [chassis][multi asic] change acl_loader to use tcp socket for db communication (#2525) (2 weeks ago) [Arvindsrinivasan Lakshmi Narasimhan]
* 0b629ba1 - Revert "[chassis][voq] Clear fabric counters queue/port (#2789)" (#2882) (3 weeks ago) [RoRonoa]
* 3ba8241a - [db_migtrator] Add migration of FLEX_COUNTER_DELAY_STATUS during 1911->master upgrade + fast-reboot. Add UT. (#2839) (4 weeks ago) [Vadym Hlushko]
* fceef2ed - [chassis][voq] Clear fabric counters queue/port (#2789) (4 weeks ago) [jfeng-arista]
```
#### How I did it
#### How to verify it
#### Description for the changelog