
[system monitor]ERR healthd: system_servicejoin() argument must be str, bytes, or os.PathLike object, not 'NoneType' #18818

Closed
dgsudharsan opened this issue Apr 29, 2024 · 9 comments · Fixed by #19480
Labels: BRCM, Issue for 202311, Triaged

Comments

@dgsudharsan
Collaborator

Description

While performing config save followed by config reload, we sometimes get the following log:

ERR healthd: system_servicejoin() argument must be str, bytes, or os.PathLike object, not 'NoneType'

Steps to reproduce the issue:

  1. config save
  2. config reload -y -f

Describe the results you received:

Error in syslog

Describe the results you expected:

No error in syslog

Output of show version:

SONiC Software Version: SONiC.202311_RC.39-c50d88168_Internal
SONiC OS Version: 11
Distribution: Debian 11.9
Kernel: 5.10.0-23-2-amd64
Build commit: c78ff9d63
Build date: Fri Apr 26 05:01:25 UTC 2024
Built by: sw-r2d2-bot@r-build-sonic-ci03-241

Platform: x86_64-nvidia_sn5600_simx-r0
HwSKU: ACS-SN5600
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2315XZ04ZJ
Model Number: 920-9N42F-00RS-5NA
Hardware Revision: A1
Uptime: 03:36:42 up  1:24,  1 user,  load average: 1.91, 3.44, 2.24
Date: Mon 29 Apr 2024 03:36:42

Docker images:
REPOSITORY                                         TAG                               IMAGE ID       SIZE
docker-dhcp-relay                                  latest                            1a4c76eda529   324MB
docker-platform-monitor                            202311_RC.39-c50d88168_Internal   693addbace38   821MB
docker-platform-monitor                            latest                            693addbace38   821MB
docker-macsec                                      latest                            07000709328f   344MB
docker-orchagent                                   202311_RC.39-c50d88168_Internal   278069786798   353MB
docker-orchagent                                   latest                            278069786798   353MB
docker-eventd                                      202311_RC.39-c50d88168_Internal   af8d08dce832   315MB
docker-eventd                                      latest                            af8d08dce832   315MB
docker-snmp                                        202311_RC.39-c50d88168_Internal   6a51b8d8f606   354MB
docker-snmp                                        latest                            6a51b8d8f606   354MB
docker-nat                                         202311_RC.39-c50d88168_Internal   739b3809fe31   345MB
docker-nat                                         latest                            739b3809fe31   345MB
docker-sflow                                       202311_RC.39-c50d88168_Internal   164f4326030d   343MB
docker-sflow                                       latest                            164f4326030d   343MB
docker-fpm-frr                                     202311_RC.39-c50d88168_Internal   5bd54c2d63e0   373MB
docker-fpm-frr                                     latest                            5bd54c2d63e0   373MB
docker-syncd-mlnx                                  202311_RC.39-c50d88168_Internal   5f8046eaefce   833MB
docker-syncd-mlnx                                  latest                            5f8046eaefce   833MB
docker-teamd                                       202311_RC.39-c50d88168_Internal   f4416035b8f5   342MB
docker-teamd                                       latest                            f4416035b8f5   342MB
docker-sonic-gnmi                                  202311_RC.39-c50d88168_Internal   fe28d796529d   403MB
docker-sonic-gnmi                                  latest                            fe28d796529d   403MB
docker-mux                                         202311_RC.39-c50d88168_Internal   8feaaeda5785   364MB
docker-mux                                         latest                            8feaaeda5785   364MB
docker-lldp                                        202311_RC.39-c50d88168_Internal   ad04c3d79223   357MB
docker-lldp                                        latest                            ad04c3d79223   357MB
docker-database                                    202311_RC.39-c50d88168_Internal   fe6fa16c1643   315MB
docker-database                                    latest                            fe6fa16c1643   315MB
docker-router-advertiser                           202311_RC.39-c50d88168_Internal   2c52659a0d45   315MB
docker-router-advertiser                           latest                            2c52659a0d45   315MB
docker-sonic-mgmt-framework                        202311_RC.39-c50d88168_Internal   a34baf831465   417MB
docker-sonic-mgmt-framework                        latest                            a34baf831465   417MB


@dgsudharsan
Collaborator Author

@sg893052 @adyeung FYI

@sg893052
Contributor

sg893052 commented May 8, 2024

@dgsudharsan @adyeung Found the issue: it is caused by an EOFError raised from the queue processing during queue shutdown.

The fix already exists in the master branch:
https://github.com/sonic-net/sonic-buildimage/blob/master/src/system-health/health_checker/sysmonitor.py#L485

Please backport it accordingly.
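For context, a minimal sketch of the general pattern described here: a consumer loop that treats EOFError from the queue as a shutdown signal rather than an error. This is not the sysmonitor.py code linked above; consume_events and handle_event are hypothetical names, and the queue is assumed to behave like multiprocessing.Queue (raising queue.Empty when get() times out).

```python
import queue


def consume_events(event_queue, handle_event, timeout_s=1.0):
    """Hypothetical sketch: process events until the queue is torn down.

    Returns the number of events handled.
    """
    handled = 0
    while True:
        try:
            event = event_queue.get(timeout=timeout_s)
        except queue.Empty:
            continue  # no event yet; keep waiting
        except EOFError:
            # The connection behind the queue was closed during shutdown;
            # stop quietly instead of logging an error.
            break
        handle_event(event)
        handled += 1
    return handled
```

Note that, as the later comments show, this shutdown path turned out not to be the source of the reported log.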

neethajohn added the Triaged and BRCM labels May 8, 2024
@liat-grozovik
Collaborator

@sg893052 please share the PR in master so we can add the relevant label for the backport.

@sg893052
Contributor

sg893052 commented May 9, 2024

#17459 is the PR in master

@dgsudharsan
Collaborator Author

@sg893052 We still see the issue even with the PR.

@sg893052
Contributor


@dgsudharsan Please share the techsupport dump and the image details.

@dgsudharsan
Collaborator Author

@sg893052 I found the issue. It is in the underlying infrastructure: the device metadata table is accessed while config reload is in progress. I added a traceback, and below is what is seen:

May 29 00:02:42.517915 r-spider-05 ERR healthd:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 490, in system_service
    self.check_unit_status(event)
  File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 419, in check_unit_status
    full_srv_list = self.get_all_service_list()
  File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 153, in get_all_service_list
    self.get_service_from_feature_table(dir_list)
  File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 210, in get_service_from_feature_table
    device_config.update(device_info.get_device_runtime_metadata())
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 618, in get_device_runtime_metadata
    port_metadata = {'ETHERNET_PORTS_PRESENT': True if get_path_to_port_config_file(hwsku=None, asic="0" if is_multi_npu() else None) else False}
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 415, in get_path_to_port_config_file
    (platform_path, hwsku_path) = get_paths_to_platform_and_hwsku_dirs()
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 381, in get_paths_to_platform_and_hwsku_dirs
    hwsku_path = os.path.join(platform_path, hwsku)
  File "/usr/lib/python3.9/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.9/genericpath.py", line 152, in _check_arg_types
    raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
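The last frames can be reproduced outside healthd: os.path.join() raises this TypeError whenever one of its arguments is None. A minimal sketch, assuming the platform directory resolves normally and only the HwSKU value comes back as None (the platform path below is a hypothetical example):

```python
import os

# Hypothetical values: the platform directory resolves normally, but the
# HwSKU lookup returned None because DEVICE_METADATA was not yet populated.
platform_path = "/usr/share/sonic/device/x86_64-nvidia_sn5600_simx-r0"
hwsku = None

error_message = None
try:
    os.path.join(platform_path, hwsku)
except TypeError as exc:
    error_message = str(exc)

# On Python 3.9 (POSIX) this prints:
# join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
print(error_message)
```

healthd logs this message prefixed with "system_service", which is why the syslog line in the description reads "system_servicejoin() ...".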

@dgsudharsan
Collaborator Author

@abdosi There is a race condition in get_device_runtime_metadata if it is called during config reload (#11795).
During config reload, while the config is being written to config_db, the device_metadata table might not be available yet, so the hwsku read from it is None, which results in the traceback above.
Can we cache the hwsku, or otherwise handle this gracefully?
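One way to read the "cache the hwsku" suggestion, sketched under assumptions: remember the last non-empty HwSKU and fall back to it when the device metadata is briefly missing during config reload, and return None (instead of calling os.path.join with None) when no HwSKU has ever been seen. HwskuPathResolver and read_hwsku are hypothetical names; read_hwsku stands in for whatever actually reads the hwsku field from DEVICE_METADATA.

```python
import os
from typing import Callable, Optional


class HwskuPathResolver:
    """Hypothetical helper: build the HwSKU directory path, falling back to
    the last HwSKU seen when DEVICE_METADATA is briefly unavailable."""

    def __init__(self, platform_path: str, read_hwsku: Callable[[], Optional[str]]):
        self._platform_path = platform_path
        # Assumed to return None while config reload is rewriting CONFIG_DB.
        self._read_hwsku = read_hwsku
        self._cached_hwsku: Optional[str] = None

    def hwsku_path(self) -> Optional[str]:
        hwsku = self._read_hwsku()
        if hwsku:
            # Fresh value available: remember it for later gaps.
            self._cached_hwsku = hwsku
        elif self._cached_hwsku is None:
            # No current or cached HwSKU: report "unknown" instead of
            # passing None to os.path.join().
            return None
        return os.path.join(self._platform_path, self._cached_hwsku)
```

The trade-off is staleness: if a config reload actually changes the HwSKU, the cached value is used until the new one becomes readable.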

@bingwang-ms
Contributor

bingwang-ms commented Jul 1, 2024

@abdosi Can you please check and comment on this issue?
@qiluo-msft FYI

abdosi added a commit to abdosi/sonic-buildimage that referenced this issue Jul 4, 2024
Basically, handle any exception raised when calling the API
get_device_runtime_metadata() and retry.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
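The commit above describes handling any exception from get_device_runtime_metadata() and retrying. A minimal sketch of that retry idea, assuming a fixed number of attempts and a fixed delay; call_with_retry, attempts and delay_s are hypothetical, and this is not the code merged in #19480.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3, delay_s: float = 1.0) -> Optional[T]:
    """Hypothetical sketch: call fn(), retrying on any exception.

    Returns fn()'s result, or None if every attempt raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:  # deliberately broad, as in "handle any exception"
            if attempt < attempts:
                # Assumed to give config reload time to repopulate CONFIG_DB.
                time.sleep(delay_s)
    return None
```

In healthd this might be used as device_config.update(call_with_retry(device_info.get_device_runtime_metadata) or {}), so a failure that persists through every attempt yields an empty update instead of a traceback. The sleep blocks the caller, so attempts and delay_s would need to stay small for a monitoring loop.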
abdosi linked a pull request Jul 4, 2024 that will close this issue
rlhui closed this as completed in 339a4e6 Jul 13, 2024
arun1355492 pushed a commit to arun1355492/sonic-buildimage that referenced this issue Jul 26, 2024
…9480)

*Fix: sonic-net#18818

Handle any exception in API get_service_from_feature_table() gracefully.

---------

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
liushilongbuaa pushed a commit to liushilongbuaa/sonic-buildimage that referenced this issue Aug 1, 2024
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Aug 2, 2024
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Aug 2, 2024
mssonicbld pushed a commit that referenced this issue Aug 3, 2024
mssonicbld pushed a commit that referenced this issue Aug 24, 2024

5 participants