CLI support for SmartSwitch PMON #3271
Can you please add unit tests (UT) for the new functions?
Addressed. 2. The DPU reboot-cause data is now fetched directly from the chassis_state_db.
temporarily bypassing the check
@rameshraghupathy As discussed offline, please update the shutdown CLI to follow the pre-shutdown steps listed in sonic-net/SONiC#1699. Shutdown and power-down should both follow the same pre-shutdown steps for the DPU.
@rameshraghupathy, @prgeor According to the Smart Switch PMON HLD, the DPU reboot cause and the reboot history should be stored in a file on the host side. However, I don't see this implemented here.
@@ -110,8 +110,9 @@ def shutdown_chassis_module(db, chassis_module_name):

     if not chassis_module_name.startswith("SUPERVISOR") and \
        not chassis_module_name.startswith("LINE-CARD") and \
-       not chassis_module_name.startswith("FABRIC-CARD"):
-        ctx.fail("'module_name' has to begin with 'SUPERVISOR', 'LINE-CARD' or 'FABRIC-CARD'")
+       not chassis_module_name.startswith("FABRIC-CARD") and \
We need additional validation to check that chassis_module_name is actually present, that is, a valid module name on this system. If a user executes config chassis modules startup DPU5 on a system that does not have DPU5, the SmartSwitchConfigManagerTask in chassisd crashes, which prevents any further startup or shutdown calls. The command still prints Starting up chassis module DPU1 or Shutting down chassis module DPU1, but the only operation actually performed is the addition to or removal from CONFIG_DB.
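A rough sketch of the kind of check being asked for here. This is not the code in this PR: `validate_module_name` and the `configured_modules` argument are hypothetical names, the "DPU" prefix is assumed from the PR description, and how the set of present modules is obtained on a real system is left open.

```python
# Prefixes accepted by the CLI; "DPU" is assumed from the PR description.
VALID_PREFIXES = ("SUPERVISOR", "LINE-CARD", "FABRIC-CARD", "DPU")


def validate_module_name(module_name, configured_modules):
    """Return None if the name is acceptable, otherwise an error string.

    configured_modules is assumed to be the set of module names that
    actually exist on this system (hypothetical input for this sketch).
    """
    # str.startswith accepts a tuple and matches any of its entries.
    if not module_name.startswith(VALID_PREFIXES):
        return ("'module_name' has to begin with 'SUPERVISOR', 'LINE-CARD', "
                "'FABRIC-CARD' or 'DPU'")
    # Prefix alone is not enough: DPU5 passes the prefix check even on a
    # system with only DPU0..DPU3, so also require the module to exist.
    if module_name not in configured_modules:
        return "Module '{}' is not present on this system".format(module_name)
    return None


if __name__ == "__main__":
    present = {"DPU0", "DPU1", "DPU2", "DPU3"}
    for name in ("DPU1", "DPU5", "FOO1"):
        print(name, "->", validate_module_name(name, present))
```

Returning an error string rather than raising keeps the sketch independent of the CLI framework; the caller could pass a non-None result to ctx.fail before anything is written to CONFIG_DB.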
changes such as 1. STATE_DB vs CHASSIS_STATE_DB and the key info
What I did
Enhanced the following CLIs to support SmartSwitch PMON as described in the PMON HLD documentation "https://github.com/sonic-net/SONiC/blob/d19d8933a43d0a31a4f3b2310f4336f289bca340/doc/smart-switch/pmon/smartswitch-pmon.md"
CLIs:
Added new module "DPUX" support for 1 and 2 below
1. "config chassis module startup DPUX", where X ranges from 0 to the number of DPUs in the SmartSwitch chassis minus 1
2. "config chassis module shutdown DPUX"
Extended the following CLIs to support the new module "DPUX" and also provided an "all" option to display the "SWITCH" and all "DPUX" modules
1. "show reboot-cause" remains the same; added "show reboot-cause all"
2. "show reboot-cause history" remains the same; added "show reboot-cause history <module-name>", where the module name can be DPUX, SWITCH, or all.
Extended the following CLIs to support the new module "DPUX" and also provided an "all" option to display the "SWITCH" and all "DPUX" modules
1. "show system-health summary" remains the same; added sub-command "show system-health summary <module-name>", where the module name can be DPUX, SWITCH, or all.
2. "show system-health monitor-list" remains the same; added sub-command "show system-health monitor-list <module-name>", where the module name can be DPUX, SWITCH, or all.
3. "show system-health detail" remains the same; added sub-command "show system-health detail <module-name>", where the module name can be DPUX, SWITCH, or all.
4. Added a new sub-command "show system-health dpu <module-name>", where the module name can be DPUX or all. This new sub-command provides additional DPU state details as mentioned in the HLD
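The module-name argument described above can be illustrated with a small sketch. It is only an illustration, not the implementation in this PR: `resolve_modules` and `num_dpus` are hypothetical, and treating a missing argument as "SWITCH" is an assumption based on the existing commands keeping their current behavior.

```python
def resolve_modules(selection, num_dpus):
    """Expand a module-name argument into the list of devices to display.

    selection: None (no argument given), "SWITCH", "all", or "DPUX".
    num_dpus:  number of DPUs in the chassis (hypothetical input).
    """
    # DPU names run from DPU0 to DPU(num_dpus - 1).
    dpus = ["DPU{}".format(i) for i in range(num_dpus)]
    # Assumption: with no argument the command reports the switch only,
    # matching the unchanged behavior of the existing commands.
    if selection is None or selection == "SWITCH":
        return ["SWITCH"]
    if selection == "all":
        return ["SWITCH"] + dpus
    if selection in dpus:
        return [selection]
    raise ValueError("Unknown module name: {}".format(selection))


if __name__ == "__main__":
    print(resolve_modules("all", 2))
    print(resolve_modules("DPU0", 2))
```

For "show system-health dpu", which accepts only DPUX and all, the same idea applies with "SWITCH" left out of the result.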
How I did it
How to verify it
Required files:
- This PR (including reboot_cause.py, chassis_modules.py, system_health.py)
- The other PR (including module_base.py, chassis_base.py, docker-pmon.supervisord.conf.j2, chassisd, mock_module_base.py, and the appropriate database_config.json)
- Platform "platform-cisco-8000" supporting PMON (module.py, chassis.py, inventory.py, pmon_daemon_control.json, and the required grpc and DB changes)
Previous command output (if the output of a command-line utility has changed)
root@sonic:~# show reboot-cause
Unknown
root@sonic:~# show reboot-cause history
Name Cause Time User Comment
2023_06_19_11_00_24 Power Loss N/A N/A Unknown (First boot of SONiC version 202311.10869-dirty-2024044)
New command output (if the output of a command-line utility has changed)
root@sonic:~# show reboot-cause history all
Device Name Cause Time User Comment
SWITCH 2023_06_19_11_00_24 Power Loss N/A N/A Unknown (First boot of SONiC version 202311.10869-dirty-2024044)
root@sonic:~# show reboot-cause history SWITCH
Device Name Cause Time User Comment
SWITCH 2023_06_19_11_00_24 Power Loss N/A N/A Unknown (First boot of SONiC version 202311.10869-dirty-2024044)
root@sonic:~# show reboot-cause history DPU0
Device Name Cause Time User Comment