Reload BCM SDK kmods on syncd start to handle syncd restart issues (#12804)

Why I did it
On the Arista PikeZ platform (T3.X2: BCM56274) running SONiC, syncd does not recover after its container is restarted. When the 'syncd' container is restarted, syncd is expected to restart and recover automatically; instead, it always fails at create_switch because cancellation of a DMA operation in the BCM SDK kernel modules (kmods) gets stuck.

Sep 16 22:19:44.855125 pkz208 ERR syncd#syncd: [none] SAI_API_SWITCH:platform_process_command:428 Platform command "init soc" failed, rc = -1.
Sep 16 22:19:44.855206 pkz208 INFO syncd#supervisord: syncd CMIC_CMC0_PKTDMA_CH4_DESC_COUNT_REQ:0x33#015
Sep 16 22:19:44.855264 pkz208 CRIT syncd#syncd: [none] SAI_API_SWITCH:platformInit:1909 initialization command "init soc" failed, rc = -1 (Internal error).
Sep 16 22:19:44.855403 pkz208 CRIT syncd#syncd: [none] SAI_API_SWITCH:sai_driver_init:642 Error initializing driver, rc = -1.
...
Sep 16 22:19:44.855891 pkz208 CRIT syncd#syncd: [none] SAI_API_SWITCH:brcm_sai_create_switch:1173 initializing SDK failed with error Operation failed (0xfffffff5).

Reloading the BCM SDK kmods allows switch initialization to complete normally.

How I did it
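In the syncd docker start script (files/scripts/syncd.sh), if the BCM SDK kmods are already loaded (detected by the presence of the bcm0 network interface), unload them and load them again. The reload is skipped on warm boot.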
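The decision the start script makes comes down to two checks: skip on warm boot, and reload only when bcm0 exists, which the commit uses as its indicator that the SDK kmods are loaded. A minimal sketch of that logic as a standalone, testable function follows, assuming bash. The function name `maybe_reload_bcm_kmods` and the `NET_DIR`/`KMOD_INIT` overrides are hypothetical, added only so the logic can be exercised without hardware. It tests for the interface with `-e` instead of `ls | grep` and stops on a failed stop/start; both are stylistic choices, not what the commit does.

```shell
#!/usr/bin/env bash
# Sketch only: reload the BCM SDK kmods on cold start if they are already loaded.
# NET_DIR and KMOD_INIT are hypothetical overrides for testing; the defaults
# are the paths used in files/scripts/syncd.sh.
NET_DIR="${NET_DIR:-/sys/class/net}"
KMOD_INIT="${KMOD_INIT:-/etc/init.d/opennsl-modules}"

maybe_reload_bcm_kmods() {
    # Skip during warm boot, as the commit does.
    if [[ "$WARM_BOOT" == "true" ]]; then
        echo "skip: warm boot"
        return 0
    fi
    # The commit treats the presence of the bcm0 netdev as "SDK kmods loaded".
    if [[ -e "$NET_DIR/bcm0" ]]; then
        "$KMOD_INIT" stop || return 1    # unload the kmods
        "$KMOD_INIT" start || return 1   # load them again
        echo "reloaded"
    else
        echo "skip: kmods not loaded"
    fi
}
```

On a switch, the function would be called from `startplatform()` in place of the inline block; with the default values it targets the same sysfs directory and init script as the commit.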

How to verify it
Steps to reproduce:

1. In SONiC, run 'docker ps' to list the currently running containers; 'syncd' should be present.
2. Run 'docker stop syncd'.
3. Wait about 1 minute.
4. Run 'docker ps' again and confirm that syncd is missing (it has not restarted).
5. Check the logs for messages similar to those above.
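The reproduction steps can be scripted roughly as follows. This is a sketch, assuming bash, the Docker CLI on the host, and syslog written to `/var/log/syslog` (an assumption; pass another path if the image logs elsewhere). The helper names `container_listed`, `has_create_switch_failure` and `reproduce` are hypothetical; the log pattern is taken from the `brcm_sai_create_switch` line quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reproducing the syncd restart failure.

# container_listed NAME: succeed if NAME appears as an exact line in the
# container names read from stdin (one per line).
container_listed() {
    grep -qx -- "$1"
}

# has_create_switch_failure: succeed if the log text on stdin contains the
# SDK init failure reported by brcm_sai_create_switch.
has_create_switch_failure() {
    grep -q 'brcm_sai_create_switch:.*initializing SDK failed'
}

# reproduce [SYSLOG]: run the steps; returns 0 if syncd came back, 1 otherwise.
reproduce() {
    local syslog="${1:-/var/log/syslog}"   # assumed log location
    if ! docker ps --format '{{.Names}}' | container_listed syncd; then
        echo "syncd is not running to begin with"
        return 2
    fi
    docker stop syncd
    sleep 60   # wait about a minute
    if docker ps --format '{{.Names}}' | container_listed syncd; then
        echo "syncd came back"
        return 0
    fi
    if has_create_switch_failure < "$syslog"; then
        echo "reproduced: syncd missing and create_switch failed"
    else
        echo "syncd missing; failure signature not found in $syslog"
    fi
    return 1
}
```

Without this commit, `reproduce` is expected to report the create_switch failure; with it, syncd is expected to be listed again after the wait.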

Signed-off-by: Michael Li <michael.li@broadcom.com>
michaelli10 authored Nov 30, 2022
1 parent 0bd3be3 commit f725b83
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions files/scripts/syncd.sh
@@ -28,6 +28,19 @@ function startplatform() {
         debug "Firmware update procedure ended"
     fi
 
+    if [[ x"$sonic_asic_platform" == x"broadcom" ]]; then
+        if [[ x"$WARM_BOOT" != x"true" ]]; then
+            is_bcm0=$(ls /sys/class/net | grep bcm0)
+            if [[ "$is_bcm0" == "bcm0" ]]; then
+                debug "stop SDK opennsl-modules ..."
+                /etc/init.d/opennsl-modules stop
+                debug "start SDK opennsl-modules ..."
+                /etc/init.d/opennsl-modules start
+                debug "started SDK opennsl-modules"
+            fi
+        fi
+    fi
+
     if [[ x"$sonic_asic_platform" == x"barefoot" ]]; then
         is_usb0=$(ls /sys/class/net | grep usb0)
         if [[ "$is_usb0" == "usb0" ]]; then
