[warm-boot][SAI] syncd crash with SAI_API_SWITCH:_brcm_sai_switch_assert ERROR #6655

Closed
vaibhavhd opened this issue Feb 3, 2021 · 3 comments

Comments

@vaibhavhd
Contributor

Description

Warm-reboot fails with a syncd crash, and a core file is generated, with the latest SAI version 4.3.0.10-3. The following log is seen:
Feb 2 19:56:56.585774 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_assert:558 ERROR: Assertion failed: (index[1] < __brcm_sai_index2_max[type].max2) at /__w/1/s/src/arch/brcm_sai_data_mgr.c:16869

Steps to reproduce the issue:

  1. Warm-reboot the DUT.
  2. Check for a syncd core file and inspect the syslog.

Describe the results you received:

Feb  2 19:56:56.573915 str-s6100-acs-2 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  2 19:56:56.574218 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: VID: oid:0x18000000000cbd RID: oid:0x11800000001
Feb  2 19:56:56.574281 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: attr: SAI_BUFFER_POOL_ATTR_SIZE: 15982720
Feb  2 19:56:56.574812 str-s6100-acs-2 ERR swss#orchagent: :- set: set status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  2 19:56:56.574812 str-s6100-acs-2 ERR swss#orchagent: :- processBufferPool: Failed to modify buffer pool, name:egress_lossless_pool, sai object:18000000000cbd, status:-196608
Feb  2 19:56:56.574812 str-s6100-acs-2 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Feb  2 19:56:56.575698 str-s6100-acs-2 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  2 19:56:56.575943 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: VID: oid:0x18000000000cbe RID: oid:0x11800000002
Feb  2 19:56:56.576038 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: attr: SAI_BUFFER_POOL_ATTR_SIZE: 9243812
Feb  2 19:56:56.576449 str-s6100-acs-2 ERR swss#orchagent: :- set: set status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  2 19:56:56.576449 str-s6100-acs-2 ERR swss#orchagent: :- processBufferPool: Failed to modify buffer pool, name:egress_lossy_pool, sai object:18000000000cbe, status:-196608
Feb  2 19:56:56.576449 str-s6100-acs-2 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Feb  2 19:56:56.581450 str-s6100-acs-2 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  2 19:56:56.581977 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: VID: oid:0x19000000000cc0 RID: oid:0x1900000006
Feb  2 19:56:56.582052 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: attr: SAI_BUFFER_PROFILE_ATTR_SHARED_STATIC_TH: 15982720
Feb  2 19:56:56.582822 str-s6100-acs-2 ERR swss#orchagent: :- set: set status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  2 19:56:56.583054 str-s6100-acs-2 ERR swss#orchagent: :- processBufferProfile: Failed to modify buffer profile, name:egress_lossless_profile, sai object:19000000000cc0, status:-196608
Feb  2 19:56:56.583230 str-s6100-acs-2 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Feb  2 19:56:56.584146 str-s6100-acs-2 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  2 19:56:56.584427 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: VID: oid:0x18000000000cbf RID: oid:0x1800000001
Feb  2 19:56:56.584508 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: attr: SAI_BUFFER_POOL_ATTR_SIZE: 10875072
Feb  2 19:56:56.585160 str-s6100-acs-2 ERR swss#orchagent: :- set: set status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  2 19:56:56.585242 str-s6100-acs-2 ERR swss#orchagent: :- processBufferPool: Failed to modify buffer pool, name:ingress_lossless_pool, sai object:18000000000cbf, status:-196608
Feb  2 19:56:56.585299 str-s6100-acs-2 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Feb  2 19:56:56.585774 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_assert:558 ERROR: Assertion failed: (index[1] < __brcm_sai_index2_max[type].max2) at /__w/1/s/src/arch/brcm_sai_data_mgr.c:16869
Feb  2 19:56:56.597319 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1347 Obtained 17 stack frames.
Feb  2 19:56:56.597389 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/lib/libsai.so.1(_brcm_sai_log_backtrace+0x21) [0x7fc30b6c4c61]
Feb  2 19:56:56.597502 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/lib/libsai.so.1(_brcm_sai_switch_assert+0x31) [0x7fc30b53cf91]
Feb  2 19:56:56.597560 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/lib/libsai.so.1(_brcm_sai_indexed_data_get+0x1c4d) [0x7fc30b7096ed]
Feb  2 19:56:56.597643 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/lib/libsai.so.1(_brcm_sai_switch_port_queue_get+0x68) [0x7fc30b542248]
Feb  2 19:56:56.597727 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/lib/libsai.so.1(driverEgressQueueFieldSet+0x78) [0x7fc30b80d608]
Feb  2 19:56:56.597810 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/lib/libsai.so.1(+0x3e5588c) [0x7fc30b60f88c]
Feb  2 19:56:56.597891 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x92f31) [0x55f3906f6f31]
Feb  2 19:56:56.597988 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x24c46) [0x55f390688c46]
Feb  2 19:56:56.598083 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x2a925) [0x55f39068e925]
Feb  2 19:56:56.598178 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x3247e) [0x55f39069647e]
Feb  2 19:56:56.598354 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x336d6) [0x55f3906976d6]
Feb  2 19:56:56.598414 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x339e4) [0x55f3906979e4]
Feb  2 19:56:56.598655 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x34c38) [0x55f390698c38]
Feb  2 19:56:56.598655 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x21c08) [0x55f390685c08]
Feb  2 19:56:56.598749 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x201ae) [0x55f3906841ae]
Feb  2 19:56:56.598837 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7fc30723209b]
Feb  2 19:56:56.598916 str-s6100-acs-2 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_log_backtrace:1350 /usr/bin/syncd(+0x2184a) [0x55f39068584a]
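
The `status:-196608` printed by orchagent and the `SAI_STATUS_ATTR_NOT_IMPLEMENTED_0` printed by syncd in the log above appear to be the same error in two forms. The sketch below shows the arithmetic. It assumes the attribute status bases as recalled from SAI's `saistatus.h` (negated on Linux builds, with the low 16 bits carrying the attribute index); the function name `decode_attr_status` is hypothetical, and the constants should be checked against the SAI headers used by the image.

```python
# Attribute-scoped SAI status bases, as recalled from saistatus.h.
# Assumption: verify these against the SAI headers of the image under test.
ATTR_STATUS_BASES = {
    0x00010000: "SAI_STATUS_INVALID_ATTRIBUTE",
    0x00020000: "SAI_STATUS_INVALID_ATTR_VALUE",
    0x00030000: "SAI_STATUS_ATTR_NOT_IMPLEMENTED",
    0x00040000: "SAI_STATUS_UNKNOWN_ATTRIBUTE",
    0x00050000: "SAI_STATUS_ATTR_NOT_SUPPORTED",
}


def decode_attr_status(status):
    """Split a decimal SAI status into (base name, attribute index).

    Returns None when the value is not in one of the attribute ranges.
    """
    magnitude = -status if status < 0 else status
    base = magnitude & ~0xFFFF   # upper bits select the error family
    index = magnitude & 0xFFFF   # lower 16 bits give the attribute index
    name = ATTR_STATUS_BASES.get(base)
    if name is None:
        return None
    return name, index


print(decode_attr_status(-196608))  # ('SAI_STATUS_ATTR_NOT_IMPLEMENTED', 0)
```

An index of 0 points at the first attribute in the failed call, which in these log lines is the only one passed: `SAI_BUFFER_POOL_ATTR_SIZE` or `SAI_BUFFER_PROFILE_ATTR_SHARED_STATIC_TH`.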

Describe the results you expected:
After warm-reboot, no syncd crash or core file should be seen.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**

SONiC Software Version: SONiC.HEAD.341-3f2a39d5
Distribution: Debian 10.7
Kernel: 4.19.0-9-2-amd64
Build commit: 3f2a39d5
Build date: Tue Feb  2 14:01:35 UTC 2021
Built by: johnar@jenkins-worker-22

Platform: x86_64-dell_s6100_c2538-r0
@lguohan
Collaborator

lguohan commented Feb 8, 2021

@vaibhavhd, do we have csp opened for this one?

@vaibhavhd
Contributor Author

> @vaibhavhd, do we have csp opened for this one?

There are two issues captured here:

  1. CS00011729363 covered the `_brcm_sai_switch_assert` failure and the stack trace seen above. This issue is no longer seen in the latest public image.
  2. `processBufferPool: Failed to modify buffer pool` is still an open issue. I need to confirm whether this is a Broadcom issue or a SONiC image issue.
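
To check a given image for each of the two issues independently, the syslog can be split into buffer-modify failures and SAI assertions. This is a minimal sketch that assumes the message formats match the log lines in this report; the function name `triage` and both regular expressions are illustrative, not part of any SONiC tool.

```python
import re

# Matches orchagent lines such as:
#   processBufferPool: Failed to modify buffer pool, name:..., sai object:..., status:-196608
SET_FAILURE = re.compile(
    r"Failed to modify buffer (?P<kind>pool|profile), "
    r"name:(?P<name>[^,]+), sai object:(?P<oid>[0-9a-f]+), "
    r"status:(?P<status>-?\d+)"
)

# Matches the Broadcom SAI assertion line emitted by syncd.
ASSERTION = re.compile(
    r"Assertion failed: (?P<expr>\(.*\)) at (?P<where>\S+:\d+)"
)


def triage(lines):
    """Return (buffer_failures, assertions) found in an iterable of syslog lines."""
    buffer_failures, assertions = [], []
    for line in lines:
        match = SET_FAILURE.search(line)
        if match:
            buffer_failures.append(
                (match["kind"], match["name"], int(match["status"]))
            )
            continue
        match = ASSERTION.search(line)
        if match:
            assertions.append((match["expr"], match["where"]))
    return buffer_failures, assertions
```

Running `triage` over the lines of a saved syslog yields two lists: an empty second list with a non-empty first one would correspond to the state described above, where the assertion is gone but the buffer pool failure remains.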

@vaibhavhd
Contributor Author

Closing this issue, as a PR linked to it fixes the `_brcm_sai_switch_assert` failure.
The second issue (the failure to modify the buffer pool) will be tracked separately in #6726.
