[master] orchagent exits in a multi-asic linecard in Chassis #11064

Closed
judyjoseph opened this issue Jun 7, 2022 · 5 comments
Labels
Chassis 🤖 Modular chassis support Issue for 202205 P0 Priority of the issue


@judyjoseph
Contributor

Description

When a multi-asic linecard in a chassis boots up, the following errors are seen and orchagent exits:

Jun  3 19:20:24.075068 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.075068 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.075076 str2--lc1-1 ERR syncd0#syncd: :- sendApiResponse: api SAI_COMMON_API_BULK_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Jun  3 19:20:24.075295 str2--lc1-1 ERR swss0#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun  3 19:20:24.075295 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 50000000011c0 member 0: 0
Jun  3 19:20:24.075793 str2--lc1-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.7@Ethernet-IB0,10.0.0.11@Ethernet-IB0
Jun  3 19:20:24.076073 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.076083 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.076083 str2--lc1-1 ERR syncd0#syncd: :- sendApiResponse: api SAI_COMMON_API_BULK_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Jun  3 19:20:24.076230 str2--lc1-1 ERR swss0#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun  3 19:20:24.076230 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 50000000011c3 member 0: 0
Jun  3 19:20:24.076230 str2--lc1-1 ERR swss0#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun  3 19:20:24.076230 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 50000000011c3 member 0: 0
Jun  3 19:20:24.076707 str2--lc1-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.7@Ethernet-IB0,10.0.0.11@Ethernet-IB0
Jun  3 19:20:24.076971 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.076971 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.076984 str2--lc1-1 ERR syncd0#syncd: :- sendApiResponse: api SAI_COMMON_API_BULK_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Jun  3 19:20:24.077132 str2--lc1-1 ERR swss0#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun  3 19:20:24.077132 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 50000000011c6 member 0: 0
Jun  3 19:20:24.077434 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group:3028 Unable to reserve ECMP fec block failed with error -4.
Jun  3 19:20:24.077434 str2--lc1-1 ERR syncd0#syncd: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_INSUFFICIENT_RESOURCES
Jun  3 19:20:24.077487 str2--lc1-1 ERR syncd0#syncd: :- processQuadEvent: attr: SAI_NEXT_HOP_GROUP_ATTR_TYPE: SAI_NEXT_HOP_GROUP_TYPE_DYNAMIC_UNORDERED_ECMP
Jun  3 19:20:24.077576 str2--lc1-1 ERR swss0#orchagent: :- create: create status: SAI_STATUS_INSUFFICIENT_RESOURCES
Jun  3 19:20:24.077589 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 10.0.0.7@Ethernet-IB0,10.0.0.11@Ethernet-IB0, rv:-4
Jun  3 19:20:24.077589 str2--lc1-1 ERR swss0#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_NEXT_HOP_GROUP, status: SAI_STATUS_INSUFFICIENT_RESOURCES
Jun  3 19:20:24.425794 str2--lc1-1 INFO swss1#supervisord 2022-06-03 19:20:24,425 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
Jun  3 19:20:24.463014 str2--lc1-1 INFO swss0#supervisord 2022-06-03 19:20:24,462 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
Jun  3 19:20:24.998927 str2--lc1-1 NOTICE coredump_gen_handler.py[14245]: Another instance of techsupport running, aborting this. stderr: Accquiring lock failed, PID 14436 is active
Jun  3 19:20:25.431147 str2--lc1-1 INFO swss1#supervisor-proc-exit-listener: Process 'orchagent' exited unexpectedly. Terminating supervisor 'swss'
Jun  3 19:20:25.431662 str2--lc1-1 INFO swss1#supervisord 2022-06-03 19:20:25,431 WARN received SIGTERM indicating exit request

Steps to reproduce the issue:

  1. Use the latest master image from a build after May 28th.
  2. Boot the multi-asic linecard.

Describe the results you received:

The errors above, followed by an orchagent (OA) exit.

Describe the results you expected:

There should not be an OA exit.
These errors are not expected either: "SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2"

Output of show version:

Build based on master/29043ff026a815e1fea338759ff05491c48e2f03

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@judyjoseph judyjoseph added the Chassis 🤖 Modular chassis support label Jun 7, 2022
@mlok-nokia
Contributor

mlok-nokia commented Jun 9, 2022

The commit below upgraded the FRR version, which introduced a new "weight" attribute with value 2 in ROUTE_TABLE entries, but syncd/BCM expects the weight to be 1. Before the upgrade, ROUTE_TABLE entries had no "weight" attribute.

commit a477dbb
Author: Hasan Naqvi <56742004+hasan-brcm@users.noreply.github.com>
Date: Tue May 24 14:47:09 2022 -0700
Frr 8.2 upgrade (#10691)

  "ROUTE_TABLE:192.170.96.0/25": {
    "expireat": 1654787366.084417,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "ifname": "PortChannel101,Ethernet6",
      "nexthop": "10.0.0.13,10.0.0.17",
      "weight": "2,2"
    }
  },
  "ROUTE_TABLE:192.170.96.128/25": {
    "expireat": 1654787366.082857,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "ifname": "PortChannel101,Ethernet6",
      "nexthop": "10.0.0.13,10.0.0.17",
      "weight": "2,2"
    }
  },
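
To spot affected routes, the `weight` field of entries like the ones above can be checked against the value syncd/BCM accepts. The following is a minimal sketch, not code from SONiC: `parseWeights` and `allWeightsAre` are hypothetical helper names, and the field is assumed to be a well-formed, comma-separated list of non-negative integers, as in the dump.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a ROUTE_TABLE "weight" field such as "2,2"
// into integers. Assumes well-formed, comma-separated decimal values;
// std::stoul throws on an empty or non-numeric token.
std::vector<uint32_t> parseWeights(const std::string& field)
{
    std::vector<uint32_t> weights;
    std::istringstream in(field);
    std::string token;
    while (std::getline(in, token, ','))
    {
        weights.push_back(static_cast<uint32_t>(std::stoul(token)));
    }
    return weights;
}

// Hypothetical helper: true if every weight equals `expected`.
bool allWeightsAre(const std::vector<uint32_t>& weights, uint32_t expected)
{
    for (uint32_t w : weights)
    {
        if (w != expected)
        {
            return false;
        }
    }
    return true;
}
```

For the entries shown, `allWeightsAre(parseWeights("2,2"), 1)` is false, which matches the "weight should be '1'. passed value:2" error reported by syncd.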

@rlhui rlhui added the P0 Priority of the issue label Jun 9, 2022
@prsunny
Contributor

prsunny commented Jun 9, 2022

Tracking with Broadcom

@hasan-brcm
Contributor

hasan-brcm commented Jun 9, 2022

I see the default weight in frr is 1:

root@sonic:/home/admin# vtysh -c"show ip route 10.10.10.10/32"
Routing entry for 10.10.10.10/32
  Known via "bgp", distance 20, metric 0, best
  Last update 00:03:14 ago
  * 30.0.0.2, via Ethernet4, weight 1
  * 30.1.0.2, via Ethernet12, weight 1

But in APPL_DB it is reflected as 2:

root@sonic:/home/admin# sonic-db-cli APPL_DB "hgetall ROUTE_TABLE:10.10.10.10"
nexthop
30.0.0.2,30.1.0.2
ifname
Ethernet4,Ethernet12
weight
2,2
root@sonic:/home/admin#

The issue seems to be due to the code below from PR1853 (routesync.cpp, L#1211):

    uint8_t weight = rtnl_route_nh_get_weight(nexthop);
    if (weight)
    {
        result += to_string(weight + 1);
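
The effect of the `+ 1` can be reproduced in isolation. This is a minimal sketch, not the routesync.cpp code: `buildWeightField` and its `offset` parameter are hypothetical, and the raw weights stand in for the values returned by `rtnl_route_nh_get_weight`, assumed here to match what vtysh shows above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: build a comma-separated weight field from raw
// per-nexthop weights, adding `offset` to each one. offset = 1 mimics
// the `weight + 1` in the snippet above; offset = 0 passes weights through.
std::string buildWeightField(const std::vector<uint8_t>& rawWeights, int offset)
{
    std::string result;
    for (std::size_t i = 0; i < rawWeights.size(); ++i)
    {
        if (i > 0)
        {
            result += ",";
        }
        // uint8_t is promoted to int here, so the sum is formatted as a number.
        result += std::to_string(rawWeights[i] + offset);
    }
    return result;
}
```

With raw weights 1 and 1, an offset of 1 produces `2,2`, the value seen in APPL_DB, while an offset of 0 produces `1,1`, the value the SAI error message asks for.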

@prsunny
Contributor

prsunny commented Jun 9, 2022

Fix - sonic-net/sonic-swss#2320

@lguohan
Collaborator

lguohan commented Jun 10, 2022

fix in #11094

@lguohan lguohan closed this as completed Jun 10, 2022