Skip to content

Commit

Permalink
Increase Syncd operation timeout from 1 min to 6 min. (#828)
Browse files Browse the repository at this point in the history
What I did:
Increase SAI Sync operation timeout from 1 min to 6 min by reverting #472

Why I did:
Issue as mention sonic-net/sonic-buildimage#3832 is seen when there is link/port-channel flaps towards T2  which results in sequence of this operation on T1 topology:-

Link Going down
- Next Hop Member removal from Nexthop Group
- Creation of New Nexthop Group
- All the Routes towards T2 are given SET Operation with new Nexthop Group (6000+ routes)

Link Going up
- Creation of New Nexthop Group
- All the Routes towards T2 are given SET Operation with new Nexthop Group (6000+ routes)

Above Sequence with flaps create many SAI operation of CREATE/SET and if SAI is slow to process them if there in any intermediate Get operation (common GET case being CRM polling at default 5 mins) will result timeout if there is no response in 1 min which will cause next Get operation to fail because of attribute mismatch as it make Request and Reponse are out of sync. 

To fix this:-

- Have increase the timeout to 6 min.  Even 201811 image/branch is having same timeout value. 

- Hard to predict what is good timeout value but based on  below experiment and since 201811 is having 6 min using that value.

How I verify:

- Ran the below script for 800 iteration and orchagent did not got timeout error and no crash. Without fix OA used to crash consistently with most of the time less than 100 iteration of flaps.
- Based on sairedis.rec of the first CRM `GET` operation between `Create/Set` operation in these 800 iteration worst time for GET response was see 4+ min. 
```
#!/bin/bash

j=0
while [ True ]; do
        echo "iteration $j"
        config interface shutdown PortChannel0002
        sleep 2
        config interface startup PortChannel0002
        sleep 2
        j=$((j+1))
done
```


```
admin@str-xxxx-06:/var/log/swss$ sudo zgrep -A 1 "|g|.*AVAILABLE_IPV4_ROUTE" sairedis.rec*
sairedis.rec:2021-05-04.17:40:13.431643|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec:2021-05-04.17:46:18.132115|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52530
sairedis.rec:--
sairedis.rec:2021-05-04.17:46:21.128261|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec:2021-05-04.17:46:31.258628|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52530
sairedis.rec.1:2021-05-04.17:30:13.900135|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.1:2021-05-04.17:34:47.448092|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52531
sairedis.rec.1:--
sairedis.rec.1:2021-05-04.17:35:12.827247|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.1:2021-05-04.17:35:32.939798|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52533
sairedis.rec.2.gz:2021-05-04.17:20:13.103322|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.2.gz:2021-05-04.17:24:47.796720|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52531
sairedis.rec.2.gz:--
sairedis.rec.2.gz:2021-05-04.17:25:15.073981|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.2.gz:2021-05-04.17:26:08.525235|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52532
```


Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
  • Loading branch information
abdosi authored May 5, 2021
1 parent 17974ad commit e438b0d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion lib/inc/sairedis.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ const std::string objectTypeGetAvailabilityResponse("object_type_get_availabilit
/**
* @brief Default synchronous operation response timeout in milliseconds.
*/
#define SAI_REDIS_DEFAULT_SYNC_OPERATION_RESPONSE_TIMEOUT (60*1000)
#define SAI_REDIS_DEFAULT_SYNC_OPERATION_RESPONSE_TIMEOUT (6*60*1000)

typedef enum _sai_redis_notify_syncd_t
{
Expand Down

0 comments on commit e438b0d

Please sign in to comment.