BGP graceful restart timeout err on peer if perform warm-restart on SONiC device #2958

Closed
wangxin opened this issue May 30, 2019 · 1 comment

Comments

@wangxin
Contributor

wangxin commented May 30, 2019

Description

Warm-reboot testing failed on the latest master image: SONiC.HEAD.978-6d62249. While warm-reboot was being performed on the SONiC device, the BGP_GRACEFUL_RESTART_TIMEOUT error below was observed on the peer device.

Below is the log observed on peer device (Arista VM):

May 30 17:58:45 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state Established event Closed new state Idle
May 30 17:58:45 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state Established event Closed new state Idle
May 30 18:00:45 ARISTA01T1 Rib: %BGP-5-BGP_GRACEFUL_RESTART_TIMEOUT: Deleting stale routes from peer 10.0.0.56 (AS 65100)
May 30 18:00:45 ARISTA01T1 Rib: %BGP-5-BGP_GRACEFUL_RESTART_TIMEOUT: Deleting stale routes from peer fc00::71 (AS 65100)
May 30 18:01:13 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
May 30 18:01:13 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
May 30 18:01:16 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state Established event Closed new state Idle
May 30 18:01:17 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state Established event Closed new state Idle
May 30 18:01:26 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
May 30 18:01:28 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established

Before quagga was replaced with frr, the graceful restart time was configured to 240 seconds in #2754. However, this configuration is not in templates/frr.conf.j2, so frr uses its default graceful restart time of 120 seconds. frr actually takes longer than that to complete a graceful restart, which caused the warm-reboot test to fail.
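
To make the timing gap concrete, here is a minimal sketch that measures it from the peer log above. It assumes the Arista syslog format shown in this issue and uses three of the IPv4 peer's lines verbatim; the helper names `timestamp` and `restart_timings` are made up for illustration and are not part of any SONiC or Arista tooling.

```python
from datetime import datetime

# Three lines copied from the peer log in this issue (IPv4 peer only).
LOG = """\
May 30 17:58:45 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state Established event Closed new state Idle
May 30 18:00:45 ARISTA01T1 Rib: %BGP-5-BGP_GRACEFUL_RESTART_TIMEOUT: Deleting stale routes from peer 10.0.0.56 (AS 65100)
May 30 18:01:13 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
"""


def timestamp(line, year=2019):
    """Parse the leading 'May 30 17:58:45' stamp; syslog omits the year."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def restart_timings(log):
    """Return (seconds until stale routes were deleted,
    seconds until the session was re-established), both measured
    from the first session close."""
    closed = timeout = established = None
    for line in log.splitlines():
        if closed is None and "event Closed" in line:
            closed = timestamp(line)
        elif timeout is None and "BGP_GRACEFUL_RESTART_TIMEOUT" in line:
            timeout = timestamp(line)
        elif closed is not None and established is None and "new state Established" in line:
            established = timestamp(line)
    return (
        (timeout - closed).total_seconds(),
        (established - closed).total_seconds(),
    )


if __name__ == "__main__":
    to_timeout, to_established = restart_timings(LOG)
    print(f"stale routes deleted after {to_timeout:.0f}s")
    print(f"session re-established after {to_established:.0f}s")
```

For this log, the stale routes are deleted 120 seconds after the session closes, which matches the frr default, while the session only comes back after 148 seconds. The 240-second value from #2754 would have covered that window.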

Steps to reproduce the issue:

  1. BGP is configured between the SONiC device and a peer device (an Arista VM, for example)
  2. Perform warm-reboot on SONiC
  3. Monitor the ip routes on peer device.

Describe the results you received:
When the BGP_GRACEFUL_RESTART_TIMEOUT error was observed on the Arista VM, the IP routes learned from SONiC were removed.

Describe the results you expected:
The routes should not be removed during warm-reboot.

Additional information you deem important (e.g. issue happens only occasionally):

Output of show version:

SONiC Software Version: SONiC.HEAD.978-6d62249
Distribution: Debian 9.9
Kernel: 4.9.0-8-2-amd64
Build commit: 6d62249
Build date: Sun May 26 13:48:25 UTC 2019
Built by: johnar@jenkins-worker-4

Attach debug file sudo generate_dump:

```
(paste your output here)
```
@lguohan
Collaborator

lguohan commented Jun 13, 2019

resolved in #2998

@wangxin wangxin closed this as completed Jul 5, 2019