[warm-reboot]: add bgp eoiu support to speed up route reconciliation … #856

Merged (9 commits) on Aug 1, 2019

@jipanyang jipanyang commented Apr 25, 2019

…in fpmsyncd

Signed-off-by: Jipan Yang jipan.yang@alibaba-inc.com

What I did

Three PRs for adding BGP eoiu support to speed up route reconciliation in fpmsyncd

sonic-buildimage: sonic-net/sonic-buildimage#2823
sonic-swss-common: sonic-net/sonic-swss-common#273
sonic-swss: #856

Why I did it

  1. Similar to restore_neighbors.py for neighsyncd, start a bgp_eoiu_marker.py script for the bgp docker.

  2. The script checks BGP neighbor state through the CLI periodically (every 1 second).
    It looks for an explicit EOR or an implicit EOR (a keepalive after the session is established) in the JSON output of show ip bgp neighbors A.B.C.D json

  3. Once the script has collected all needed EORs, it sets an EOIU flag in stateDB.

  4. fpmsyncd holds for a few seconds (3 seconds) after seeing the flag before starting route reconciliation.

  5. If for any reason the script fails to set the EOIU flag in stateDB, the existing warm_restart bgp_timer kicks in later.

This approach may add a few seconds of delay compared with an FRR-embedded EOIU solution, but it is simpler and less risky.
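
The polling flow described above can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: `wait_for_eoiu`, `peer_has_eor`, and `set_eoiu_flag` are hypothetical names, and the caller is assumed to supply the function that queries vtysh and the one that writes to stateDB. The 120-second default timeout is likewise an assumption.

```python
import time


def wait_for_eoiu(peers, peer_has_eor, set_eoiu_flag,
                  poll_interval=1.0, timeout=120.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until every peer has reported an EOR, then set the flag.

    peers         -- iterable of neighbor addresses (strings)
    peer_has_eor  -- hypothetical callable: address -> bool
    set_eoiu_flag -- hypothetical callable that writes the flag to stateDB
    Returns True if the flag was set, False on timeout.
    """
    pending = set(peers)
    deadline = clock() + timeout
    while pending:
        # Drop every peer that has sent an explicit or implicit EOR.
        pending = {p for p in pending if not peer_has_eor(p)}
        if not pending:
            break
        if clock() >= deadline:
            # Give up; the warm-restart timer in fpmsyncd is the fallback.
            return False
        sleep(poll_interval)
    set_eoiu_flag()
    return True
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; a production script would simply use the defaults.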

How I verified it

Before warm upgrade bgp docker:

root@ASW-C2-4-C11-A.NA61:/home/admin# vtysh -c 'show bgp sum'

IPv4 Unicast Summary:
BGP router identifier 1.1.1.1, local AS number 65021 vrf-id 0
BGP table version 1057
RIB entries 1145, using 170 KiB of memory
Peers 16, using 309 KiB of memory
Peer groups 2, using 128 bytes of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
10.0.0.2        4 4200001162      41      43        0    0    0 00:06:04          544
10.0.0.6        4 4200001162      41      43        0    0    0 00:06:05          544
10.0.0.10       4 4200001162      41      43        0    0    0 00:06:04          544
10.0.0.14       4 4200001162      41      43        0    0    0 00:06:04          544
10.0.0.18       4 4200001162      41      43        0    0    0 00:06:04          544
10.0.0.22       4 4200001162      41      43        0    0    0 00:06:04          544
10.0.0.26       4 4200001162      41      43        0    0    0 00:06:04          544
10.0.0.30       4 4200001162      41      43        0    0    0 00:06:04          544

Total number of neighbors 8

IPv6 Unicast Summary:
BGP router identifier 1.1.1.1, local AS number 65021 vrf-id 0
BGP table version 671
RIB entries 1029, using 153 KiB of memory
Peers 16, using 309 KiB of memory
Peer groups 2, using 128 bytes of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
2000::2         4 4200001162     544      43        0    0    0 00:06:04          505
2000:0:0:100::2 4 4200001162     544      43        0    0    0 00:06:04          505
2000:0:0:200::2 4 4200001162     544      43        0    0    0 00:06:04          505
2000:0:0:300::2 4 4200001162     544      43        0    0    0 00:06:04          505
2000:0:0:400::2 4 4200001162     544      43        0    0    0 00:06:04          505
2000:0:0:500::2 4 4200001162     544      43        0    0    0 00:06:04          505
2000:0:0:600::2 4 4200001162     544      43        0    0    0 00:06:04          505
2000:0:0:700::2 4 4200001162     544      43        0    0    0 00:06:04          505

Total number of neighbors 8

Warm upgrade bgp docker:

root@ASW-C2-4-C11-A.NA61:/home/admin# sonic_installer upgrade_docker bgp /tmp/docker-fpm-alibgp.gz --warm -y
Command: config warm_restart enable bgp

Stopping bgp ...
Command: docker exec -i bgp pkill -9 zebra

Command: docker exec -i bgp pkill -9 bgpd

Command: sleep 2

Command: docker exec -i database redis-cli  -n 6 -p 6379 hdel 'WARM_RESTART_TABLE|bgp' state
1

Stopped  bgp ...
Command: systemctl stop bgp

Command: docker rm bgp 
bgp

Command: docker load < /tmp/docker-fpm-alibgp.gz
Loaded image: docker-fpm-alibgp:latest

Command: docker tag docker-fpm-alibgp:latest docker-fpm-alibgp:AliNOS-rel-v2.0.4-dirty-

Command: systemctl restart bgp

[==========================]
Command: config warm_restart disable bgp

Done

Check bgp sum immediately:

root@ASW-C2-4-C11-A.NA61:/home/admin# vtysh -c 'show bgp sum'

IPv4 Unicast Summary:
BGP router identifier 1.1.1.1, local AS number 65021 vrf-id 0
BGP table version 1057
RIB entries 1145, using 170 KiB of memory
Peers 16, using 309 KiB of memory
Peer groups 2, using 128 bytes of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
10.0.0.2        4 4200001162       7       7        0    0    0 00:00:23          544
10.0.0.6        4 4200001162       7       7        0    0    0 00:00:23          544
10.0.0.10       4 4200001162       7       7        0    0    0 00:00:23          544
10.0.0.14       4 4200001162       7       7        0    0    0 00:00:23          544
10.0.0.18       4 4200001162       7       7        0    0    0 00:00:23          544
10.0.0.22       4 4200001162       7       7        0    0    0 00:00:23          544
10.0.0.26       4 4200001162       7       7        0    0    0 00:00:23          544
10.0.0.30       4 4200001162       7       7        0    0    0 00:00:23          544

Total number of neighbors 8

IPv6 Unicast Summary:
BGP router identifier 1.1.1.1, local AS number 65021 vrf-id 0
BGP table version 766
RIB entries 1029, using 153 KiB of memory
Peers 16, using 309 KiB of memory
Peer groups 2, using 128 bytes of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
2000::2         4 4200001162     510       7        0    0    0 00:00:23          505
2000:0:0:100::2 4 4200001162     510       7        0    0    0 00:00:23          505
2000:0:0:200::2 4 4200001162     510       7        0    0    0 00:00:22          505
2000:0:0:300::2 4 4200001162     510       7        0    0    0 00:00:23          505
2000:0:0:400::2 4 4200001162     510       7        0    0    0 00:00:23          505
2000:0:0:500::2 4 4200001162     510       7        0    0    0 00:00:23          505
2000:0:0:600::2 4 4200001162     510       7        0    0    0 00:00:23          505
2000:0:0:700::2 4 4200001162     510       7        0    0    0 00:00:23          505

Total number of neighbors 8

Details if related

jipanyang commented Apr 25, 2019

syslog:

  1. Warm-Restart timer started upon bgp docker start:

Apr 25 01:39:22.424804 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:102: Warm-Restart timer started.
Apr 25 01:39:22.424804 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:109: Warm-Restart eoiuCheckTimer timer started.

  2. bgp_eoiu_marker.py collected the EOIU marker for both IPv4 and IPv6:

Apr 25 01:40:11.212250 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP ipv4 eoiu reached
Apr 25 01:40:11.295457 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000::2
Apr 25 01:40:11.395633 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000:0:0:200::2
Apr 25 01:40:11.477900 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000:0:0:600::2
Apr 25 01:40:11.477900 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP ipv6 eoiu reached
Apr 25 01:40:11.496043 ASW-C2-4-C11-A.na61 INFO supervisord: bgp_eoiu_marker bgp_eoiu_marker service is started
Apr 25 01:40:11.508407 ASW-C2-4-C11-A.na61 INFO supervisord: bgp_eoiu_marker bgp_eoiu_marker service is done

  3. eoiuCheckTimer noticed the flag, then started a 3-second eoiuHoldTimer:

Apr 25 01:40:12.450718 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: eoiuFlagsSet:42: Warm-Restart bgp eoiu reached for both ipv4 and ipv6
Apr 25 01:40:12.450843 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:154: Warm-Restart started EOIU hold timer which is to expire in 3 seconds.

  4. EOIU hold timer expired, and reconciliation started:

Apr 25 01:40:15.450982 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:133: Warm-Restart EOIU hold timer expired.
Apr 25 01:40:15.451248 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: reconcile:155: Warm-Restart: Initiating reconciliation process for bgp application.
Apr 25 01:40:15.467004 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: setWarmStartState:206: bgp warm start state changed to reconciled
Apr 25 01:40:15.467206 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: reconcile:259: Warm-Restart: Concluded reconciliation process for bgp application.
Apr 25 01:40:15.467435 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:138: Warm-Restart reconciliation processed.

  5. 127 seconds after reconciliation (180 seconds after it was started), the Warm-Restart timer expired. fpmsyncd was already in the reconciled state, so there was nothing to do except call removeSelectable to stop listening on the timer.

Apr 25 01:42:22.425170 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:129: Warm-Restart timer expired.
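
The timer interplay in the syslog above can be modeled with a small calculation. This is a sketch under assumptions, not the fpmsyncd code (which is a C++ select loop): `reconcile_time` and its parameters are hypothetical, the 1-second check interval is inferred from the log timestamps, and the 3-second hold and 180-second warm-restart values are taken from the log lines.

```python
def reconcile_time(flag_set_at, check_interval=1, hold=3,
                   warm_restart_timer=180):
    """Return (seconds after timer start when reconciliation begins, trigger).

    flag_set_at -- seconds after start at which the EOIU flag appears in
                   stateDB, or None if the marker script never sets it.
    """
    if flag_set_at is not None:
        # First periodic check at or after the flag appears (ceiling
        # division); the first check fires one interval after start.
        ticks = max(1, -(-flag_set_at // check_interval))
        noticed = ticks * check_interval
        if noticed + hold < warm_restart_timer:
            return noticed + hold, "eoiu_hold_timer"
    # Flag missing or too late: the warm-restart timer is the fallback.
    return warm_restart_timer, "warm_restart_timer"
```

In the run above the flag appeared about 49 seconds after the timers started, so the model gives reconciliation at 53 seconds, which matches the 01:40:15 log entry.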

More detailed syslog:

root@ASW-C2-4-C11-A.NA61:/home/admin# tail -F /var/log/syslog | grep "bgp\|eoiu\|fpmsyncd\|Warm-Restart"
Apr 25 01:38:58.423828 ASW-C2-4-C11-A.NA61 INFO bgp.sh[723]: 2019-04-25 09:38:58,420 INFO exited: zebra (terminated by SIGKILL; not expected)
Apr 25 01:38:58.901562 ASW-C2-4-C11-A.NA61 INFO bgp.sh[723]: 2019-04-25 09:38:58,894 INFO exited: bgpd (terminated by SIGKILL; not expected)
Apr 25 01:39:01.152152 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:38:58,894 INFO exited: bgpd (terminated by SIGKILL; not expected)
Apr 25 01:39:01.669903 ASW-C2-4-C11-A.NA61 INFO bgp.sh[723]: 2019-04-25 09:39:01,664 WARN received SIGTERM indicating exit request
Apr 25 01:39:01.670420 ASW-C2-4-C11-A.NA61 INFO bgp.sh[723]: 2019-04-25 09:39:01,664 INFO waiting for bgp-watchdog, fpmsyncd, bgpcfgd, rsyslogd to die
Apr 25 01:39:01.674103 ASW-C2-4-C11-A.NA61 INFO bgp.sh[723]: 2019-04-25 09:39:01,672 INFO stopped: fpmsyncd (terminated by SIGTERM)
Apr 25 01:39:01.690659 ASW-C2-4-C11-A.NA61 INFO bgp.sh[723]: 2019-04-25 09:39:01,689 INFO stopped: rsyslogd (exit status 0)
Apr 25 01:39:01.698041 ASW-C2-4-C11-A.NA61 INFO bgp.sh[723]: 2019-04-25 09:39:01,696 INFO stopped: bgpcfgd (terminated by SIGTERM)
Apr 25 01:39:01.701835 ASW-C2-4-C11-A.NA61 INFO bgp.sh[723]: 2019-04-25 09:39:01,700 INFO stopped: bgp-watchdog (terminated by SIGTERM)
Apr 25 01:39:02.128256 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8084]: bgp
Apr 25 01:39:11.843560 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8329]: Creating new bgp container with HWSKU bfnmodel with HOSTNAME ASW-C2-4-C11-A.NA61
Apr 25 01:39:12.110831 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8329]: 86459fa3ce2c49b1b08ef2d9e701c2a2bfa0245c958a210893fac60941425cb8
Apr 25 01:39:12.779478 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8329]: bgp
Apr 25 01:39:13.811456 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:13,809 INFO spawned: 'bgp-watchdog' with pid 8
Apr 25 01:39:13.819982 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:13,814 INFO spawned: 'start.sh' with pid 9
Apr 25 01:39:14.789678 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:14,788 INFO spawned: 'bgp_eoiu_marker' with pid 26
Apr 25 01:39:14.791503 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:14,789 INFO success: bgp_eoiu_marker entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Apr 25 01:39:14.907579 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:14,906 INFO success: bgp-watchdog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:14.908083 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:14,906 INFO success: start.sh entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:14.914561 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:14,913 INFO spawned: 'bgpcfgd' with pid 30
Apr 25 01:39:15.930928 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:15,929 INFO success: bgpcfgd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:16.043568 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:16,035 INFO spawned: 'rsyslogd' with pid 36
Apr 25 01:39:16.050373 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:39:13,809 INFO spawned: 'bgp-watchdog' with pid 8
Apr 25 01:39:16.050373 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:39:14,788 INFO spawned: 'bgp_eoiu_marker' with pid 26
Apr 25 01:39:16.050373 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:39:14,789 INFO success: bgp_eoiu_marker entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Apr 25 01:39:16.050373 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:39:14,906 INFO success: bgp-watchdog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:16.050373 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:39:14,913 INFO spawned: 'bgpcfgd' with pid 30
Apr 25 01:39:16.050373 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:39:15,929 INFO success: bgpcfgd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:17.041616 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:17,039 INFO success: rsyslogd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:17.180265 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:17,168 INFO spawned: 'zebra' with pid 44
Apr 25 01:39:18.214486 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:18,213 INFO success: zebra entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:18.331923 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:18,330 INFO spawned: 'bgpd' with pid 46
Apr 25 01:39:19.337139 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:19,336 INFO success: bgpd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:19.344727 ASW-C2-4-C11-A.na61 INFO supervisord: start.sh bgpd: started
Apr 25 01:39:19.463468 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:19,460 INFO spawned: 'fpmsyncd' with pid 50
Apr 25 01:39:20.133286 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP ipv4 neighbors: [u'10.0.0.14', u'10.0.0.30', u'10.0.0.10', u'10.0.0.18', u'10.0.0.26', u'10.0.0.6', u'10.0.0.22', u'10.0.0.2']
Apr 25 01:39:20.133286 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP ipv4 neighbors: [u'2000:0:0:500::2', u'2000::2', u'2000:0:0:200::2', u'2000:0:0:600::2', u'2000:0:0:300::2', u'2000:0:0:700::2', u'2000:0:0:400::2', u'2000:0:0:100::2']
Apr 25 01:39:20.488442 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:20,487 INFO success: fpmsyncd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:20.520848 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:39:20,519 INFO exited: start.sh (exit status 0; expected)
Apr 25 01:39:22.277648 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: checkWarmStart:145: bgp doing warm start, restore count 6
Apr 25 01:39:22.277968 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: checkAndStart:61: Initializing Warm-Restart cycle for bgp application.
Apr 25 01:39:22.278817 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: setWarmStartState:206: bgp warm start state changed to initialized
Apr 25 01:39:22.279498 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: getWarmStartTimer:172: Getting warmStartTimer for docker: bgp, app: bgp, value: 180
Apr 25 01:39:22.279601 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: runRestoration:105: Warm-Restart: Initiating AppDB restoration process for bgp application.
Apr 25 01:39:22.424214 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: runRestoration:126: Warm-Restart: Received 1071 records from AppDB for bgp application.
Apr 25 01:39:22.424478 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: setWarmStartState:206: bgp warm start state changed to restored
Apr 25 01:39:22.424586 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: runRestoration:131: Warm-Restart: Completed AppDB restoration process for bgp application.
Apr 25 01:39:22.424804 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:102: Warm-Restart timer started.
Apr 25 01:39:22.424804 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:109: Warm-Restart eoiuCheckTimer timer started.
Apr 25 01:39:26.073640 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:39:18,330 INFO spawned: 'bgpd' with pid 46
Apr 25 01:39:26.073640 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:39:19,336 INFO success: bgpd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Apr 25 01:39:59.013319 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 10.0.0.26 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.025439 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 2000:0:0:500::2 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.028136 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 2000:0:0:300::2 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.038212 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 10.0.0.30 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.056967 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 2000:0:0:700::2 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.059822 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 10.0.0.10 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.067383 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 2000::2 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.072219 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 10.0.0.22 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.074528 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 2000:0:0:100::2 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.083023 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 10.0.0.18 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.086985 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 2000:0:0:600::2 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.099128 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 10.0.0.14 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.103498 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 10.0.0.6 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.129856 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 2000:0:0:400::2 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.134275 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 10.0.0.2 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.150643 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 10.0.0.26(Unknown) in vrf Default Up
Apr 25 01:39:59.171955 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 2000:0:0:500::2(Unknown) in vrf Default Up
Apr 25 01:39:59.171955 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 2000:0:0:300::2(Unknown) in vrf Default Up
Apr 25 01:39:59.200489 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 10.0.0.30(Unknown) in vrf Default Up
Apr 25 01:39:59.218510 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 2000:0:0:700::2(Unknown) in vrf Default Up
Apr 25 01:39:59.230039 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 10.0.0.10(Unknown) in vrf Default Up
Apr 25 01:39:59.243564 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 10.0.0.22(Unknown) in vrf Default Up
Apr 25 01:39:59.253627 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 2000::2(Unknown) in vrf Default Up
Apr 25 01:39:59.264013 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 2000:0:0:100::2(Unknown) in vrf Default Up
Apr 25 01:39:59.271617 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 10.0.0.18(Unknown) in vrf Default Up
Apr 25 01:39:59.284028 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 2000:0:0:600::2(Unknown) in vrf Default Up
Apr 25 01:39:59.286210 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 10.0.0.14(Unknown) in vrf Default Up
Apr 25 01:39:59.297272 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 10.0.0.6(Unknown) in vrf Default Up
Apr 25 01:39:59.313614 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: 2000:0:0:200::2 unrecognized capability code: 6 - ignored
Apr 25 01:39:59.328082 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 2000:0:0:400::2(Unknown) in vrf Default Up
Apr 25 01:39:59.353575 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 10.0.0.2(Unknown) in vrf Default Up
Apr 25 01:40:00.039497 ASW-C2-4-C11-A.na61 WARNING bgpd[46]: %ADJCHANGE: neighbor 2000:0:0:200::2(Unknown) in vrf Default Up
Apr 25 01:40:08.633208 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 10.0.0.26
Apr 25 01:40:08.713851 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 10.0.0.30
Apr 25 01:40:08.976463 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000:0:0:500::2
Apr 25 01:40:09.359790 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000:0:0:300::2
Apr 25 01:40:09.485372 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000:0:0:700::2
Apr 25 01:40:09.575723 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000:0:0:400::2
Apr 25 01:40:09.673616 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000:0:0:100::2
Apr 25 01:40:10.761177 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 10.0.0.14
Apr 25 01:40:10.873291 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 10.0.0.6
Apr 25 01:40:10.959993 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 10.0.0.10
Apr 25 01:40:11.040517 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 10.0.0.18
Apr 25 01:40:11.128651 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 10.0.0.22
Apr 25 01:40:11.211780 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 10.0.0.2
Apr 25 01:40:11.212250 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP ipv4 eoiu reached
Apr 25 01:40:11.295457 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000::2
Apr 25 01:40:11.395633 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000:0:0:200::2
Apr 25 01:40:11.477900 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP implicit eor received for neighbors: 2000:0:0:600::2
Apr 25 01:40:11.477900 ASW-C2-4-C11-A.na61 INFO bgp_eoiu_marker.py: BGP ipv6 eoiu reached
Apr 25 01:40:11.496043 ASW-C2-4-C11-A.na61 INFO supervisord: bgp_eoiu_marker bgp_eoiu_marker service is started
Apr 25 01:40:11.508407 ASW-C2-4-C11-A.na61 INFO supervisord: bgp_eoiu_marker bgp_eoiu_marker service is done
Apr 25 01:40:11.513285 ASW-C2-4-C11-A.NA61 INFO bgp.sh[8427]: 2019-04-25 09:40:11,511 INFO exited: bgp_eoiu_marker (exit status 0; expected)
Apr 25 01:40:12.450718 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: eoiuFlagsSet:42: Warm-Restart bgp eoiu reached for both ipv4 and ipv6
Apr 25 01:40:12.450843 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:154: Warm-Restart started EOIU hold timer which is to expire in 3 seconds.
Apr 25 01:40:15.450982 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:133: Warm-Restart EOIU hold timer expired.
Apr 25 01:40:15.451248 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: reconcile:155: Warm-Restart: Initiating reconciliation process for bgp application.
Apr 25 01:40:15.467004 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: setWarmStartState:206: bgp warm start state changed to reconciled
Apr 25 01:40:15.467206 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: reconcile:259: Warm-Restart: Concluded reconciliation process for bgp application.
Apr 25 01:40:15.467435 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:138: Warm-Restart reconciliation processed.
Apr 25 01:40:16.112229 ASW-C2-4-C11-A.na61 INFO supervisord 2019-04-25 09:40:11,511 INFO exited: bgp_eoiu_marker (exit status 0; expected)
Apr 25 01:42:22.425170 ASW-C2-4-C11-A.na61 NOTICE fpmsyncd: main:129: Warm-Restart timer expired.

Explicit EoR case:

Apr 25 03:51:27.420279 sonic INFO bgp#bgp_eoiu_marker.py: Cleaned ipv4 and ipv6 eoiu marker flags
Apr 25 03:51:27.421528 sonic NOTICE bgp#bgp_eoiu_marker.py: :- checkWarmStart: bgp doing warm start, restore count 0
Apr 25 03:51:27.487080 sonic INFO bgp#bgp_eoiu_marker.py: BGP ipv4 neighbors: [u'10.0.0.59', u'10.0.0.61', u'10.0.0.63', u'10.0.0.57']
Apr 25 03:51:27.487696 sonic INFO bgp#bgp_eoiu_marker.py: BGP ipv4 neighbors: [u'fc00::7e', u'fc00::76', u'fc00::7a', u'fc00::72']
Apr 25 03:51:27.553538 sonic INFO bgp#bgp_eoiu_marker.py: BGP eor received for neighbors: 10.0.0.59
Apr 25 03:51:27.613157 sonic INFO bgp#bgp_eoiu_marker.py: BGP eor received for neighbors: 10.0.0.61
Apr 25 03:51:27.676095 sonic INFO bgp#bgp_eoiu_marker.py: BGP eor received for neighbors: 10.0.0.63
Apr 25 03:51:27.736519 sonic INFO bgp#bgp_eoiu_marker.py: BGP eor received for neighbors: 10.0.0.57
Apr 25 03:51:27.736564 sonic INFO bgp#bgp_eoiu_marker.py: BGP ipv4 eoiu reached
Apr 25 03:51:27.796805 sonic INFO bgp#bgp_eoiu_marker.py: BGP eor received for neighbors: fc00::7e
Apr 25 03:51:27.858103 sonic INFO bgp#bgp_eoiu_marker.py: BGP eor received for neighbors: fc00::76
Apr 25 03:51:27.920507 sonic INFO bgp#bgp_eoiu_marker.py: BGP eor received for neighbors: fc00::7a
Apr 25 03:51:27.983820 sonic INFO bgp#bgp_eoiu_marker.py: BGP eor received for neighbors: fc00::72
Apr 25 03:51:27.983870 sonic INFO bgp#bgp_eoiu_marker.py: BGP ipv6 eoiu reached

@pavel-shirshov

@jipanyang I have two questions:

  1. What if the neighbor announced it's not going to send you EOR?
  2. Can we use either grpc or zmq which are supported by frr? Screen scraping is not so robust

@jipanyang

> @jipanyang I have two questions:
>
>   1. What if the neighbor announced it's not going to send you EOR?
>   2. Can we use either grpc or zmq which are supported by frr? Screen scraping is not so robust

@pavel-shirshov

  1. In this case we hit the implicit EOR scenario.
    The condition is that all configured peers, except shutdown peers, have sent an explicit EOR (End-Of-RIB) or an implicit EOR. The first keepalive after BGP has reached Established is considered an implicit EOR.
  2. I'm not familiar with grpc/zmq support in FRR. Do you have an example of their usage?
    The extensive tests we have done showed consistent results for the BGP EOR state in the vtysh JSON output.
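
To make the explicit versus implicit distinction concrete, here is a hedged sketch of how one neighbor entry could be classified. The function name `eor_status` is hypothetical, and every JSON key it reads (`gracefulRestartInfo`, `endOfRibRecv`, `bgpState`, `messageStats`, `keepalivesRecv`) is an assumption about the shape of the vtysh output that should be checked against the FRR version in use.

```python
def eor_status(peer):
    """Classify one neighbor entry as 'explicit', 'implicit', or None.

    peer is the dict for a single neighbor; all key names used here are
    assumptions about the JSON layout, not a documented contract.
    """
    eor_recv = peer.get("gracefulRestartInfo", {}).get("endOfRibRecv", {})
    # Explicit EOR: at least one address family reports End-of-RIB received.
    if any(eor_recv.values()):
        return "explicit"
    # Implicit EOR: session is Established and a keepalive has arrived.
    established = peer.get("bgpState") == "Established"
    keepalives = peer.get("messageStats", {}).get("keepalivesRecv", 0)
    if established and keepalives > 0:
        return "implicit"
    return None
```

A neighbor that announces it will not send EOR still ends up in the "implicit" branch once its first keepalive arrives, which is the case raised in the first question.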

lguohan commented Jun 19, 2019

retest this please

        except Exception:
            syslog.syslog(syslog.LOG_ERR, "*ERROR* get_all_peers Exception: %s" % (traceback.format_exc()))
            time.sleep(5)
            self.get_all_peers()
Contributor:

What if we hit a persistent exception?
We would loop here forever.

Contributor Author:

Right, the exception handler here deals with the case where bgp startup is slow and 'show bgp summary json' fails.

A retry limit of (120/5) = 24 could be enforced here.
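
A bounded retry along those lines might look like the sketch below. It is an illustration only, not the commit later added to this PR: `get_with_retries` and `fetch` are hypothetical names, and the code is written for Python 3 even though the script in the traceback runs under Python 2.7.

```python
import time


def get_with_retries(fetch, retry_interval=5, max_wait=120, sleep=time.sleep):
    """Call fetch() until it succeeds or the retry budget is spent.

    fetch -- hypothetical callable that returns parsed neighbor info and
             raises on failure (for example while bgpd is still starting).
    Raises RuntimeError after max_wait // retry_interval failed attempts.
    """
    max_attempts = max_wait // retry_interval  # 120 // 5 = 24
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:  # broad on purpose, like the reviewed code
            last_error = exc
            sleep(retry_interval)
    raise RuntimeError(
        "Failed to get bgp neighbor info in %d seconds" % max_wait
    ) from last_error
```

Using a loop instead of a recursive call also avoids growing the call stack on every failed attempt.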

Contributor:

I think we should enforce it here.
Can you please implement it?

Contributor Author:

Yes, I will add another commit to this PR.

Contributor Author:

Ran "pkill -9 bgpd", then tried the test again: bgp_eoiu_marker.py: Failed to get bgp neighbor info in 120 seconds, exiting

Jun 24 08:00:32.389598 vlab-01 INFO bgp#bgp_eoiu_marker.py: Cleaned ipv4 and ipv6 eoiu marker flags
Jun 24 08:00:32.390907 vlab-01 NOTICE bgp#bgp_eoiu_marker.py: :- checkWarmStart: bgp doing warm start, restore count 0
Jun 24 08:00:32.567670 vlab-01 ERR bgp#bgp_eoiu_marker.py: *ERROR* get_all_peers Exception: Traceback (most recent call last):#012  File "/usr/bin/bgp_eoiu_marker.py", line 53, in get_all_peers#012    peer_info = json.loads(output)#012  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads#012    return _default_decoder.decode(s)#012  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode#012    obj, end = self.raw_decode(s, idx=_w(s, 0).end())#012  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode#012    raise ValueError("No JSON object could be decoded")#012ValueError: No JSON object could be decoded
Jun 24 08:00:36.795726 vlab-01 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.57,10.0.0.59,10.0.0.61,10.0.0.63

Jun 24 08:00:43.357393 vlab-01 NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group fc00::72,fc00::76,fc00::7a,fc00::7e
Jun 24 08:02:22.918385 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY
Jun 24 08:02:22.920196 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY read only not implemented on oid:0x2100000000
Jun 24 08:02:22.920752 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.925638 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.925638 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 49 , rv:-15
Jun 24 08:02:22.928650 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_IPV6_ROUTE_ENTRY
Jun 24 08:02:22.929727 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_IPV6_ROUTE_ENTRY read only not implemented on oid:0x2100000000
Jun 24 08:02:22.931380 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.936154 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.936342 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 50 , rv:-15
Jun 24 08:02:22.941056 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEXTHOP_ENTRY
Jun 24 08:02:22.941965 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEXTHOP_ENTRY read only not implemented on oid:0x2100000000
Jun 24 08:02:22.942365 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.945633 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.945633 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 51 , rv:-15
Jun 24 08:02:22.953105 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEXTHOP_ENTRY
Jun 24 08:02:22.953949 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEXTHOP_ENTRY read only not implemented on oid:0x2100000000
Jun 24 08:02:22.954337 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.955411 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.956184 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 52 , rv:-15
Jun 24 08:02:22.959512 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEIGHBOR_ENTRY
Jun 24 08:02:22.960532 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEIGHBOR_ENTRY read only not implemented on oid:0x2100000000
Jun 24 08:02:22.960924 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.964854 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.965683 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 53 , rv:-15
Jun 24 08:02:22.968983 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEIGHBOR_ENTRY
Jun 24 08:02:22.968983 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEIGHBOR_ENTRY read only not implemented on oid:0x2100000000
Jun 24 08:02:22.971467 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.972047 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 54 , rv:-15
Jun 24 08:02:22.973522 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.974244 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 56 , rv:-15
Jun 24 08:02:22.975887 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.976324 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 55 , rv:-15
Jun 24 08:02:22.981321 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.981938 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 60 , rv:-15
Jun 24 08:02:22.986395 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.988600 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 61 , rv:-15
Jun 24 08:02:22.994613 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.994613 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get ACL table attribute 4412 , rv:-15
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.995269 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_MEMBER_ENTRY
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_MEMBER_ENTRY read only not implemented on oid:0x2100000000
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.995269 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_ENTRY
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_ENTRY read only not implemented on oid:0x2100000000
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.995269 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_ACL_TABLE
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_ACL_TABLE read only not implemented on oid:0x2100000000
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.995269 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_ACL_TABLE_GROUP
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_ACL_TABLE_GROUP read only not implemented on oid:0x2100000000
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:22.995269 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_ACL_TABLE_ATTR_AVAILABLE_ACL_ENTRY
Jun 24 08:02:22.995269 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_ACL_TABLE_ATTR_AVAILABLE_ACL_ENTRY read only not implemented on oid:0x700000000
Jun 24 08:02:23.000113 vlab-01 ERR swss#orchagent: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:23.000113 vlab-01 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 57 , rv:-15
Jun 24 08:02:23.000113 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:23.000113 vlab-01 WARNING syncd#syncd: :- refresh_read_only_BCM56850: need to recalculate RO: SAI_SWITCH_ATTR_AVAILABLE_FDB_ENTRY
Jun 24 08:02:23.000113 vlab-01 ERR syncd#syncd: :- internal_vs_generic_get: SAI_SWITCH_ATTR_AVAILABLE_FDB_ENTRY read only not implemented on oid:0x2100000000
Jun 24 08:02:23.000113 vlab-01 ERR syncd#syncd: :- meta_sai_get_oid: get status: SAI_STATUS_NOT_IMPLEMENTED
Jun 24 08:02:23.001651 vlab-01 WARNING swss#orchagent: :- checkCrmThresholds: NEXTHOP_GROUP_MEMBER THRESHOLD_CLEAR for TH_PERCENTAGE 0% Used count 0 free count 0
Jun 24 08:02:23.004233 vlab-01 WARNING swss#orchagent: :- checkCrmThresholds: NEXTHOP_GROUP THRESHOLD_CLEAR for TH_PERCENTAGE 0% Used count 0 free count 0
Jun 24 08:02:40.955039 vlab-01 ERR bgp#bgp_eoiu_marker.py: message repeated 24 times: [ *ERROR* get_all_peers Exception: Traceback (most recent call last):#012  File "/usr/bin/bgp_eoiu_marker.py", line 53, in get_all_peers#012    peer_info = json.loads(output)#012  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads#012    return _default_decoder.decode(s)#012  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode#012    obj, end = self.raw_decode(s, idx=_w(s, 0).end())#012  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode#012    raise ValueError("No JSON object could be decoded")#012ValueError: No JSON object could be decoded]
Jun 24 08:02:40.955039 vlab-01 ERR bgp#bgp_eoiu_marker.py: Failed to get bgp neighbor info in 120 seconds, exiting
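
The traceback above shows `json.loads` raising `ValueError` on every poll until the 120-second limit is reached, which suggests the CLI returned empty or non-JSON text each time. A minimal sketch of a polling loop that treats unparseable output as "not ready yet" and logs nothing per attempt is given below. It is not the code in `bgp_eoiu_marker.py`: the names `parse_peers`, `wait_for_peers` and `fetch` are hypothetical, the sample payload is made up, and Python 3 is assumed (the log shows the original running on Python 2.7).

```python
import json
import time


def parse_peers(output):
    """Return the decoded JSON object, or None if output is not a JSON object."""
    try:
        peers = json.loads(output)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    # Only a JSON object (dict) is treated as usable here.
    return peers if isinstance(peers, dict) else None


def wait_for_peers(fetch, timeout=120.0, interval=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll fetch() until it yields a JSON object or the timeout expires.

    fetch is a hypothetical callable that returns the raw CLI output as a
    string. Returns the decoded dict, or None on timeout.
    """
    deadline = clock() + timeout
    while True:
        peers = parse_peers(fetch())
        if peers is not None:
            return peers
        if clock() >= deadline:
            return None
        sleep(interval)


if __name__ == "__main__":
    # Simulated CLI output: two unusable replies, then a (made-up) JSON reply.
    replies = iter(["", "not json", '{"10.0.0.1": {"bgpState": "Established"}}'])
    print(wait_for_peers(lambda: next(replies), timeout=5, interval=0))
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real delays; the caller decides what to log once, when `None` is returned.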

@pavel-shirshov
Contributor

Let's avoid ZMQ usage for now. GRPC/Protobuf will not help us in this case.
Please check my comment; I'm OK with the .py portion of the PR.
P.S.: It looks like a hack, but I don't see a better solution for now.

Contributor

@pavel-shirshov pavel-shirshov left a comment

Looks OK to me.

Please check my comments.
Also, please wait for another team member's opinion.

@lguohan
Contributor

lguohan commented Jul 24, 2019

The vs test for swss is missing.

@jipanyang jipanyang force-pushed the bgp_eoiu branch 2 times, most recently from c90eeeb to a1e6a9d Compare July 27, 2019 19:21
@lguohan
Contributor

lguohan commented Jul 27, 2019

retest this please

@lguohan
Contributor

lguohan commented Jul 31, 2019

@jipanyang, I cannot resolve the conflict. Can you resolve it?

…in fpmsyncd

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
…olean

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
…olving merge conflict

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
@lguohan lguohan merged commit ee2b1e5 into sonic-net:master Aug 1, 2019
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
… platforms (sonic-net#856)

* Update stop, reset failed status and restart of systemd services
to support multi-asic platforms.

* Create function to avoid code duplication.

* Fixed errors due to previous commit and review comments.

* Minor update to fix spacing.

* Minor update to fix spacing.

* Minor update to fix spacing.

* For multi asic platform updated logic of stopping/restarting of
services to ensure that the right instances are stopped and
restarted if a service is both global and multi-instance.

* Fixed log error message with incorrect number of parameters.
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
Signed-off-by: Venkat Garigipati <venkatg@cisco.com>
4 participants