-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WarmRestart class changes #600
WarmRestart class changes #600
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In current bool WarmStart::checkWarmStart()
"Check warm start flag at the very begining of application, do it once for each process."
One of the key functions is to record the warm restart count, that is why it should be called once only for each application process.
Once the process is up, "WarmStart::isWarmStart()" could be used to check whether warm start has been enabled.
are you trying to address the scenario that fpmsyncd is cold started, then enable warm start to deal with state restore for zebra<->fpmSyncd tcp re-establishment? Probably that specific case should be handled separately.
Jipan, by having checkWarmStart() being called only once, we are limiting the scenarios in which Warm-Restart logic can be utilized. In scenarios such as zebra<->fpmSyncd tcp re-establishment, we cannot rely on isWarmStart() method. Btw, fpmSyncd may not be the only app interested in this behavior, as every other north-app has an fpmSyncd equivalent. We can dicuss this offline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the original definition of restart_count needs to be changed to reflect the extra scope.
restart_count = 1*10DIGIT ; a number between 0 and 2147483647,
; count of warm start times.
Other than fpmsyncd, I don't see other applications have the need to resync the state while being up.
Would it be simpler to just restart fpmsyncd in case of tcp connection lost? So we could focus on one scenario in stead of multiple different scenarios with the same goal of state restore/reconcile.
[ dependencies with PR/600: sonic-net#600 ] Please refer to the corresponding design document for more details: sonic-net/SONiC#239 The following manual UT plan has been successfully executed. UT automation pending. Physical topology formed by various BGP peers connecting to the DUT. Both non-ecmp and ecmp prefixes are learned/tested. Testcase Suite 1: Making use of “pkill -9 bgpd && pkill -9 zebra” as trigger. ============= IPv4 prefixes ========== * Restart zebra/bgpd: - Verify that all prefixes are properly stale-marked and that no change is pushed to AppDB during reconciliation. - Result: PASSED * Restart zebra/bgpd and add one new non-ecmp IPv4 prefix. - To produce a route-state-inconsistency, add prefix in adjacent node before bgp sessions are re-established. - Verify that new-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and withdraw one non-ecmp IPv4 prefix. - To produce a route-state-inconsistency, remove prefix in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and add one new path to an IPv4 ecmp-prefix. - To produce a route-state-inconsistency, add prefix-path in adjacent node before bgp sessions are re-established. - Verify that new prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and withdraw one ecmp-path from an existing ecmp IPv4 prefix. - To produce a route-state-inconsistency, remove prefix-path in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Eventually, during reconciliation, this path will be eliminated as it’s not being refreshed by remote-peer. - Result: PASSED IPv6 prefixes ========== * Restart zebra/bgpd: - Verify that all prefixes are properly stale-marked and that no change is pushed to AppDB during reconciliation. - Result: PASSED * Restart zebra/bgpd and add one new non-ecmp IPv6 prefix. - To produce a route-state-inconsistency, add prefix in adjacent node before bgp sessions are re-established. - Verify that new-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and withdraw one non-ecmp IPv6 prefix. - To produce a route-state-inconsistency, remove prefix in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and add one new path to an IPv6 ecmp-prefix. - To produce a route-state-inconsistency, add prefix-path in adjacent node before bgp sessions are re-established. - Verify that new prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and withdraw one ecmp-path from an existing ecmp IPv6 prefix. - To produce a route-state-inconsistency, remove prefix-path in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Eventually, during reconciliation, this path will be eliminated as it’s not being refreshed by remote-peer. - Result: PASSED Testcase Suite 2: Making use of sudo service bgp restart” as trigger. ============= IPv4 prefixes ========== * Restart bgp service/docker: - Verify that all prefixes are properly stale-marked and that no change is pushed to AppDB during reconciliation. - Result: PASSED * Restart bgp service/docker and add one new non-ecmp IPv4 prefix. - To produce a route-state-inconsistency, add prefix in adjacent node before bgp sessions are re-established. - Verify that new-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and withdraw one non-ecmp IPv4 prefix. - To produce a route-state-inconsistency, remove prefix in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and add one new path to an IPv4 ecmp-prefix. - To produce a route-state-inconsistency, add prefix-path in adjacent node before bgp sessions are re-established. - Verify that new prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and withdraw one ecmp-path from an existing ecmp IPv4 prefix. - To produce a route-state-inconsistency, remove prefix-path in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Eventually, during reconciliation, this path will be eliminated as it’s not being refreshed by remote-peer. - Result: PASSED IPv6 prefixes ========== * Restart bgp service/docker: - Verify that all prefixes are properly stale-marked and that no change is pushed to AppDB during reconciliation. - Result: PASSED * Restart bgp service/docker and add one new non-ecmp IPv6 prefix. - To produce a route-state-inconsistency, add prefix in adjacent node before bgp sessions are re-established. - Verify that new-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and withdraw one non-ecmp IPv6 prefix. - To produce a route-state-inconsistency, remove prefix in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and add one new path to an IPv6 ecmp-prefix. - To produce a route-state-inconsistency, add prefix-path in adjacent node before bgp sessions are re-established. - Verify that new prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and withdraw one ecmp-path from an existing ecmp IPv6 prefix. - To produce a route-state-inconsistency, remove prefix-path in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Eventually, during reconciliation, this path will be eliminated as it’s not being refreshed by remote-peer. - Result: PASSED Signed-off-by: Rodny Molina <rmolina@linkedin.com>
test failure |
what is the scenario for zebra <-> fpmsyncd re-establish? they are in the same docker, should be restarted as the same time. |
@lguohan i'm adding certain level of flexibility in my routing warm-reboot code. This logic is dealing with all these scenarios without needing to add any extra complexity: 1) routing-stack crash, or 2) fpmsyncd crash, or 3) docker restart. If we have a process-manager within the docker container (supervisord or watchfrr) that detects the crash and re-instantiate the process, then there's no need to restart the entire bgp-service from the host side. |
c30f2f1
to
f9be17a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
686484b
to
dc7b2a8
Compare
The failing test-case (VXLAN) is not related to any of my changes. |
retest this please |
The current WarmStart class assumes that restarting applications will invoke checkWarmStart() method during initialization. Currently, this method serves two main purposes: 1) a static-constructor and 2) checks if WarmRestart feature has been enabled for a given application. In this PR i'm proposing to split these different logics in two separated methods. The main goal here is to allow applications such as fpmSyncd to call checkWarmStart whenever it please (i.e. fpmSyncd restart or zebra<->fpmSyncd tcp re-establishment). There could be other applications in the future interested in this differentiated approch. Other than that i'm making small adjustments in the code to enhance readibility. Signed-off-by: Rodny Molina <rmolina@linkedin.com>
Signed-off-by: Rodny Molina <rmolina@linkedin.com>
Signed-off-by: Rodny Molina <rmolina@linkedin.com>
Signed-off-by: Rodny Molina <rmolina@linkedin.com>
…tate. Signed-off-by: Rodny Molina <rmolina@linkedin.com>
dc7b2a8
to
088ecc8
Compare
Signed-off-by: Rodny Molina <rmolina@linkedin.com>
088ecc8
to
5f4ae3b
Compare
…uration ports" (sonic-net#600) Signed-off-by: Vasant Patil <vapatil@linkedin.com>
The current WarmStart class assumes that restarting applications will invoke checkWarmStart() method during initialization. Currently, this method serves two main purposes: 1) a static-constructor and 2) checks if WarmRestart feature has been enabled for a given application.
In this PR i'm proposing to split these different logics in two separated methods. The main goal here is to allow applications such as fpmSyncd to call checkWarmStart whenever it please (i.e. fpmSyncd restart event or zebra<->fpmSyncd tcp re-establishment). There could be other applications in the future interested in this differentiated approach.
Other than that i'm making small adjustments in the code to enhance readibility.
Signed-off-by: Rodny Molina rmolina@linkedin.com