Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default route is not deleted from H/W in some cases #9055

Closed
prsunny opened this issue Oct 25, 2021 · 2 comments · Fixed by #9182
Closed

Default route is not deleted from H/W in some cases #9055

prsunny opened this issue Oct 25, 2021 · 2 comments · Fixed by #9182

Comments

@prsunny
Copy link
Contributor

prsunny commented Oct 25, 2021

Description

With the fix for skipping eth0 route, observed that default route is not deleted from HW in some cases. (PR - 1606).

Old issue - #6483

Steps to reproduce the issue:

  1. Learn default route via BGP
  2. Confirm default route for Mgmt IP is present in table default
  3. Shutdown all bgp interfaces

Describe the results you received:

Default route still present in APP_DB/ASIC_DB

Describe the results you expected:

Default route must be deleted

Output of show version:

SONiC.20201231.39

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

show ip route 0.0.0.0/0
Routing entry for 0.0.0.0/0
  Known via "bgp", distance 20, metric 0, best
  Last update 00:00:12 ago
  * 10.0.0.57, via PortChannel0001, weight 1
  * 10.0.0.59, via PortChannel0002, weight 1
  * 10.0.0.61, via PortChannel0003, weight 1
  * 10.0.0.63, via PortChannel0004, weight 1

redis-cli 
127.0.0.1:6379> HGETALL ROUTE_TABLE:0.0.0.0/0
1) "nexthop"
2) "10.0.0.57,10.0.0.59,10.0.0.61,10.0.0.63"
3) "ifname"

config bgp shutdown all
Shutting down BGP session with neighbor 10.0.0.59...
Shutting down BGP session with neighbor fc00::72...
Shutting down BGP session with neighbor 10.0.0.63...
Shutting down BGP session with neighbor fc00::7a...
Shutting down BGP session with neighbor fc00::76...
Shutting down BGP session with neighbor 10.0.0.61...
Shutting down BGP session with neighbor 10.0.0.57...
Shutting down BGP session with neighbor fc00::7e...
admin@str2-7050cx3-acs-02:~/mux$ show ip route 0.0.0.0/0
Routing entry for 0.0.0.0/0
  Known via "static", distance 200, metric 0, best
  Last update 00:16:53 ago
  * 10.3.146.1, via eth0, weight 1

$ redis-cli 
127.0.0.1:6379> HGETALL ROUTE_TABLE:0.0.0.0/0
1) "nexthop"
2) "10.0.0.57,10.0.0.61"
3) "ifname"
4) "PortChannel0001,PortChannel0003"
127.0.0.1:6379> 
@arlakshm arlakshm mentioned this issue Nov 5, 2021
2 tasks
@arlakshm arlakshm linked a pull request Nov 5, 2021 that will close this issue
2 tasks
arlakshm added a commit that referenced this issue Nov 24, 2021
Why I did it
resolves #8979 and #9055

How I did it
Remove the file static.conf.j2,which adds the default route on eth0 from bgp docker

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
qiluo-msft pushed a commit that referenced this issue Dec 1, 2021
Why I did it
resolves #8979 and #9055

How I did it
Remove the file static.conf.j2,which adds the default route on eth0 from bgp docker

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@Minkang-Tsai
Copy link

@prsunny
If a user set default route which nexthop is eth0 after learn default route via BGP and then shutdown all BGP interfaces, the problem still happened, right?
We think that root cause is FRR doesn't notify fpmsyncd remove the default route which learned by BGP when shutdown all BGP interfaces. From FRR point of view, it can chose the default route added by user under this situation, so it just updates default route. Fpmsyncd can not updates default route from APP DB because the default route which nexthop is eth0, it will be skipped by the following condition in routersync.cpp

    vector<string> alsv = tokenize(ifnames, ',');
    for (auto alias : alsv)
    {
        /*
         * An FRR behavior change from 7.2 to 7.5 makes FRR update default route to eth0 in interface
         * up/down events. Skipping routes to eth0 or docker0 to avoid such behavior
         */
        if (alias == "eth0" || alias == "docker0")
        {
            SWSS_LOG_DEBUG("Skip routes to eth0 or docker0: %s %s %s",
                    destipprefix, nexthops.c_str(), ifnames.c_str());
            return;
        }
    }

On the other hand, FRR will updates routes which added by kernel when interface down, so fpmsyncd need to update routes
from local/main/default routing table for default VRF. On booting, linux kernel will add default route to default routing table according to interfaces.j2, so fpmsyncd will get the route update for default route when interface down. According to this issue we think that fpmsyncd need to do the following modification

  • Need to revert 0003-Use-vrf_id-for-vrf-not-tabled_id.patch
  • Vrfmgrd need to save mapping between VRF name and table ID to REDIS DB.
  • fpmsyncd need to distinguish local/main/default routing table for default VRF.
  • fpmsyncd apply route to default VRF according to routing table policy.
  • fpmsyncd need to remove route1 from APP_DB if it receives a route which nexthop is eth0 and prefix is same the route1.

What do you think about it?

@prsunny
Copy link
Contributor Author

prsunny commented Dec 9, 2021

@Minkang-Tsai , I think with the fix for skipping eth0 in routesync and the fix to remove static route from FRR (#9182 ), we've this issue resolved, correct?. Agree with the approach that you proposed but it seems to be significant changes.

arlakshm added a commit that referenced this issue Jan 19, 2022
Why I did it
resolves #8979 and #9055

How I did it
Remove the file static.conf.j2,which adds the default route on eth0 from bgp docker

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
abdosi pushed a commit that referenced this issue Feb 18, 2022
resolves #8979 and #9055

How I did it
Remove the file static.conf.j2,which adds the default route on eth0 from frr docker

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants