Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'adj-rib out' is incorrect after reloading configuration with an offline router #1126

Open
longmalx opened this issue Nov 15, 2022 · 5 comments
Assignees
Labels

Comments

@longmalx
Copy link

Bug Description

When a neighbor is unavailable during configuration changes, we have noticed that the adj-rib may contain routes which have been removed from the configuration. These remain even when the neighbor becomes available again.

All route changes are performed by updating exabgp.conf and issuing a reload.

To Reproduce

I have 2 configuration files (contained in exabgp.conf.zip):

  • exabgp.conf.empty (2 neighbors (1 available, 1 not) and no routes
  • exabgp.conf.routes (2 neighbors (1 available, 1 not) and no routes

I run commands to set the configuration to empty, restart, add the routes, reload, remove routes, reload

[root@edx-cms exabgp]# cp exabgp.conf.empty /etc/exabgp/exabgp.conf[root@edx-cms exabgp]# systemctl restart exabgp
[root@edx-cms exabgp]# cp exabgp.conf.routes /etc/exabgp/exabgp.conf
[root@edx-cms exabgp]# systemctl reload exabgp
[root@edx-cms exabgp]# exabgpcli show adj-rib out extensive;
neighbor 192.168.200.254 local-ip 0.0.0.0 local-as 65000 peer-as 65000 router-id 10.10.243.170 family-allowed in-open ipv4 flow flow
destination-ipv4 172.16.1.1/32 extended-community redirect:666:666
neighbor 192.168.200.253 local-ip 0.0.0.0 local-as 65000 peer-as 65000 router-id 10.10.243.170 family-allowed in-open ipv4 flow flow
destination-ipv4 172.16.1.1/32 extended-community redirect:666:666
[root@edx-cms exabgp]# cp exabgp.conf.empty /etc/exabgp/exabgp.conf
[root@edx-cms exabgp]# systemctl reload exabgp
[root@edx-cms exabgp]# exabgpcli show adj-rib out extensive;
neighbor 192.168.200.253 local-ip 0.0.0.0 local-as 65000 peer-as 65000 router-id 10.10.243.170 family-allowed in-open ipv4 flow flow
destination-ipv4 172.16.1.1/32 extended-community redirect:666:666
[root@edx-cms exabgp]# exabgpcli show neighbor summary
Peer            AS        up/down state       |     #sent     #recvd
192.168.200.254 65000     0:01:33 established           6          4

It can be seen that the neighbor which is not available has an incorrect entry in the RIB.
If I repeat the test with both neighbors connected, then there is no issue.

Expected behavior

After removing all routes I would expect there to be no entries when I run 'adj-rib out' command.

Environment (please complete the following information):

  • CentOS Linux release 7.9.2009 (Core)
  • Version 4.2.21
[root@edx-cms exabgp]# pip3 show exabgp
Name: exabgp
Version: 4.2.21
Summary: BGP swiss army knife
Home-page: https://github.com/Exa-Networks/exabgp
Author: Thomas Mangin
Author-email: thomas.mangin@exa-networks.co.uk
License: BSD 3-Clause License
Location: /usr/local/lib/python3.6/site-packages
Requires: setuptools

Additional context

Here are the logs from a reproduction - exabgp.log:

[root@edx-cms exabgp]# date
Tue Nov 15 06:29:26 EST 2022
[root@edx-cms exabgp]# cp exabgp.conf.empty /etc/exabgp/exabgp.conf
[root@edx-cms exabgp]# systemctl restart exabgp
[root@edx-cms exabgp]# cp exabgp.conf.routes /etc/exabgp/exabgp.conf
[root@edx-cms exabgp]# systemctl reload exabgp
[root@edx-cms exabgp]# exabgpcli show adj-rib out extensive;
neighbor 192.168.200.254 local-ip 0.0.0.0 local-as 65000 peer-as 65000 router-id 10.10.243.170 family-allowed in-open ipv4 flow flow destination-ipv4 172.16.1.1/32 extended-community redirect:666:666
neighbor 192.168.200.253 local-ip 0.0.0.0 local-as 65000 peer-as 65000 router-id 10.10.243.170 family-allowed in-open ipv4 flow flow destination-ipv4 172.16.1.1/32 extended-community redirect:666:666
[root@edx-cms exabgp]# cp exabgp.conf.empty /etc/exabgp/exabgp.conf
[root@edx-cms exabgp]# systemctl reload exabgp
[root@edx-cms exabgp]# exabgpcli show adj-rib out extensive;
neighbor 192.168.200.253 local-ip 0.0.0.0 local-as 65000 peer-as 65000 router-id 10.10.243.170 family-allowed in-open ipv4 flow flow destination-ipv4 172.16.1.1/32 extended-community redirect:666:666
[root@edx-cms exabgp]# systemctl stop exabgp
@thomas-mangin
Copy link
Member

This behaviour is indeed unintuitive and a side effect of the code design (I would have to check to be 100% sure), but there is a reason for it, and as changing it would change current users' expectations. I will look at possibly changing it in master (or documenting it) but probably not in the 4.x releases.

Currently, routes can be accepted via the API or the command line. We are unfortunately not "tagging" the source.
If the route is from the API, it should remain announced, if it came from the configuration file, I agree with you position that it should be removed, the code clearly does not do this correctly.

SIGNAL control of ExaBGP was designed before we decided that we should really control the program via the API and it has seen no work since, as otherwise other daemons, such as BIRD, are way more capable. So the focus is on programmatic change.

@longmalx
Copy link
Author

Thanks. I guess what I'm not following is why the router being offline would affect the behavior if this is caused by the intrinsic handling of API vs configuration routes.
I will have a look into this in more details later this week to see if I can work out any further hints.

@thomas-mangin
Copy link
Member

Sorry, I misread the information when first replying, and agree that it is not right. Please discard my answer above.

@longmalx
Copy link
Author

longmalx commented Nov 25, 2022

I think the problem is that this code in peer.py is never run to action the configuration changes if the neighbor is offline:

                    # we are here following a configuration change
                    if self._neighbor:
                        # see what changed in the configuration
                        self.neighbor.rib.outgoing.replace(self._neighbor.backup_changes, self._neighbor.changes)
                        # do not keep the previous routes in memory as they are not useful anymore
                        self._neighbor.backup_changes = []

I thought I'd tried reconnecting the neighbor and this didn't resolve the RIB, but the code looks like it should.

@thomas-mangin
Copy link
Member

Porting the patch to the main branch does cause a regression with the test suite there, so I can not close this issue or release a new 4.2 version until it can be investigated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants