Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isisd: Fix memory leaks when the transition of neighbor state from non-UP to DOWN #15716

Merged
merged 1 commit into from
Jul 2, 2024

Conversation

zhou-run
Copy link
Contributor

@zhou-run zhou-run commented Apr 9, 2024

When receiving a hello packet, if the neighbor state transitions directly from a non-ISIS_ADJ_UP state (such as ISIS_ADJ_INITIALIZING) to ISIS_ADJ_DOWN state, the neighbor entry cannot be deleted. If the neighbor is removed or the neighbor's System ID changes, it may result in memory leakage in the neighbor entry.

Test Scenario:
LAN link between Router A and Router B is established. Router A does not configure neighbor authentication, while Router B is configured with neighbor authentication. When the neighbor entry on Router B ages out, the neighbor state on Router A transitions to INIT. If Router B is then removed, the neighbor state on Router A transitions to DOWN and persists.

Signed-off-by: zhou-run zhou.run@h3c.com

@frrbot frrbot bot added the isis label Apr 9, 2024
@riw777 riw777 self-requested a review April 9, 2024 14:32
Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@zhou-run
Copy link
Contributor Author

Copy link
Member

@odd22 odd22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@odd22
Copy link
Member

odd22 commented Apr 16, 2024

Does the failed test make sense, or in other words, should the neighbor not be deleted when transitioning from UP to DOWN state? https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO5U18ARM8-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO8U18AMD64-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-ASAN7-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO1DEB10AMD64-2673

I re-run the failed test to be sure it is not a false positive.

Now, regarding your question, the neighbor is created when you add it in the configuration and when IS-IS Hello messages are received on the given interface and are deleted when you remove it in the configuration. I think, we must keep the context when the interface goes from UP to DOWN to speed up the process when the interface goes from DOWN to UP. In particular, when an interface (frequent when optical connector SFP reach end of life) flap (going Down, then Up, then Down ...), if you remove the context, the code will delete dynamic memory allocation and then recreate it and so on. This could decrease performance and at the end the risk of a memory corruption. So, I think it is preferable to keep the context if the interface goes DOWN and only remove the context when the neighbor is removed from the configuration

@donaldsharp donaldsharp self-requested a review April 16, 2024 15:29
@zhou-run
Copy link
Contributor Author

Does the failed test make sense, or in other words, should the neighbor not be deleted when transitioning from UP to DOWN state? https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO5U18ARM8-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO8U18AMD64-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-ASAN7-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO1DEB10AMD64-2673

I re-run the failed test to be sure it is not a false positive.

Now, regarding your question, the neighbor is created when you add it in the configuration and when IS-IS Hello messages are received on the given interface and are deleted when you remove it in the configuration. I think, we must keep the context when the interface goes from UP to DOWN to speed up the process when the interface goes from DOWN to UP. In particular, when an interface (frequent when optical connector SFP reach end of life) flap (going Down, then Up, then Down ...), if you remove the context, the code will delete dynamic memory allocation and then recreate it and so on. This could decrease performance and at the end the risk of a memory corruption. So, I think it is preferable to keep the context if the interface goes DOWN and only remove the context when the neighbor is removed from the configuration

Does 'remove the context when the neighbor is removed from the configuration' refer to the command 'clear isis neighbor [WORD]'?

@odd22
Copy link
Member

odd22 commented Apr 30, 2024

Does 'remove the context when the neighbor is removed from the configuration' refer to the command 'clear isis neighbor [WORD]'?
No. I'm referring to CLI no isis router WORD at the interface level which delete the IS-IS configuration on a given interface.

@riw777
Copy link
Member

riw777 commented May 7, 2024

this says there are linter problems, but it doesn't show that the are ... let's try rerunning ci to see if we can find them

@riw777
Copy link
Member

riw777 commented May 7, 2024

ci:rerun

@zhou-run
Copy link
Contributor Author

Does 'remove the context when the neighbor is removed from the configuration' refer to the command 'clear isis neighbor [WORD]'?
No. I'm referring to CLI no isis router WORD at the interface level which delete the IS-IS configuration on a given interface.

Does this mean that when the IS-IS configuration is not removed on the specified interface, we do not need to consider potential memory leaks caused by frequent changes in the System ID due to neighbor updates?
Also, if neighbors are not deleted, should the neighbor state transition from DOWN to UP again also ensure that it goes through the INITIALIZING state?

@ton31337
Copy link
Member

@Mergifyio backport stable/10.0

Copy link

mergify bot commented May 17, 2024

backport stable/10.0

✅ Backports have been created

@ton31337
Copy link
Member

@frrbot rereview

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix frrbot styling issues found.

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squash to a single commit all 3 commits.

…n-UP to DOWN

When receiving a hello packet, if the neighbor state transitions directly from a non-ISIS_ADJ_UP state (such as ISIS_ADJ_INITIALIZING) to ISIS_ADJ_DOWN state, the neighbor entry cannot be deleted. If the neighbor is removed or the neighbor's System ID changes, it may result in memory leakage in the neighbor entry.

Test Scenario:
LAN link between Router A and Router B is established. Router A does not configure neighbor authentication, while Router B is configured with neighbor authentication. When the neighbor entry on Router B ages out, the neighbor state on Router A transitions to INIT. If Router B is then removed, the neighbor state on Router A transitions to DOWN and persists.

Signed-off-by: zhou-run <166502045+zhou-run@users.noreply.github.com>

fix frrbot styling issues found.

fix frrbot styling issues found.

Signed-off-by: zhou-run <166502045+zhou-run@users.noreply.github.com>
@zhou-run
Copy link
Contributor Author

Squash to a single commit all 3 commits.

Done.

@riw777
Copy link
Member

riw777 commented Jun 4, 2024

Does 'remove the context when the neighbor is removed from the configuration' refer to the command 'clear isis neighbor [WORD]'?
No. I'm referring to CLI no isis router WORD at the interface level which delete the IS-IS configuration on a given interface.

Does this mean that when the IS-IS configuration is not removed on the specified interface, we do not need to consider potential memory leaks caused by frequent changes in the System ID due to neighbor updates? Also, if neighbors are not deleted, should the neighbor state transition from DOWN to UP again also ensure that it goes through the INITIALIZING state?

are we still waiting on an answer to this before pushing?

should this be labeled a bugfix? it seems like it should ...

@zhou-run
Copy link
Contributor Author

zhou-run commented Jun 6, 2024

No. I'm referring to CLI no isis router WORD at the interface level which delete the IS-IS configuration on a given interface.

Does this mean that when the IS-IS configuration is not removed on the specified interface, we do not need to consider potential memory leaks caused by frequent changes in the System ID due to neighbor updates? Also, if neighbors are not deleted, should the neighbor state transition from DOWN to UP again also ensure that it goes through the INITIALIZING state?

I tried the scenario of a P2P link. If the link goes DOWN without deleting the neighbor table entry, the new neighbor cannot be established because the old neighbor is still occupying the slot, unless ISIS is deleted and reconfigured. @odd22 @riw777

@odd22
Copy link
Member

odd22 commented Jun 11, 2024

I tried the scenario of a P2P link. If the link goes DOWN without deleting the neighbor table entry, the new neighbor cannot be established because the old neighbor is still occupying the slot, unless ISIS is deleted and reconfigured. @odd22 @riw777

OK. So, in this case, we must delete the neighbor context to be sure that the adjacency could goes UP when the link goes UP again. Just to be sure, when you perform the test, you shutdown the interface and then no shut the interface ? You don't try to change the neighbor or the configuration of the interface ?

@odd22
Copy link
Member

odd22 commented Jun 11, 2024

I just perform a test on one of my router: set the interface down with ìp link set down dev ethX` check that we lost the IS-IS adjacency and thus the neighbor. Then, set the link up again and we get back the IS-IS neighbor. I perform the test on a standard FRR. Not compiled with your patch. So, the original code handle correctly link failure when interface is P2P.
So, in case of the state is not UP, I think you are right and neighbor should be removed to handle the memory leak and also the case where the adjacency is UP and goes down, then UP again even if it consume some CPU/Memory allocation in case of intermittent link failure.

@zhou-run
Copy link
Contributor Author

I tried the scenario of a P2P link. If the link goes DOWN without deleting the neighbor table entry, the new neighbor cannot be established because the old neighbor is still occupying the slot, unless ISIS is deleted and reconfigured. @odd22 @riw777

OK. So, in this case, we must delete the neighbor context to be sure that the adjacency could goes UP when the link goes UP again. Just to be sure, when you perform the test, you shutdown the interface and then no shut the interface ? You don't try to change the neighbor or the configuration of the interface ?

This is an exceptional scenario. I did not shutdown the interface and then no shut the interface, but due to failed adjacency authentication, the neighbor became unidirectional. After the IS table entry for the neighbor aged out, it relearned the INIT entry for the neighbor. At this point, when I modified the neighbor's IS net, it resulted in the new neighbor unable to establish adjacency.

@riw777
Copy link
Member

riw777 commented Jun 24, 2024

what requested change are we waiting on here?

@zhou-run
Copy link
Contributor Author

what requested change are we waiting on here?

I don't know either. @ton31337

@ton31337 ton31337 dismissed their stale review June 25, 2024 05:05

Squashed.

@riw777 riw777 merged commit 63924c9 into FRRouting:master Jul 2, 2024
9 checks passed
Jafaral added a commit that referenced this pull request Jul 2, 2024
isisd: Fix memory leaks when the transition of neighbor state from non-UP to DOWN (backport #15716)
@zhou-run zhou-run deleted the 202404092101 branch July 10, 2024 05:53
@donaldsharp
Copy link
Member

@Mergifyio backport stable/10.1

Copy link

mergify bot commented Oct 29, 2024

backport stable/10.1

✅ Backports have been created

Jafaral added a commit that referenced this pull request Oct 29, 2024
isisd: Fix memory leaks when the transition of neighbor state from non-UP to DOWN (backport #15716)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants