Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NodeLatencyMonitor not starting #6552

Merged

Conversation

antoninbas
Copy link
Contributor

The NodeLatencyMonitor was not starting (more precisely, the monitorLoop method was never called). This is because NewNodeLatencyMonitor was called after starting the informer factory. This means that the informer created in NewNodeLatencyMonitor was never started, and that WaitForNamedCacheSync was blocking indefinitely.

With this change, we ensure that NewNodeLatencyMonitor is called before starting the informer factory. The Run method is still called at the end of Agent "initialization", once the datapath has been configured.

Note that it is not clear that WaitForNamedCacheSync is really needed for the NodeLatencyMonitor. However, it is probably fine to use the same "pattern" as for our other controllers.

The NodeLatencyMonitor was not starting (more precisely, the monitorLoop
method was never called). This is because NewNodeLatencyMonitor was
called *after* starting the informer factory. This means that the
informer created in NewNodeLatencyMonitor was never started, and that
WaitForNamedCacheSync was blocking indefinitely.

With this change, we ensure that NewNodeLatencyMonitor is called
*before* starting the informer factory. The Run method is still called
at the end of Agent "initialization", once the datapath has been
configured.

Note that it is not clear that WaitForNamedCacheSync is really needed
for the NodeLatencyMonitor. However, it is probably fine to use the same
"pattern" as for our other controllers.

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas antoninbas requested review from tnqn and luolanzone July 25, 2024 23:40
@antoninbas antoninbas added the kind/bug Categorizes issue or PR as related to a bug. label Jul 25, 2024
@antoninbas antoninbas added this to the Antrea v2.1 release milestone Jul 25, 2024
@antoninbas
Copy link
Contributor Author

@IRONICBo FYI
Did you not run into this issue during your testing?

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Jul 26, 2024

@antoninbas thanks for testing and ensuring it works, otherwise it would be awkward to encounter the issue after release. We do need an e2e test to validate it if there isn't one. I will file an issue to track it. (Antonin already created #6549)

@IRONICBo did you make last minute change to it after testing done? I'm also curious how did it work before.

@tnqn
Copy link
Member

tnqn commented Jul 26, 2024

/skip-all

@antoninbas antoninbas merged commit 0a72491 into antrea-io:main Jul 26, 2024
52 of 58 checks passed
@antoninbas antoninbas deleted the fix-nodelatencymonitor-not-starting branch July 26, 2024 04:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants