NodePortLocal rules for a particular Pod are missing while the NPL annotation is present #6527

Closed
tnqn opened this issue Jul 16, 2024 · 0 comments · Fixed by #6531
Labels
  • area/proxy/nodeportlocal: Issues or PRs related to the NodePortLocal feature
  • kind/bug: Categorizes issue or PR as related to a bug.
  • reported-by/end-user: Issues reported by end users.


tnqn commented Jul 16, 2024

Describe the bug

The symptoms are as follows:

  • A few pool members (Pods) are not reachable from the LoadBalancer when NodePortLocal is used, even though the Pods are correctly annotated with the nodeportlocal.antrea.io annotation.
  • On the Node that hosts such a Pod, the NodePortLocal rules for the Pod are missing from iptables.
  • No errors are logged for NodePortLocal, but many terminated Pods remain on the Node.
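
To check whether a given Pod is affected, one option is to compare its NPL annotation with the NAT rules on the Node. The following is a minimal sketch, not an Antrea tool. It assumes the annotation value is a JSON list of objects with `nodePort` and `podPort` fields, that the rules live in a chain named `ANTREA-NODE-PORT-LOCAL`, and that each rule carries `--dport` and `--to-destination`; verify all three against the output of `iptables-save -t nat` on your Node before relying on it. The function names are hypothetical.

```python
import json
import re

# Assumed chain name and rule layout; confirm against your Node's iptables-save output.
NPL_CHAIN = "ANTREA-NODE-PORT-LOCAL"
_RULE_RE = re.compile(r"--dport (\d+)\b.*--to-destination (\S+):(\d+)")


def parse_installed_rules(iptables_save_output):
    """Return the set of (node_port, pod_ip, pod_port) tuples found in the NPL chain."""
    rules = set()
    for line in iptables_save_output.splitlines():
        if not line.startswith("-A " + NPL_CHAIN + " "):
            continue
        match = _RULE_RE.search(line)
        if match:
            node_port, pod_ip, pod_port = match.groups()
            rules.add((int(node_port), pod_ip, int(pod_port)))
    return rules


def missing_npl_rules(annotation_value, pod_ip, iptables_save_output):
    """List annotated (node_port, pod_ip, pod_port) mappings that have no matching rule."""
    installed = parse_installed_rules(iptables_save_output)
    expected = [
        (entry["nodePort"], pod_ip, entry["podPort"])
        for entry in json.loads(annotation_value)
    ]
    return [rule for rule in expected if rule not in installed]
```

A non-empty result for a Pod whose annotation is present matches the symptom described above.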

The root cause is that the Service's endpoint Pod had the same IP as a terminated Pod (the IP was recycled, which is legitimate). When the terminated Pod was later deleted, the iptables rule associated with that Pod IP was deleted as well, even though the rule now served the running Pod.
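
The failure mode can be illustrated with a toy model. This is not Antrea's code: the class, method names, IP, and port below are hypothetical, and the only assumption carried over from the report is that rule cleanup is driven by Pod IP rather than by Pod identity.

```python
class IPKeyedRuleCache:
    """Toy model (not Antrea code): NPL rules are tracked by Pod IP only."""

    def __init__(self):
        self.rules = {}  # pod_ip -> node_port

    def on_pod_add(self, pod_name, pod_ip, node_port):
        self.rules[pod_ip] = node_port

    def on_pod_delete(self, pod_name, pod_ip):
        # Removes whatever rule is stored for this IP, regardless of which Pod owns it.
        self.rules.pop(pod_ip, None)


def simulate_ip_reuse():
    """Return True if the backend Pod still has a rule after the old Pod is deleted."""
    cache = IPKeyedRuleCache()
    # "finished-job" has already terminated and released 10.10.1.2,
    # which is then recycled for the Service backend.
    cache.on_pod_add("backend", "10.10.1.2", 61002)
    # The terminated Pod object is deleted later, still reporting the old IP.
    cache.on_pod_delete("finished-job", "10.10.1.2")
    return "10.10.1.2" in cache.rules
```

In this model `simulate_ip_reuse()` returns `False`: deleting the terminated Pod removes the rule that the running backend depends on, while the backend's annotation is left untouched.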

A workaround is to restart antrea-agent on that Node, which makes it rebuild its cache and reinstall the missing iptables rules.
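
One way to restart the agent on a single Node is to delete its antrea-agent Pod and let the DaemonSet recreate it. The helper below only builds the `kubectl` argument list; it is a sketch that assumes Antrea is deployed in the `kube-system` namespace with the labels `app=antrea,component=antrea-agent`, which you should confirm for your installation. The function name is hypothetical.

```python
def restart_agent_command(node_name, namespace="kube-system"):
    """Build a kubectl command that deletes the antrea-agent Pod on one Node.

    The namespace and label selector are assumptions about a default Antrea deployment.
    """
    return [
        "kubectl", "-n", namespace, "delete", "pod",
        "-l", "app=antrea,component=antrea-agent",
        "--field-selector", "spec.nodeName=" + node_name,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` or joined with spaces and run by hand.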

A proper fix needs to take IP recycling and terminated Pods into account, ensuring that rules are bound to the Pod itself rather than to its IP.
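
The idea of binding rules to the Pod can be sketched as follows. This is an illustration of the approach described above, not the change made in the linked fix: the class, method names, and values are hypothetical. Rules are keyed by (namespace, name), and a Pod that reaches a terminal phase gives up only its own rule.

```python
TERMINAL_PHASES = {"Succeeded", "Failed"}


class PodKeyedRuleCache:
    """Toy model: NPL rules are owned by a Pod key, not by a Pod IP."""

    def __init__(self):
        self.rules = {}  # (namespace, name) -> (pod_ip, node_port)

    def on_pod_update(self, pod_key, pod_ip, node_port, phase):
        if phase in TERMINAL_PHASES:
            # A terminated Pod no longer owns its IP, so drop only its own rule.
            self.rules.pop(pod_key, None)
        else:
            self.rules[pod_key] = (pod_ip, node_port)

    def on_pod_delete(self, pod_key):
        # Deleting a Pod can only affect the rule stored under its own key.
        self.rules.pop(pod_key, None)

    def has_rule_for_ip(self, pod_ip):
        return any(ip == pod_ip for ip, _ in self.rules.values())


def simulate_ip_reuse_with_pod_keys():
    """Return True if the backend keeps its rule after the old Pod is deleted."""
    cache = PodKeyedRuleCache()
    job = ("default", "finished-job")
    backend = ("default", "backend")
    cache.on_pod_update(job, "10.10.1.2", 61001, "Succeeded")
    cache.on_pod_update(backend, "10.10.1.2", 61002, "Running")
    cache.on_pod_delete(job)
    return cache.has_rule_for_ip("10.10.1.2")
```

Here `simulate_ip_reuse_with_pod_keys()` returns `True`, because the deletion of the terminated Pod never touches the entry stored under the backend's key.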

To Reproduce

  1. Create a Pod that runs to the Succeeded phase and will not restart.
  2. Create a Service with NodePortLocal enabled.
  3. Create a backend Pod for the above Service, making it reuse the Pod IP of the Pod from step 1.
  4. Confirm the NPL rule is present and the Service is reachable via the NodePortLocal port.
  5. Delete the Pod created in step 1.
  6. Confirm the NPL rule is missing and the Service is no longer reachable via the NodePortLocal port.
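
For steps 1 and 2, the manifests might look like the output of the sketch below. The names, image, and command are hypothetical, and the Service annotation key `nodeportlocal.antrea.io/enabled` is an assumption about how your Antrea version opts a Service into NodePortLocal; check the NodePortLocal documentation for your release. Forcing the backend Pod to reuse the same IP (step 3) depends on the cluster's IPAM and is not covered here.

```python
import json


def completed_pod_manifest(name):
    """Pod that exits successfully and is never restarted (step 1)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {"name": "main", "image": "busybox", "command": ["sh", "-c", "exit 0"]}
            ],
        },
    }


def npl_service_manifest(name, selector, port):
    """Service opted into NodePortLocal via an annotation (step 2, assumed key)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {"nodeportlocal.antrea.io/enabled": "true"},
        },
        "spec": {
            "selector": selector,
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. `python manifests.py | kubectl apply -f -`.
    items = [
        completed_pod_manifest("finished-job"),
        npl_service_manifest("npl-demo", {"app": "npl-demo"}, 8080),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```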

Versions:

  • Antrea version (Docker image tag).
    v2.0.x <= v2.0.1
    v1.15.x <= v1.15.2
    all <= v1.14
tnqn added the kind/bug and area/proxy/nodeportlocal labels on Jul 16, 2024
tnqn added this to the Antrea v2.1 release milestone on Jul 16, 2024
tnqn added the reported-by/end-user label on Jul 17, 2024