
Linkerd CNI repair controller does not listen on IPv6 #12864

Closed
lwj5 opened this issue Jul 21, 2024 · 1 comment · Fixed by #12874
lwj5 commented Jul 21, 2024

What is the issue?

The repair controller fails to start on an EKS IPv6 cluster. The admin server is only listening on 0.0.0.0:9990:

  Warning  Unhealthy  21s (x6 over 71s)  kubelet            Liveness probe failed: Get "http://[ipv6]:9990/live": dial tcp [ipv6]:9990: connect: connection refused
  Warning  Unhealthy  21s (x7 over 61s)  kubelet            Readiness probe failed: Get "http://[ipv6]:9990/ready": dial tcp [ipv6]:9990: connect: connection refused
  Normal   Killing    21s (x2 over 51s)  kubelet            Container repair-controller failed liveness probe, will be restarted
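
For reference, here is a minimal Rust sketch (not the repair controller's actual code) of why a socket bound only to the IPv4 wildcard refuses IPv6 connections like the kubelet probes above, while the IPv6 unspecified address typically accepts both families on Linux. The second port number is used only so both listeners can coexist in the sketch.

  use std::net::TcpListener;

  fn main() -> std::io::Result<()> {
      // Bound to the IPv4 wildcard only: an IPv6-only kubelet probing
      // http://[pod-ipv6]:9990/live gets "connection refused".
      let _v4_only = TcpListener::bind("0.0.0.0:9990")?;

      // Bound to the IPv6 unspecified address instead: on Linux (with the
      // default net.ipv6.bindv6only=0) this accepts IPv4 and IPv6 clients.
      let _dual_stack = TcpListener::bind("[::]:9991")?;

      Ok(())
  }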

How can it be reproduced?

Start an IPv6-only cluster and install linkerd-cni with the repair controller enabled.

Logs, error output, etc

None

output of linkerd check -o short

Environment

EKS 1.18

Possible solution

  repair-controller:
    Container ID:    containerd://8cfbb7ced3c98c071c2078722ac385a914e44e5e6c8b23eed176a013231b0367
    Image:           cr.l5d.io/linkerd/cni-plugin:v1.5.1
    Image ID:        cr.l5d.io/linkerd/cni-plugin@sha256:adc21c4af0cfae6e6454b6aecac11f13c11edd351813bf0dd60260191fe4e375
    Port:            9990/TCP
    Host Port:       0/TCP
    SeccompProfile:  RuntimeDefault
    Command:
      /usr/lib/linkerd/linkerd-cni-repair-controller
    Args:
      --admin-addr=0.0.0.0:9990
      --log-format
      plain
      --log-level
      info

Change the admin address so it also listens on IPv6: `[::]:9990`.
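
As a quick sanity check of that proposal (a sketch only, not the controller's real code), a listener bound to `[::]` on Linux accepts connections over both loopback families, which is what the kubelet probes need:

  use std::net::{TcpListener, TcpStream};

  fn main() -> std::io::Result<()> {
      // Bind the admin port the way the proposed fix would.
      let listener = TcpListener::bind("[::]:9990")?;
      let port = listener.local_addr()?.port();

      // On Linux (net.ipv6.bindv6only=0) the same listener accepts both
      // IPv4 and IPv6 connections without any extra configuration.
      TcpStream::connect(("127.0.0.1", port))?;
      TcpStream::connect(("::1", port))?;
      println!("dual-stack bind on [::]:{port} accepted IPv4 and IPv6 connections");
      Ok(())
  }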

Additional context

No response

Would you like to work on fixing this bug?

yes

lwj5 added the bug label on Jul 21, 2024
alpeb added a commit that referenced this issue Jul 22, 2024
Fixes #12864

The linkerd-cni network-validator container was binding to the IPv4
wildcard and connecting to an IPv4 address. This wasn't breaking things
in IPv6 clusters but it was only validating the iptables rules and not
the ip6tables ones. This change introduces logic to use addresses
according to the value of `disableIPv6`. If IPv6 is enabled, then the
ip6tables rules would get exercised. Note that a more complete change
would also exercise both iptables and ip6tables, but for now we're
defaulting to ip6tables.

This implied changing the helm value `networkValidator.connectAddr` to
`connectPort`. @mateiidavid could you please validate if this entry with
its simplified doc still makes sense, in light of #12797 ?

The same was true for the repair-controller, but since the IPv4 wildcard was used for its admin server, in IPv6 clusters the kubelet wasn't able to reach the probe endpoints and the container kept failing. In this case the fix is simply to have the admin server bind to `[::]`, which works for both IPv4 and IPv6 clusters.
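
As a rough sketch of the address-selection logic described above (illustrative only; the actual change is in #12874), the addresses could be picked from the `disableIPv6` setting, falling back to IPv4 when IPv6 is disabled:

  use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

  // Returns (bind address for the admin server, connect address for the
  // network-validator) based on disableIPv6. Names here are illustrative.
  fn pick_addrs(disable_ipv6: bool) -> (IpAddr, IpAddr) {
      if disable_ipv6 {
          // IPv6 disabled: stay on IPv4 and exercise the iptables rules.
          (IpAddr::V4(Ipv4Addr::UNSPECIFIED), IpAddr::V4(Ipv4Addr::LOCALHOST))
      } else {
          // IPv6 enabled: bind [::] (dual-stack on Linux) and exercise ip6tables.
          (IpAddr::V6(Ipv6Addr::UNSPECIFIED), IpAddr::V6(Ipv6Addr::LOCALHOST))
      }
  }

  fn main() {
      let (bind_addr, connect_addr) = pick_addrs(false);
      println!("bind on {bind_addr}, validator connects to {connect_addr}");
  }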

alpeb commented Jul 22, 2024

Thanks for bringing this up; I was able to replicate the issue. I've raised #12874; please keep an eye out for when that gets included in an edge release, and let us know how it goes! :-)

alpeb added a commit that referenced this issue Jul 23, 2024
alpeb closed this as completed in 6603409 on Jul 24, 2024