
Linkerd CNI repair controller does not listen on IPv6 #12864

Closed
lwj5 opened this issue Jul 21, 2024 · 1 comment · Fixed by #12874
lwj5 commented Jul 21, 2024

What is the issue?

The repair controller fails to start on an EKS IPv6 cluster. The admin server is only listening on 0.0.0.0:9990:

  Warning  Unhealthy  21s (x6 over 71s)  kubelet            Liveness probe failed: Get "http://[ipv6]:9990/live": dial tcp [ipv6]:9990: connect: connection refused
  Warning  Unhealthy  21s (x7 over 61s)  kubelet            Readiness probe failed: Get "http://[ipv6]:9990/ready": dial tcp [ipv6]:9990: connect: connection refused
  Normal   Killing    21s (x2 over 51s)  kubelet            Container repair-controller failed liveness probe, will be restarted
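
For reference, here is a minimal Rust sketch (not the repair controller's actual code) of why a socket bound only to the IPv4 wildcard refuses IPv6 connections like the kubelet probes above, while the IPv6 unspecified address typically accepts both families on Linux. The second port number is used only so both listeners can coexist in the sketch.

  use std::net::TcpListener;

  fn main() -> std::io::Result<()> {
      // Bound to the IPv4 wildcard only: an IPv6-only kubelet probing
      // http://[pod-ipv6]:9990/live gets "connection refused".
      let _v4_only = TcpListener::bind("0.0.0.0:9990")?;

      // Bound to the IPv6 unspecified address instead: on Linux (with the
      // default net.ipv6.bindv6only=0) this accepts IPv4 and IPv6 clients.
      let _dual_stack = TcpListener::bind("[::]:9991")?;

      Ok(())
  }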

How can it be reproduced?

Start an IPv6-only cluster and install linkerd-cni with the repair controller enabled.

Logs, error output, etc

None

output of linkerd check -o short

Environment

EKS 1.18

Possible solution

  repair-controller:
    Container ID:    containerd://8cfbb7ced3c98c071c2078722ac385a914e44e5e6c8b23eed176a013231b0367
    Image:           cr.l5d.io/linkerd/cni-plugin:v1.5.1
    Image ID:        cr.l5d.io/linkerd/cni-plugin@sha256:adc21c4af0cfae6e6454b6aecac11f13c11edd351813bf0dd60260191fe4e375
    Port:            9990/TCP
    Host Port:       0/TCP
    SeccompProfile:  RuntimeDefault
    Command:
      /usr/lib/linkerd/linkerd-cni-repair-controller
    Args:
      --admin-addr=0.0.0.0:9990
      --log-format
      plain
      --log-level
      info

Change the admin address so it also listens on IPv6: `[::]:9990`.
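
As a quick sanity check of that proposal (a sketch only, not the controller's real code), a listener bound to `[::]` on Linux accepts connections over both loopback families, which is what the kubelet probes need:

  use std::net::{TcpListener, TcpStream};

  fn main() -> std::io::Result<()> {
      // Bind the admin port the way the proposed fix would.
      let listener = TcpListener::bind("[::]:9990")?;
      let port = listener.local_addr()?.port();

      // On Linux (net.ipv6.bindv6only=0) the same listener accepts both
      // IPv4 and IPv6 connections without any extra configuration.
      TcpStream::connect(("127.0.0.1", port))?;
      TcpStream::connect(("::1", port))?;
      println!("dual-stack bind on [::]:{port} accepted IPv4 and IPv6 connections");
      Ok(())
  }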

Additional context

No response

Would you like to work on fixing this bug?

yes

lwj5 added the bug label on Jul 21, 2024
alpeb added a commit that referenced this issue Jul 22, 2024
Fixes #12864

The linkerd-cni network-validator container was binding to the IPv4
wildcard and connecting to an IPv4 address. This wasn't breaking things
in IPv6 clusters but it was only validating the iptables rules and not
the ip6tables ones. This change introduces logic to use addresses
according to the value of `disableIPv6`. If IPv6 is enabled, then the
ip6tables rules would get exercised. Note that a more complete change
would also exercise both iptables and ip6tables, but for now we're
defaulting to ip6tables.

This implied changing the helm value `networkValidator.connectAddr` to
`connectPort`. @mateiidavid could you please validate if this entry with
its simplified doc still makes sense, in light of #12797 ?

The same was true for the repair-controller, but since the IPv4 wildcard was used for its admin server, in IPv6 clusters the kubelet wasn't able to reach the probe endpoints and the container kept failing. In this case the fix is simply to have the admin server bind to `[::]`, which works for both IPv4 and IPv6 clusters.
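
As a rough sketch of the address-selection logic described above (illustrative only; the actual change is in #12874), the addresses could be picked from the `disableIPv6` setting, falling back to IPv4 when IPv6 is disabled:

  use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

  // Returns (bind address for the admin server, connect address for the
  // network-validator) based on disableIPv6. Names here are illustrative.
  fn pick_addrs(disable_ipv6: bool) -> (IpAddr, IpAddr) {
      if disable_ipv6 {
          // IPv6 disabled: stay on IPv4 and exercise the iptables rules.
          (IpAddr::V4(Ipv4Addr::UNSPECIFIED), IpAddr::V4(Ipv4Addr::LOCALHOST))
      } else {
          // IPv6 enabled: bind [::] (dual-stack on Linux) and exercise ip6tables.
          (IpAddr::V6(Ipv6Addr::UNSPECIFIED), IpAddr::V6(Ipv6Addr::LOCALHOST))
      }
  }

  fn main() {
      let (bind_addr, connect_addr) = pick_addrs(false);
      println!("bind on {bind_addr}, validator connects to {connect_addr}");
  }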

alpeb commented Jul 22, 2024

Thanks for bringing this up; I was able to replicate the issue. I've raised #12874; please keep an eye out for when that gets included in an edge release, and let us know how it goes! :-)

alpeb added a commit that referenced this issue Jul 23, 2024
alpeb closed this as completed in 6603409 on Jul 24, 2024