linkerd-cni does not chain correctly with OCI/OKE flannel cni #10413

Closed
blabu23 opened this issue Feb 28, 2023 · 12 comments
Labels: bug, env/oke (Oracle Container Engine for Kubernetes), wontfix

Comments

blabu23 commented Feb 28, 2023

What is the issue?

In our Oracle OCI/OKE Kubernetes environment we need to use linkerd-cni because of a requirement to use PodSecurity.
By default, Oracle installs a flannel CNI right after provisioning the control plane.
On top of that, I installed linkerd-cni with Helm and then tried to install cert-manager.

How can it be reproduced?

Set up an Oracle OKE cluster, a node pool, and a number of worker nodes:

> kubectl get nodes
NAME           STATUS   ROLES   AGE    VERSION
10.27.41.101   Ready    node    4d7h   v1.25.4
10.27.41.197   Ready    node    4d7h   v1.25.4
10.27.41.250   Ready    node    4d7h   v1.25.4
10.27.41.91    Ready    node    4d7h   v1.25.4
10.27.41.97    Ready    node    4d7h   v1.25.4

Install linkerd-cni with Helm:

> helm repo add linkerd-stable https://helm.linkerd.io/stable
"linkerd-stable" has been added to your repositories
> helm install linkerd-cni --namespace linkerd-cni --create-namespace linkerd-stable/linkerd2-cni
NAME: linkerd-cni
LAST DEPLOYED: Tue Feb 28 15:19:48 2023
NAMESPACE: linkerd-cni
STATUS: deployed
REVISION: 1
TEST SUITE: None
> kubectl get pods -n linkerd-cni
NAME                READY   STATUS    RESTARTS   AGE
linkerd-cni-66wst   1/1     Running   0          41s
linkerd-cni-8vqws   1/1     Running   0          41s
linkerd-cni-cp5jw   1/1     Running   0          41s
linkerd-cni-nbthn   1/1     Running   0          41s
linkerd-cni-svxd2   1/1     Running   0          41s

Install cert-manager with Helm:

> helm repo add jetstack https://charts.jetstack.io
"jetstack" has been added to your repositories
> helm install cert-manager --namespace cert-manager --create-namespace --set installCRDs=true jetstack/cert-manager
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
> kubectl get pods -n cert-manager
NAME                                       READY   STATUS              RESTARTS   AGE
cert-manager-59bf757d77-kszm7              0/1     ContainerCreating   0          21m
cert-manager-cainjector-547c9b8f95-n2sb7   0/1     ContainerCreating   0          21m
cert-manager-startupapicheck-xrf67         0/1     ContainerCreating   0          21m
cert-manager-webhook-6787f645b9-zhfwg      0/1     ContainerCreating   0          21m
> kubectl describe pod -n cert-manager cert-manager-59bf757d77-kszm7
Name:           cert-manager-59bf757d77-kszm7
Namespace:      cert-manager
Priority:       0
Node:           10.27.41.101/10.27.41.101
Start Time:     Tue, 28 Feb 2023 15:23:58 +0100
Labels:         app=cert-manager
                app.kubernetes.io/component=controller
                app.kubernetes.io/instance=cert-manager
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=cert-manager
                app.kubernetes.io/version=v1.11.0
                helm.sh/chart=cert-manager-v1.11.0
                pod-template-hash=59bf757d77
Annotations:    prometheus.io/path: /metrics
                prometheus.io/port: 9402
                prometheus.io/scrape: true
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/cert-manager-59bf757d77
Containers:
  cert-manager-controller:
    Container ID:  
    Image:         quay.io/jetstack/cert-manager-controller:v1.11.0
    Image ID:      
    Port:          9402/TCP
    Host Port:     0/TCP
    Args:
      --v=2
      --cluster-resource-namespace=$(POD_NAMESPACE)
      --leader-election-namespace=kube-system
      --acme-http01-solver-image=quay.io/jetstack/cert-manager-acmesolver:v1.11.0
      --max-concurrent-challenges=60
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      POD_NAMESPACE:  cert-manager (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sqf52 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-sqf52:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Normal   Scheduled               23m                 default-scheduler  Successfully assigned cert-manager/cert-manager-59bf757d77-kszm7 to 10.27.41.101
  Warning  FailedCreatePodSandBox  22m                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_cert-manager-59bf757d77-kszm7_cert-manager_bf83dc3e-020c-42bd-8177-591fad2e8f3e_0(b01526904067d4c19240395178a70eda6f60d8698b81cc38e7c3e5db724502a1): error adding pod cert-manager_cert-manager-59bf757d77-kszm7 to CNI network "cbr0": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Get "https://[10.96.0.1]:443/api/v1/namespaces/cert-manager/pods/cert-manager-59bf757d77-kszm7": cannotconnect
  Warning  FailedCreatePodSandBox  21m to 15m          kubelet            (eight further events with the same message and a new sandbox ID each time; duplicates omitted)
  Warning  FailedCreatePodSandBox  15s (x19 over 15m)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_cert-manager-59bf757d77-kszm7_cert-manager_bf83dc3e-020c-42bd-8177-591fad2e8f3e_0(8b496102a9ee7a05b72a32bfe24125a87b7541e9db5ae05e9e63ab9766990b0b): error adding pod cert-manager_cert-manager-59bf757d77-kszm7 to CNI network "cbr0": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Get "https://[10.96.0.1]:443/api/v1/namespaces/cert-manager/pods/cert-manager-59bf757d77-kszm7": cannotconnect

Logs, error output, etc

Warning FailedCreatePodSandBox 15m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_cert-manager-59bf757d77-kszm7_cert-manager_bf83dc3e-020c-42bd-8177-591fad2e8f3e_0(1f8fc4890a98b7d553c6ac5b335a8f58a41fbf9b31cf4d5e8e2460517d3c540b): error adding pod cert-manager_cert-manager-59bf757d77-kszm7 to CNI network "cbr0": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Get "https://[10.96.0.1]:443/api/v1/namespaces/cert-manager/pods/cert-manager-59bf757d77-kszm7": cannotconnect

Output of linkerd check -o short

No linkerd is installed yet, so:

> linkerd check --pre --linkerd-cni-enabled
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

pre-kubernetes-setup
--------------------
√ control plane namespace does not already exist
√ can create non-namespaced resources
√ can create ServiceAccounts
√ can create Services
√ can create Deployments
√ can create CronJobs
√ can create ConfigMaps
√ can create Secrets
√ can read Secrets
√ can read extension-apiserver-authentication configmap
√ no clock skew detected

linkerd-cni-plugin
------------------
√ cni plugin ConfigMap exists
√ cni plugin ClusterRole exists
√ cni plugin ClusterRoleBinding exists
√ cni plugin ServiceAccount exists
√ cni plugin DaemonSet exists
√ cni plugin pod is running on all nodes

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

Status check results are √

Environment

  • Kubernetes Version: 1.25.4 (but same error with 1.24.1)
  • Oracle OCI OKE

Possible solution

I really wish I had one...

Additional context

No response

Would you like to work on fixing this bug?

maybe

blabu23 added the bug label Feb 28, 2023
mateiidavid (Member) commented:

Hi @blabu23, do you happen to know if your distribution configures flannel in a different directory than the default? You might need to tell linkerd-cni where to install itself. k3d (more specifically, k3s) has this problem: the CNI conflist is not in /etc/cni/net.d/ but in /var/lib/rancher/k3s/agent/etc/cni/net.d/, and the binary is expected to be installed in /bin instead of /opt/cni/bin.

Can you confirm this is not the case with your distribution? You can either SSH onto one of your hosts or run a pod that attaches a hostPath volume mounted at the root path / (so you have access to the entire filesystem).

$ find . -type f -name '*flannel*.conf*'
# ./var/lib/rancher/k3s/agent/etc/cni/net.d/10-flannel.conflist in k3s so conflist needs to go somewhere else

$ which flannel
# /bin/flannel in k3s, so binary path needs to be /bin
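If the paths do turn out to be non-standard, the chart can be pointed at them at install time. A minimal sketch, assuming the linkerd2-cni chart's destCNINetDir and destCNIBinDir values (check your chart version for the exact names), using the k3s paths above as placeholders:

> helm install linkerd-cni --namespace linkerd-cni --create-namespace \
    --set destCNINetDir=/var/lib/rancher/k3s/agent/etc/cni/net.d \
    --set destCNIBinDir=/bin \
    linkerd-stable/linkerd2-cni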

blabu23 (Author) commented Feb 28, 2023

Nope, AFAIK everything is where it belongs:

> find / -type f -name '*flannel*.conf*'
/etc/cni/net.d/10-flannel.conflist

> find /opt/cni
/opt/cni/
/opt/cni/bin
/opt/cni/bin/bandwidth
/opt/cni/bin/bridge
/opt/cni/bin/dhcp
/opt/cni/bin/firewall
/opt/cni/bin/flannel
/opt/cni/bin/host-device
/opt/cni/bin/host-local
/opt/cni/bin/ipvlan
/opt/cni/bin/loopback
/opt/cni/bin/macvlan
/opt/cni/bin/portmap
/opt/cni/bin/ptp
/opt/cni/bin/sbr
/opt/cni/bin/static
/opt/cni/bin/tuning
/opt/cni/bin/vlan
/opt/cni/bin/vrf
/opt/cni/bin/linkerd-cni

stevej (Contributor) commented Mar 2, 2023

Get "https://[10.96.0.1]:443/api/v1/namespaces/cert-manager/pods/cert-manager-59bf757d77-kszm7": cannotconnect

The IP address is passed into linkerd-cni via the KUBERNETES_SERVICE_HOST environment variable. Just spit-balling, but it looks like a pod IP address to me rather than a node-network IP address.
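One quick way to see what the plugin was actually given is to dump that variable from one of the running linkerd-cni pods (pod name taken from the listing above; the output shown is what I'd expect, not verified):

> kubectl exec -n linkerd-cni linkerd-cni-66wst -- env | grep KUBERNETES_SERVICE
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_SERVICE_PORT=443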

blabu23 (Author) commented Mar 3, 2023

AFAIK this is the cluster-internal Service address of the Kubernetes API... The pods are in the 10.244.0.0/16 network.
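This can be confirmed from the default kubernetes Service; the output below is illustrative of what an out-of-the-box cluster typically shows:

> kubectl get svc kubernetes -n default
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   4d7h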

stevej (Contributor) commented Mar 6, 2023

@blabu23 does your pod have IP connectivity otherwise?

blabu23 (Author) commented Mar 8, 2023

Is the pod even alive yet when the FailedCreatePodSandBox error is thrown?

jeremychase added the env/oke Oracle Container Engine for Kubernetes label Mar 9, 2023
stevej changed the title linkerd-oci does not chain correctly with OCI/OKE flannel cni linkerd-cni does not chain correctly with OCI/OKE flannel cni Mar 9, 2023
steve-gray (Contributor) commented:

@stevej - 10.96.0.1 is in the default Kubernetes services subnet of an out-of-the-box OKE cluster running in flannel network overlay mode. It's not a pod range.

stevej (Contributor) commented Mar 15, 2023

It's not a pod range.

Thanks for that. There's clearly some assumption we're making in linkerd-cni that's not satisfied by OKE, and we suspect it's KUBERNETES_SERVICE_HOST as used in the installer script https://github.com/linkerd/linkerd2-proxy-init/blob/main/cni-plugin/deployment/scripts/install-cni.sh#L186, but we don't have any contacts at Oracle or any credits to track this down.
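For readers following along: roughly, the installer templates those environment variables into a kubeconfig that the chained plugin later uses to reach the API server. A simplified sketch of that step, with abbreviated variable names and the ZZZ-linkerd-cni-kubeconfig filename as an assumption; see the linked script for the real logic:

# sketch, not the verbatim script
SA=/var/run/secrets/kubernetes.io/serviceaccount
cat > /host/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig <<EOF
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    server: https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}
    certificate-authority-data: $(base64 -w0 ${SA}/ca.crt)
users:
- name: linkerd-cni
  user:
    token: $(cat ${SA}/token)
contexts:
- name: linkerd-cni-context
  context:
    cluster: local
    user: linkerd-cni
current-context: linkerd-cni-context
EOF

If KUBERNETES_SERVICE_HOST resolves to an address the node's root network namespace cannot reach, every CNI ADD would fail exactly as in the events above.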

steve-gray (Contributor) commented:

Do you have a specific issue you're trying to locate, @stevej, or specific inputs you need to guide this? I posted #10531, which has more logging information from when I was trying to use the CNI. I switched back to flannel because I was on a deadline, but I can probably task an engineer to help out on this issue; or, if there's a semi-commitment from the Buoyant end to actually look at this, I'm happy to pay the cost of running an OKE cluster for you to work against while you test the issue.

blabu23 (Author) commented May 4, 2023

Anything new on this topic? Any chance I can help?

mateiidavid (Member) commented:

@blabu23 sorry, we haven't really made any progress here; we've put this on the back burner for now. It's a bit of a weird problem: it sounds like some networking assumptions we have made do not hold in Oracle environments. If you want to investigate and come up with a solution, I'm happy to assist you with pointers on how our CNI plugin works.

I'd probably start by checking the connection to the API server, as that seems to be the culprit here. The CNI plugin appears to simply error out when retrieving pods.
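A cheap way to test that from the node's root network namespace (where the chained plugin actually runs) is a hostNetwork pod, which does not itself need CNI to start. The image and overrides here are illustrative, not a verified recipe:

> kubectl run cni-net-test --rm -it --restart=Never \
    --overrides='{"spec":{"hostNetwork":true}}' \
    --image=curlimages/curl -- curl -vk https://10.96.0.1:443/version

If this hangs or fails to connect, the node network cannot reach the service VIP, and the linkerd-cni plugin would fail the same way.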


stale bot commented Aug 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Aug 16, 2023
stale bot closed this as completed Sep 3, 2023
github-actions bot locked as resolved and limited conversation to collaborators Oct 4, 2023