Unable to use Linkerd-CNI #7945
Comments
Hey @BobyMCbobs, thanks for raising this. Is If you find it odd for Talos not to have For reference, I found this and associated issues: siderolabs/talos#4194, might be worth having a look through them? |
Hey @mateiidavid, Thank you for your reply! May I please have a link to how the CNI plugin uses |
|
Thank you, @olix0r! |
@olix0r, why is I'm taking a look at the implementation to expand on what you said. Is it correct that the CNI plugin uses https://github.com/linkerd/linkerd2/blob/main/cni-plugin/main.go#L241 -> https://github.com/linkerd/linkerd2-proxy-init/blob/a556ca400132106db279ce8c3a79003a766bf707/iptables/iptables.go#L212-L228 |
Hey @BobyMCbobs, this is how I understand things. When a pod is scheduled on a node, the container runtime (CRI) is responsible for creating, starting and stopping the pod. After a pod is first created (i.e., the CRI creates its sandbox, in other words its Linux namespace, and its network namespace), its networking stack has to be set up. For a pod to accept and send traffic without NAT, it needs to communicate with the host through a veth interface and get an IP address assigned to it. The CNI does all of this, and more. A networking stack (or, simply put, a network namespace in our case) starts out as a blank canvas: there are no routes, no rules and no devices; they are all added by the different plugins. CNIs simply configure everything. In our case, we need to set up iptables, and to do that we need to enter the network namespace of the pod that has just been created. If we simply execute iptables commands without entering the namespace, they'll be applied to the host. So, to kind of directly answer the question, without using Now, on to the solution: I think we'd be open to bundling linkerd2/cni-plugin/Dockerfile Lines 23 to 26 in a98b722
There's no easy way for us to test this solution with Talos, so we'd need some additional help here, which is why we'd appreciate it a lot if you could contribute :D To test, we could do the following:
Wdyt? |
Hey @mateiidavid,
I gave this a go, and there doesn't appear to be any difference.
I'm sure that if the CNI binary could contain everything it needs, it would work whatever the environment. I'll keep looking around for what's possible. Keen to have this work! I'm more than happy to contribute what I can! |
No probs :) To be clear, the way I understand CNIs: the plugin is a binary on the host that gets called by the kubelet. In that sense, our plugin will also call the iptables binary on the host; it doesn't do it in a container. We need to run it in the pod's network namespace though, which is different. I guess that's why the initial solution didn't really work. You're right that packaging it with the container won't work (unless we copy the binary onto the host). Cilium might have a different use case for For us though, the use case for iptables is different. We want to set up routing rules in each pod's network namespace in such a way that the proxy can take over packets. We do not, however, want the host to have the same config, so running in the same network namespace as the pod is a bit of a necessity, as far as I can tell. We can programmatically enter the namespace, as opposed to using the Hm, with all this being said, not sure what we can do as a solution here. Our container that runs as an agent on the host is basically a bash script that copies the plugin binary to the right location (and creates a network config file). Wonder if there's anything we can do in the install script 🤔 |
assuming that the linkerd pod runs with |
@frezbo that's true, the CNI binary itself could enter the namespace programmatically. However, there are two points to consider here:
Does this make sense and line up with what you know about the space? We'd still be very happy to fix this. |
On Go and Linux namespaces: this should not be a problem anymore with Go, e.g. it's possible to switch to some network namespace and perform actions:
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
We also ran into this issue (with Talos). Any chance that switching namespaces from Go, as suggested in #7945 (comment), would solve this issue? |
This seems to be the same issue as the problem I described here: #10413 |
For the record, yup, we are looking into this... |
Any updates on this topic? |
@BobyMCbobs @NigelVanHattum @Cowboy-coder Talos supports system extensions and in fact maintains an official extension for Once the PR is merged, simply using the extension should resolve this issue. |
@BobyMCbobs @NigelVanHattum @Cowboy-coder Update: the PR for including |
Hey all, just wondering if any headway has been made here. I'm running Talos 1.7.4 with the
With the last line seemingly the biggest hint. I'm at a loss as to how to proceed here, and I haven't found a single thing anywhere explaining how I can get linkerd running on Talos. |
So I discovered that my problem was actually that I was using cilium and had set "cni.exclusive=false" in the helm chart install for that. This caused any attempted use of linkerd-cni to fail. As soon as I set that flag to true, linkerd-cni works in Talos as expected. |
@BobyMCbobs with this verification, would you mind terribly marking this issue as resolved? |
@djryanj Sorry you ran into that. We are learning about that (bizarre) flag for Cilium ourselves. We JUST merged a docs PR that mentions this so hopefully future Cilium + Linkerd + CNI users will be able to avoid the issue. linkerd/website#1794 |
Closing this out as there have been docs fixes for this issue. |
What is the issue?
When Linkerd is installed with CNI enabled, Pod sandboxes fail to create.
How can it be reproduced?
Logs, error output, etc
output of
linkerd check -o short
Environment
Possible solution
No response
Additional context
Using Cilium as the CNI. Using Flannel makes no difference.
This happens both on amd64 in a VM and arm64 on Raspberry Pis.
My goal is to improve app start time by using the CNI plugin instead of the init containers.
If I run
the CNI isn't used, and the Linkerd Pods come back healthy.
Would you like to work on fixing this bug?
No response