Split Control Plane from Dataplane (aka split ingress controller and NGINX) #8034
First need: write two container manifests, one containing only the ingress controller and the other containing only NGINX. NGINX will need some dumb-init-style init process, otherwise reloading or killing the process can make respawning and monitoring a challenge. Later: once everything is split, we need to glue NGINX back together with the controller. The simplest approach would be to share NGINX's process namespace with the ingress controller (but not the reverse!) and check whether, with small changes, the controller can use the NGINX pid file (maybe shared as well?) to perform the process operations.
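A minimal sketch of what such a two-container Pod could look like, built with the k8s.io/api/core/v1 Go types. The container names, image tags, and mount paths are illustrative assumptions, not settled decisions, and note that Kubernetes' pod-level shareProcessNamespace is symmetric, so it is coarser than the one-way sharing described above:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	shareProcs := true
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "ingress-nginx"},
		Spec: corev1.PodSpec{
			// Lets the controller container see the NGINX master process
			// and signal it directly (e.g. SIGHUP for reloads).
			ShareProcessNamespace: &shareProcs,
			Volumes: []corev1.Volume{{
				// Shared scratch space for the rendered nginx.conf and pid file.
				Name:         "nginx-state",
				VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
			}},
			Containers: []corev1.Container{
				{
					// Control plane: talks to the API server, renders config.
					Name:         "controller",
					Image:        "ingress-nginx/controller:dev", // hypothetical tag
					VolumeMounts: []corev1.VolumeMount{{Name: "nginx-state", MountPath: "/etc/nginx"}},
				},
				{
					// Data plane: dumb-init as PID 1 so killed workers
					// are reaped and the master stays observable.
					Name:         "nginx",
					Image:        "ingress-nginx/nginx:dev", // hypothetical tag
					Command:      []string{"/usr/bin/dumb-init", "--", "nginx", "-g", "daemon off;"},
					VolumeMounts: []corev1.VolumeMount{{Name: "nginx-state", MountPath: "/etc/nginx"}},
				},
			},
		},
	}
	out, _ := yaml.Marshal(pod)
	fmt.Print(string(out))
}
```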
Another thing we need to take care of: the SSLPassthrough proxy today runs inside the controller process. Following the same rationale of "the control plane does not receive any user traffic", we should probably think about moving the SSLPassthrough proxy into the proxy container. Maybe it can be the same container that runs nginx and also acts as the container "controlling" nginx.
This is valuable. I wonder whether the model can be simplified: use NGINX as a stateless container, and if it has a problem, just restart it. Use the control plane to write the state to the data plane.
@rikatz have you thought of taking a smaller step first and extracting only the API interaction logic of the controller? Here's what I have in mind:
From what is discussed, I see the goal is just to split nginx and the controller process into two containers within the same Pod. From a scaling standpoint, however, it makes more sense to have a few controller pods and lots of nginx pods to handle traffic. On that front I like the gRPC idea proposed above: it creates an interface between the control plane and the data plane which can be co-located for now but, as the implementation evolves, eventually start to separate out.
I LOVE the idea of a central gRPC control plane with the data plane subscribing to it. I actually discussed this approach with James and some other folks a while back; it was even going to be a way to make it easier to implement the Gateway API, for example. I just don't know where to start, but I can try (or are you willing to look into that? 🤔). I was just thinking of a way to make it a simple change (like sharing the PID, issuing reloads), but creating a gRPC stream endpoint and moving all the logic below syncIngress may indeed be better.
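To make that shape concrete, here is a rough sketch of the subscription contract as plain Go. In a real implementation the interface would be protoc-generated gRPC stubs; all names here (NginxConfig, ConfigStream, Agent) are invented for illustration:

```go
package dataplane

import "context"

// NginxConfig is the fully rendered payload the control plane pushes;
// the data plane never needs Kubernetes API access to act on it.
type NginxConfig struct {
	Version  int64  // monotonically increasing generation
	Template []byte // rendered nginx.conf
}

// ConfigStream is the contract a gRPC service definition would encode.
// Note the direction: the data plane only receives. Beyond the initial
// (empty) subscribe request it sends nothing upstream, which limits the
// blast radius if an nginx Pod is compromised.
type ConfigStream interface {
	// Subscribe opens a server-side stream of config snapshots.
	Subscribe(ctx context.Context) (<-chan NginxConfig, error)
}

// Agent is the data-plane loop standing in for everything below
// syncIngress today: receive a snapshot, write it out, reload NGINX.
type Agent struct {
	Stream ConfigStream
	Apply  func(NginxConfig) error // write template + signal reload
}

func (a *Agent) Run(ctx context.Context) error {
	configs, err := a.Stream.Subscribe(ctx)
	if err != nil {
		return err
	}
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case cfg, ok := <-configs:
			if !ok {
				return nil // stream closed; caller resubscribes with backoff
			}
			if err := a.Apply(cfg); err != nil {
				return err
			}
		}
	}
}
```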
I'm wondering if we should open a branch in ingress-nginx for this and work on that. BTW, there is prior art in kpng (https://github.com/kubernetes-sigs/kpng), which does the same thing for kube-proxy. I discussed this with folks in the past (👋 @jayunit100 @mcluseau), and Jay actually asked whether the whole Gateway-API-ish control plane shouldn't be something like a kpng layer-7 controller. Right now I think we can make it easier by just splitting our logic the way @ElvinEfendi suggested, with the main data-plane controller subscribing to the gRPC endpoint of the control plane, but who knows the future and what lessons we can learn from it :)
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
In principle I like @ElvinEfendi's idea. It achieves the most important goal, namely that the data plane has no access to the Kubernetes API or to the control plane's service account token. Also, there's no reason the data plane needs to be only an nginx process. It can readily have a management process that does some of the things the controller does today, as long as that doesn't require Kubernetes API access. However, if the control plane provides a gRPC API that the data plane containers/Pods access, then that might still provide a way for an adversary to corrupt the control plane. I'm not an expert at gRPC, so please prove me wrong, but I can imagine something like the following is possible:
Maybe something like this can be prevented by using an architecture where data can only be sent from the control plane to the data plane but not vice versa.
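Complementary to that one-way design, the gRPC endpoint itself could be locked down with mutual TLS so that only workloads holding a cluster-issued client certificate can open the stream at all. A sketch with grpc-go, where the certificate paths and port are placeholder assumptions (in-cluster they would come from Secrets):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	// Control-plane serving cert and the CA that signs data-plane certs.
	cert, err := tls.LoadX509KeyPair("/certs/tls.crt", "/certs/tls.key")
	if err != nil {
		log.Fatal(err)
	}
	caPEM, err := os.ReadFile("/certs/ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	creds := credentials.NewTLS(&tls.Config{
		Certificates: []tls.Certificate{cert},
		// Reject any client that cannot present a cert signed by our CA,
		// so a compromised workload elsewhere cannot even reach the stream.
		ClientAuth: tls.RequireAndVerifyClientCert,
		ClientCAs:  pool,
	})

	lis, err := net.Listen("tcp", ":9443")
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer(grpc.Creds(creds))
	// RegisterConfigStreamServer(srv, &server{}) // generated stub would go here
	log.Fatal(srv.Serve(lis))
}
```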
/lifecycle frozen
About RCEs, I agree partially, but I think if we have an RCE in NGINX (doable, given all the leaks and the template sanitization we have) plus gRPC, we may have a much bigger problem. Also, as soon as we split all of this, I want to make sure the control plane is distroless and only the control-plane binary runs there. Finally, I was thinking that we still need some countermeasures and architecture definitions:
I will keep posting updates and discussions here.
I am thinking about this again, based on some recent discussions with @strongjz, the new sidecar container feature (which we shouldn't rely on yet, as a lot of users still run old versions), and some discussions on Twitter:

I think we can discuss again the architecture we expect.
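For context, the native-sidecar shape being referred to (Kubernetes 1.28+, and per the above not something to rely on yet) would look roughly like this; image tags are placeholders:

```go
package main

import corev1 "k8s.io/api/core/v1"

// sidecarPodSpec sketches the native-sidecar variant: the controller
// runs as an init container with restartPolicy: Always, so it starts
// before NGINX and keeps running alongside it for the Pod's lifetime.
func sidecarPodSpec() corev1.PodSpec {
	always := corev1.ContainerRestartPolicyAlways
	return corev1.PodSpec{
		InitContainers: []corev1.Container{{
			Name:          "controller",
			Image:         "ingress-nginx/controller:dev", // hypothetical tag
			RestartPolicy: &always,
		}},
		Containers: []corev1.Container{{
			Name:  "nginx",
			Image: "ingress-nginx/nginx:dev", // hypothetical tag
		}},
	}
}

func main() { _ = sidecarPodSpec() }
```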
This issue has not been updated in over 1 year, and should be re-triaged. You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
Do you have any updates on this?
Due to recent CVEs, we started to discuss, but never properly registered, the need to split the ingress-controller process from the NGINX process.
Here is some rationale on this:
While writing the template file is just a matter of a shared volume (an emptyDir, maybe) between both containers, the process of starting/stopping/monitoring NGINX is going to be a bit challenging.
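As one possible shape for that glue, a small agent in the NGINX container could poll the shared config for changes and signal the NGINX master through the shared pid file. A sketch, with paths assumed rather than decided:

```go
package main

import (
	"bytes"
	"log"
	"os"
	"strconv"
	"syscall"
	"time"
)

// Illustrative paths on the shared emptyDir volume.
const (
	pidFile  = "/etc/nginx/nginx.pid"
	confFile = "/etc/nginx/nginx.conf"
)

// masterPID reads the NGINX master process id from the shared pid file.
func masterPID() (int, error) {
	b, err := os.ReadFile(pidFile)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(string(bytes.TrimSpace(b)))
}

func main() {
	var lastMod time.Time
	for range time.Tick(2 * time.Second) {
		info, err := os.Stat(confFile)
		if err != nil {
			log.Printf("stat config: %v", err)
			continue
		}
		if !info.ModTime().After(lastMod) {
			continue // config unchanged since the last reload
		}
		pid, err := masterPID()
		if err != nil {
			log.Printf("read pid file: %v", err)
			continue
		}
		// SIGHUP asks the NGINX master to re-read config and roll workers.
		if err := syscall.Kill(pid, syscall.SIGHUP); err != nil {
			log.Printf("signal nginx: %v", err)
			continue
		}
		lastMod = info.ModTime()
	}
}
```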
I will use this issue as a side note for implementation attempts, and ideas are welcome!
/kind feature
/priority important-longterm
/triage accepted