Egress NetworkPolicy are not enforced fast enough #197
Comments
Updating my plan to fix this: we can rely on the CNI ADD event to trigger enforcement earlier, since the data required for policy enforcement (IP or OF port) is ready once CNI ADD has been processed. In detail, we can use a channel: the CNIServer sends each added Pod to it upon CNI ADD, and the PolicyController receives the Pod from it and triggers processing of all rules it would affect. Of course, we should add the Pod to the AppliedToGroup as soon as it is scheduled to a Node, instead of waiting for it to get an IP. This approach is asynchronous, but it should be much faster than kubelet receiving the CNI response, creating the workload containers, and then starting them. I'm working on the code and expect to finish and test it by EOW.
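For readers following along, the channel-based trigger described above could look roughly like the sketch below. It is not the actual Antrea code: `PodUpdate`, `Rule`, `PolicyController`, and `simulate` are hypothetical names invented for illustration, and "processing a rule" is reduced to recording its ID.

```go
package main

import "fmt"

// PodUpdate is a hypothetical event that a CNI server would emit
// once CNI ADD has been processed for a Pod.
type PodUpdate struct {
	Namespace string
	Name      string
}

// Rule is a hypothetical, simplified policy rule together with the
// Pods it applies to, keyed by "namespace/name".
type Rule struct {
	ID        string
	AppliedTo map[string]bool
}

// PolicyController is a stand-in for the agent-side controller.
type PolicyController struct {
	rules     []Rule
	Processed []string // IDs of rules that were (re)processed
}

// Run consumes Pod updates until the channel is closed and processes
// every rule that applies to the added Pod.
func (c *PolicyController) Run(podUpdates <-chan PodUpdate) {
	for pod := range podUpdates {
		key := pod.Namespace + "/" + pod.Name
		for _, r := range c.rules {
			if r.AppliedTo[key] {
				// A real agent would install flows here; the sketch
				// only records which rules were triggered.
				c.Processed = append(c.Processed, r.ID)
			}
		}
	}
}

// simulate plays the role of the CNI server: it sends each added Pod
// to the channel and returns the rule IDs the controller processed.
func simulate(rules []Rule, pods []PodUpdate) []string {
	podUpdates := make(chan PodUpdate, 16)
	c := &PolicyController{rules: rules}
	done := make(chan struct{})
	go func() {
		c.Run(podUpdates)
		close(done)
	}()
	for _, p := range pods {
		podUpdates <- p // sent upon CNI ADD
	}
	close(podUpdates)
	<-done
	return c.Processed
}

func main() {
	rules := []Rule{
		{ID: "deny-all-egress", AppliedTo: map[string]bool{"default/client": true}},
		{ID: "allow-dns", AppliedTo: map[string]bool{"default/other": true}},
	}
	fmt.Println(simulate(rules, []PodUpdate{{Namespace: "default", Name: "client"}}))
}
```

The point of the sketch is only that the trigger is the in-process send on the channel rather than a Pod status update observed through kube-apiserver.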
Thanks Quan, this sounds like something we can add to the 0.2.0 release. For the longer term, however, is this good enough or do we want something more "synchronous"? Should we instead try not to enable Pod networking until all "current" network policies (by "current" I mean network policies which predate the creation of the Pod by a non-trivial amount of time) have been enforced? Ideally, maybe we should not enable forwarding for the Pod until we have had a chance to enforce known network policies. It seems to me that the scope of such a change would not be much larger than what you are proposing.
To clarify, my proposal is the following: do not install the Pod flows (i.e. hold off on calling |
If the cause of this issue is that the Pod might be in the running state before the controller gets the notification and the agent creates flows for the Pod, the same should be true for ingress traffic too, correct? Why would only egress policy be affected?
@antoninbas thanks for the suggestion, I actually had the same thought and mentioned it to @jianjuns.
More importantly, I realized the asynchronous approach is very simple and safe enough to solve this issue: all it does is change the trigger of rule processing from the Pod status update in kube-apiserver to the internal CNI ADD event; nothing else changes. Receiving an item from a Go channel and processing its rules in parallel takes much less time than kubelet receiving the CNI response over a gRPC channel, creating the real workload container, and starting it. Logs of antrea-agent:
Logs of kubelet:
The related NetworkPolicy tests passed after applying the patch (actually, users won't encounter the issue unless they intentionally design a scenario like the K8s tests). I'm improving it and can push it for review tomorrow.
The ingress test waits for the server Pod's IP and then creates a client Pod to test it. If the testing code knows the Pod's IP, Antrea knows it too and would have already enforced the policy.
Describe the bug
Create an egress NetworkPolicy that blocks all outgoing traffic, and create a Pod to which the policy applies. The Pod can reach outside destinations during a short window before the NetworkPolicy is enforced.
To Reproduce
It caused the following K8s NetworkPolicy e2e tests to fail:
Expected
If the policy is created before the Pod, the Pod should be limited by the policy for its whole lifecycle.
Actual behavior
If the policy is created before the Pod, the Pod is not limited by the policy during a short window.
Versions:
Please provide the following information:
Antrea version: 0.1.1
Kubernetes version (use kubectl version). If your Kubernetes components have different versions, please provide the version for all of them.
Linux kernel version on the Kubernetes Nodes (use uname -r).
Output of modinfo openvswitch for the Kubernetes Nodes.
Additional context
Add any other context about the problem here, such as Antrea logs, kubelet logs, etc.
(Please consider pasting long output into a GitHub gist or any other pastebin.)