Enable Pod network after realizing initial NetworkPolicies #5777
```diff
@@ -36,6 +36,7 @@ import (
 	"antrea.io/antrea/pkg/ovs/ovsconfig"
 	"antrea.io/antrea/pkg/util/channel"
 	"antrea.io/antrea/pkg/util/k8s"
+	"antrea.io/antrea/pkg/util/wait"
 )

 type vethPair struct {
```
```diff
@@ -413,7 +414,7 @@ func parsePrevResult(conf *types.NetworkConfig) error {
 	return nil
 }

-func (pc *podConfigurator) reconcile(pods []corev1.Pod, containerAccess *containerAccessArbitrator) error {
+func (pc *podConfigurator) reconcile(pods []corev1.Pod, containerAccess *containerAccessArbitrator, podNetworkWait *wait.Group) error {
 	// desiredPods is the set of Pods that should be present, based on the
 	// current list of Pods got from the Kubernetes API.
 	desiredPods := sets.New[string]()
```
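For context: `podNetworkWait` is the gate that defers Pod flow installation until the initial NetworkPolicies have been realized. Below is a minimal sketch of the semantics this code relies on, assuming an API along the lines of `sync.WaitGroup`; the actual `antrea.io/antrea/pkg/util/wait` package may differ.

```go
package wait

import "sync"

// Group lets callers block until a set of registered preconditions
// (e.g., realization of the initial NetworkPolicies) has completed.
type Group struct {
	wg sync.WaitGroup
}

// Increment registers one more precondition that must finish before
// Wait returns.
func (g *Group) Increment() *Group {
	g.wg.Add(1)
	return g
}

// Done marks one registered precondition as complete.
func (g *Group) Done() {
	g.wg.Done()
}

// Wait blocks until every registered precondition has called Done.
func (g *Group) Wait() {
	g.wg.Wait()
}
```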
```diff
@@ -445,21 +446,34 @@ func (pc *podConfigurator) reconcile(pods []corev1.Pod, containerAccess *contain
 			missingIfConfigs = append(missingIfConfigs, containerConfig)
 			continue
 		}
-		// This interface matches an existing Pod.
-		// We rely on the interface cache / store - which is initialized from the persistent
-		// OVSDB - to map the Pod to its interface configuration. The interface
-		// configuration includes the parameters we need to replay the flows.
-		klog.V(4).InfoS("Syncing Pod interface", "Pod", klog.KRef(namespace, name), "iface", containerConfig.InterfaceName)
-		if err := pc.ofClient.InstallPodFlows(
-			containerConfig.InterfaceName,
-			containerConfig.IPs,
-			containerConfig.MAC,
-			uint32(containerConfig.OFPort),
-			containerConfig.VLANID,
-			nil,
-		); err != nil {
-			klog.ErrorS(err, "Error when re-installing flows for Pod", "Pod", klog.KRef(namespace, name))
-		}
+		go func(containerID, pod, namespace string) {
```
Before this change, there was a guarantee that the CNI server would not be started until reconciliation was complete (i.e., until this function returned). I assume that since the only thing that happens asynchronously is the installation of Pod flows, a race condition should not be possible?
```diff
+			// Do not install Pod flows until all preconditions are met.
+			podNetworkWait.Wait()
+			// To avoid race condition with CNIServer CNI event handlers.
+			containerAccess.lockContainer(containerID)
+			defer containerAccess.unlockContainer(containerID)
+
+			containerConfig, exists := pc.ifaceStore.GetContainerInterface(containerID)
+			if !exists {
+				klog.InfoS("The container interface had been deleted, skip installing flows for Pod", "Pod", klog.KRef(namespace, pod), "containerID", containerID)
+				return
+			}
```
Comment on lines +456 to +460

Related to my comment above, I assume this will take care of the case where a CNI DEL comes in before we have a chance to do reconciliation for this Pod?

If the Pod IP or ofPort is reused, there can be an issue too.

For the ofPort: wouldn't this require a situation where 1) the port has been released (deleted from OVSDB), yet 2) the interface configuration is still present in the ifaceStore? I don't know if that's possible. That would require a CNI DEL with a successful [...]

For the IP address: I think it's theoretically possible. The host-ipam release happens before OVS port deletion and [...]

I guess this situation does not arise when reconciliation runs to completion before the CNI server starts. What do you think @tnqn?
Yes, the lock and the check are there to prevent a race with the CNI DEL handler.

ofPort reuse is not a problem, as @antoninbas explained. The situation @antoninbas described for Pod IP reuse is possible, but it is not specific to this change: a similar situation can happen even when processing CNI events. To avoid it, perhaps we could use a Pod-specific cookie ID, so that removing the Pod flows of Pod A cannot affect Pod B even if they share the same IP.
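For illustration, the per-Pod cookie could be derived deterministically, e.g. by hashing the container ID. This is a hypothetical sketch, not Antrea's actual cookie allocation scheme:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// podFlowCookie derives a stable, Pod-specific OpenFlow cookie from the
// container ID. Flows installed for a Pod would carry this cookie, and
// deletion would match on the exact cookie value, leaving the flows of any
// other Pod (even one reusing the same IP) untouched.
func podFlowCookie(containerID string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(containerID))
	return h.Sum64()
}

func main() {
	fmt.Printf("cookie=0x%x\n", podFlowCookie("example-container-id"))
}
```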
If this situation can already occur before this change, then there is no need to address it in this PR. The cookie idea sounds fine. BTW, would it help to just release the IP address last in the delete sequence, or would that expose us to other problems (e.g., running out of IPs more easily when we have a lot of Pods on a Node, > 200, and a lot of churn)?

Ok. Agreed that we need not resolve all existing issues in this PR.

Great suggestion. I think it's better, because OVS flows reference Pod IPs, and deleting the references before deleting the referenced resource makes more sense and avoids the above issue. Even if removing the OVS resources succeeds and releasing the Pod IP fails, the IP and the ofPort have been decoupled, so there is no problem reusing the ofPort with another Pod IP; eventually the container runtime should keep calling CNI to release the IP, or the startup cleanup will do it if the container runtime misbehaves. I could create another PR for this improvement.

#5788 implements the suggestion.
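A hypothetical sketch of the reordered delete sequence suggested above: uninstall the flows that reference the IP first, delete the OVS port next, and release the host-ipam IP last. `deleteOVSPort` and `releaseHostIPAM` are illustrative helpers, not Antrea's actual API; the other identifiers come from the diff above.

```go
// reorderedDelete sketches the "release the IP last" idea.
func (pc *podConfigurator) reorderedDelete(containerID string) error {
	containerConfig, found := pc.ifaceStore.GetContainerInterface(containerID)
	if !found {
		return nil
	}
	// 1. Remove the OpenFlow entries that reference the Pod IP and ofPort.
	if err := pc.ofClient.UninstallPodFlows(containerConfig.InterfaceName); err != nil {
		return err
	}
	// 2. Delete the OVS port: even if a later step fails, the ofPort is now
	//    decoupled from the IP and can safely be reused with another Pod IP.
	if err := pc.deleteOVSPort(containerConfig); err != nil { // hypothetical helper
		return err
	}
	pc.ifaceStore.DeleteInterface(containerConfig)
	// 3. Release the IP address last. If this step fails, no flow or port
	//    still references the IP; the container runtime should keep calling
	//    CNI DEL, or the startup cleanup will eventually reclaim it.
	return pc.releaseHostIPAM(containerID) // hypothetical helper
}
```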
```diff
+			// This interface matches an existing Pod.
+			// We rely on the interface cache / store - which is initialized from the persistent
+			// OVSDB - to map the Pod to its interface configuration. The interface
+			// configuration includes the parameters we need to replay the flows.
+			klog.V(4).InfoS("Syncing Pod interface", "Pod", klog.KRef(namespace, pod), "iface", containerConfig.InterfaceName)
+			if err := pc.ofClient.InstallPodFlows(
+				containerConfig.InterfaceName,
+				containerConfig.IPs,
+				containerConfig.MAC,
+				uint32(containerConfig.OFPort),
+				containerConfig.VLANID,
+				nil,
+			); err != nil {
+				klog.ErrorS(err, "Error when re-installing flows for Pod", "Pod", klog.KRef(namespace, pod))
+			}
+		}(containerConfig.ContainerID, name, namespace)
```
Do you think having several goroutines, instead of a single goroutine taking care of all the Pods, can help reduce initialization time if we have 100+ Pods on the Node? Or would that be insignificant? I am just wondering if this is the reason why you went this route.

It's just for simplicity, given that goroutines are lightweight; otherwise we would need to create another slice and iterate over it.
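For comparison, the single-goroutine alternative would need the extra bookkeeping mentioned above, roughly along these lines. This is a sketch reusing identifiers from the diff, not actual Antrea code; `reconcile()` would first collect the container IDs that need flow replay into a slice.

```go
// replayPodFlows processes the whole batch in a single goroutine: it blocks
// once on the preconditions, then replays flows for each Pod in turn.
func replayPodFlows(pc *podConfigurator, containerIDs []string,
	containerAccess *containerAccessArbitrator, podNetworkWait *wait.Group) {
	go func() {
		// Block once for the whole batch until all preconditions are met.
		podNetworkWait.Wait()
		for _, containerID := range containerIDs {
			// Same lock and re-check as in the per-Pod goroutine version.
			containerAccess.lockContainer(containerID)
			if cfg, exists := pc.ifaceStore.GetContainerInterface(containerID); exists {
				if err := pc.ofClient.InstallPodFlows(cfg.InterfaceName, cfg.IPs,
					cfg.MAC, uint32(cfg.OFPort), cfg.VLANID, nil); err != nil {
					klog.ErrorS(err, "Error when re-installing flows", "containerID", containerID)
				}
			}
			containerAccess.unlockContainer(containerID)
		}
	}()
}
```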
```diff
 		} else {
 			// clean-up and delete interface
 			klog.V(4).InfoS("Deleting interface", "Pod", klog.KRef(namespace, name), "iface", containerConfig.InterfaceName)
```
Could you remind me when we flush the existing flows in OVS after an agent restart? Does that happen before or after this function?

They are asynchronous, likely after this function in practice: see antrea/pkg/agent/agent.go, lines 568 to 580 at commit 4ba9451. We could synchronize them using another WaitGroup, but I feel that keeping the existing flows until the new flows are installed doesn't work as intended anyway, because initialization has already removed all groups and meters.
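A minimal sketch of the WaitGroup synchronization idea mentioned above, with stub functions standing in for flow replay and the flush of leftover flows (illustrative only, not Antrea's actual initialization code):

```go
package main

import "sync"

// Stubs standing in for the real operations.
func installPodFlows(containerID string) {}
func flushStaleFlows()                   {}

// syncFlushWithReplay delays the flush of flows left over from the previous
// agent run until every Pod's flows have been replayed.
func syncFlushWithReplay(containerIDs []string) {
	var replayWG sync.WaitGroup
	for _, id := range containerIDs {
		replayWG.Add(1)
		go func(containerID string) {
			defer replayWG.Done()
			installPodFlows(containerID)
		}(id)
	}
	// Only flush once every replay goroutine has finished.
	replayWG.Wait()
	flushStaleFlows()
}

func main() {
	syncFlushWithReplay([]string{"container-a", "container-b"})
}
```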