-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix statefulset Pod recreate case #827
Conversation
Thanks for your PR. The following commands are available:
These commands can only be run by members of the vmware-tanzu organization. |
6533e72
to
c970d3a
Compare
/test-all |
b680239
to
5c36540
Compare
/test-all |
edbdc80
to
c846ed5
Compare
/test-all |
/test-all |
AWESOME ! thanks quan ! |
is there a docker image pushed somewhere that i can test for this ? |
@jayunit100 Yes, "qtian/antrea-ubuntu:v0.8.0-dev-a7a41c3" has the commit. I'm testing it with both containerd and dockerd. Both have passed several times consistently. |
I have run "certified-conformance" with containerd 3 times and docker 1 time, all of them passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only a few nits with respect to comments. Thanks for fixing this so quickly and I like the switch to the K8s cache library for the interface store.
I'm pretty sure it is the case after reviewing the diff, but I just want to confirm / call out that:
- there is no change no the information we store in OVSDB (good thing we were storing the container ID to start with)
- there is no impact on the Antrea upgrade process: new version will be able to handle Cmd Checks and Cmd Dels for exiting Pods for which the networking was configured by a previous version of Antrea
There are a few typos in the commit message I think:
- s/graceful period/grace period
- s/Previously antrea-agent computes/Previously antrea-agent computed
- s/This patches uses/This patch uses
- s/then multiple Pods that have the name/therefore multiple Pods that have the same name
- s/always look up network config by container ID/always looks up network config by container ID
@@ -33,6 +34,10 @@ const ( | |||
|
|||
type InterfaceType uint8 | |||
|
|||
func (t InterfaceType) String() string { | |||
return strconv.Itoa(int(t)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this ever used for user-facing logs, or just for the cache index? If it is used for user-facing logs, maybe we could have a switch statement and return a more user-friendly string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only for cache index.
46f4557
to
0f8c711
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure it is the case after reviewing the diff, but I just want to confirm / call out that:
- there is no change no the information we store in OVSDB (good thing we were storing the container ID to start with)
- there is no impact on the Antrea upgrade process: new version will be able to handle Cmd Checks and Cmd Dels for exiting Pods for which the networking was configured by a previous version of Antrea
Both are correct. CmdDel and CmdCheck won't even read PodNamespace and PodName arguments and will use ContainerID to look up InterfaceCache to get the OVS port and hostIface.
Thanks a lot for your careful review. Addressed all the comments.
@@ -33,6 +34,10 @@ const ( | |||
|
|||
type InterfaceType uint8 | |||
|
|||
func (t InterfaceType) String() string { | |||
return strconv.Itoa(int(t)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only for cache index.
When deleting a StatefulSet Pod with 0 second grace period, the Pod might be removed from the Kubernetes API very quickly and a new Pod is created immediately, leading to kubelet processing the deletion of the previous Pod and the add of the new Pod simultaneously, so CNI cannot assume PodNamespace+PodName must be unique. Previously antrea-agent computed the network device name by hashing PodNamespace and PodName, the CNI DEL of the previous Pod could destroy the network device of the new Pod if it's executed after the new one's CNI ADD. To fix it, antrea-agent must no longer use PodNamespace+PodName as the identifier of the Pod's network device. This patch uses (sandbox) container ID as part of the network device name, and uses container ID as the key when caching interfaceConfig, therefore multiple Pods that have the same name will be handled individually without interfering with each other. To be compatible with network devices created by previous versions, CNI DEL handler doesn't assume how the network device name is computed and always looks up network config by container ID.
// See https://github.com/kubernetes/kubernetes/issues/57253#issuecomment-358897721. | ||
_, found := pc.ifaceStore.GetContainerInterface(podName, podNameSpace) | ||
_, found := pc.ifaceStore.GetContainerInterface(containerID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that this containerID is the workload container on Windows? For I didn't see the logics to find the infraContainerID before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, createOVSPort is false for workload container, it doesn't try to create ovs port for workload container whether the port exists or not.
/test-all |
/test-networkpolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merging this for 0.7.2 release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not reviewing in time. I have no extra comments for the change, but have a question - I remember we had another approach of using Pod UID to the identifier which might be simpler, but I forgot how we get Pod UID in CNI. Probably let us discuss offline on this.
@jianjuns related to this kubernetes/kubernetes#62900 ? |
Yes I know this issue, but remember we used to have another solution to get Pod UID. Maybe I am wrong. |
Even if we are able to get Pod UID from CNI requests in future, the fact that multiple Pods using the same name may coexist at some point is still valid and the code must be aware of it. |
I meant if Agent knows about Pod UID, all inputs/code can use Pod UID as the key, and even Controller can add Pod UID to Agent messages. Then most code (except CLI/API) needs not to handle duplicated names. |
Agree, but it doesn't affect the change made in this PR: not to compute network interface name from Pod name, not to use Pod name as the primary key of config cache? |
Yes. I just meant the part of changing NetworkPolicyController etc. to update for multiple Pods with the same name. Anyway, as do not have a way to know Pod UID from CNI, probably let us keep the current approach. |
When deleting a StatefulSet Pod with 0 second grace period, the Pod
might be removed from the Kubernetes API very quickly and a new Pod is
created immediately, leading to kubelet processing the deletion of the
previous Pod and the add of the new Pod simultaneously, so CNI cannot
assume PodNamespace+PodName must be unique.
Previously antrea-agent computed the network device name by hashing
PodNamespace and PodName, the CNI DEL of the previous Pod could destroy
the network device of the new Pod if it's executed after the new one's
CNI ADD.
To fix it, antrea-agent must no longer use PodNamespace+PodName as the
identifier of the Pod's network device. This patch uses (sandbox)
container ID as part of the network device name, and uses container ID
as the key when caching interfaceConfig, therefore multiple Pods that
have the same name will be handled individually without interfering with
each other.
To be compatible with network devices created by previous versions, CNI
DEL handler doesn't assume how the network device name is computed and
always looks up network config by container ID.
For #785