Fix statefulset Pod recreate case #827

tnqn · 2020-06-11T08:49:19Z

When deleting a StatefulSet Pod with 0 second grace period, the Pod
might be removed from the Kubernetes API very quickly and a new Pod is
created immediately, leading to kubelet processing the deletion of the
previous Pod and the add of the new Pod simultaneously, so CNI cannot
assume PodNamespace+PodName must be unique.

Previously antrea-agent computed the network device name by hashing
PodNamespace and PodName, the CNI DEL of the previous Pod could destroy
the network device of the new Pod if it's executed after the new one's
CNI ADD.

To fix it, antrea-agent must no longer use PodNamespace+PodName as the
identifier of the Pod's network device. This patch uses (sandbox)
container ID as part of the network device name, and uses container ID
as the key when caching interfaceConfig, therefore multiple Pods that
have the same name will be handled individually without interfering with
each other.

To be compatible with network devices created by previous versions, CNI
DEL handler doesn't assume how the network device name is computed and
always looks up network config by container ID.

For #785

antrea-bot · 2020-06-11T08:49:34Z

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

/test-e2e: to trigger e2e tests.
/skip-e2e: to skip e2e tests.
/test-conformance: to trigger conformance tests.
/skip-conformance: to skip conformance tests.
/test-networkpolicy: to trigger networkpolicy tests.
/skip-networkpolicy: to skip networkpolicy tests.
/test-windows-conformance: to trigger windows conformance tests.
/skip-windows-conformance: to skip windows conformance tests.
/test-all: to trigger all tests.
/skip-all: to skip all tests.

These commands can only be run by members of the vmware-tanzu organization.

pkg/agent/cniserver/interface_configuration_windows.go

tnqn · 2020-06-11T11:36:32Z

/test-all

tnqn · 2020-06-11T13:52:08Z

/test-all

tnqn · 2020-06-11T15:27:08Z

/test-all

tnqn · 2020-06-11T15:38:05Z

/test-all

jayunit100 · 2020-06-11T16:03:56Z

AWESOME ! thanks quan !

jayunit100 · 2020-06-11T16:04:33Z

is there a docker image pushed somewhere that i can test for this ?

tnqn · 2020-06-11T16:09:51Z

is there a docker image pushed somewhere that i can test for this ?

@jayunit100 Yes, "qtian/antrea-ubuntu:v0.8.0-dev-a7a41c3" has the commit. I'm testing it with both containerd and dockerd. Both have passed several times consistently.
It would be great if you can try it on your cluster in case of any configuration difference that could affect the result.

tnqn · 2020-06-11T17:11:29Z

I have run "certified-conformance" with containerd 3 times and docker 1 time, all of them passed.
@wenyingd helped verify all sig-windows tests passed.

antoninbas

LGTM, only a few nits with respect to comments. Thanks for fixing this so quickly and I like the switch to the K8s cache library for the interface store.

I'm pretty sure it is the case after reviewing the diff, but I just want to confirm / call out that:

there is no change no the information we store in OVSDB (good thing we were storing the container ID to start with)
there is no impact on the Antrea upgrade process: new version will be able to handle Cmd Checks and Cmd Dels for exiting Pods for which the networking was configured by a previous version of Antrea

There are a few typos in the commit message I think:

s/graceful period/grace period
s/Previously antrea-agent computes/Previously antrea-agent computed
s/This patches uses/This patch uses
s/then multiple Pods that have the name/therefore multiple Pods that have the same name
s/always look up network config by container ID/always looks up network config by container ID

pkg/agent/cniserver/server_windows.go

pkg/agent/interfacestore/interface_cache.go

pkg/agent/util/net.go

pkg/agent/cniserver/pod_configuration.go

antoninbas · 2020-06-11T18:41:18Z

pkg/agent/interfacestore/types.go

@@ -33,6 +34,10 @@ const (

 type InterfaceType uint8

+func (t InterfaceType) String() string {
+	return strconv.Itoa(int(t))


is this ever used for user-facing logs, or just for the cache index? If it is used for user-facing logs, maybe we could have a switch statement and return a more user-friendly string?

Currently only for cache index.

tnqn

I'm pretty sure it is the case after reviewing the diff, but I just want to confirm / call out that:

there is no change no the information we store in OVSDB (good thing we were storing the container ID to start with)

there is no impact on the Antrea upgrade process: new version will be able to handle Cmd Checks and Cmd Dels for exiting Pods for which the networking was configured by a previous version of Antrea

Both are correct. CmdDel and CmdCheck won't even read PodNamespace and PodName arguments and will use ContainerID to look up InterfaceCache to get the OVS port and hostIface.

Thanks a lot for your careful review. Addressed all the comments.

pkg/agent/cniserver/server_windows.go

pkg/agent/interfacestore/interface_cache.go

tnqn · 2020-06-15T03:24:49Z

pkg/agent/interfacestore/types.go

@@ -33,6 +34,10 @@ const (

 type InterfaceType uint8

+func (t InterfaceType) String() string {
+	return strconv.Itoa(int(t))


Currently only for cache index.

pkg/agent/util/net.go

When deleting a StatefulSet Pod with 0 second grace period, the Pod might be removed from the Kubernetes API very quickly and a new Pod is created immediately, leading to kubelet processing the deletion of the previous Pod and the add of the new Pod simultaneously, so CNI cannot assume PodNamespace+PodName must be unique. Previously antrea-agent computed the network device name by hashing PodNamespace and PodName, the CNI DEL of the previous Pod could destroy the network device of the new Pod if it's executed after the new one's CNI ADD. To fix it, antrea-agent must no longer use PodNamespace+PodName as the identifier of the Pod's network device. This patch uses (sandbox) container ID as part of the network device name, and uses container ID as the key when caching interfaceConfig, therefore multiple Pods that have the same name will be handled individually without interfering with each other. To be compatible with network devices created by previous versions, CNI DEL handler doesn't assume how the network device name is computed and always looks up network config by container ID.

wenyingd · 2020-06-15T05:19:10Z

pkg/agent/cniserver/pod_configuration.go

 	// See https://github.com/kubernetes/kubernetes/issues/57253#issuecomment-358897721.
-	_, found := pc.ifaceStore.GetContainerInterface(podName, podNameSpace)
+	_, found := pc.ifaceStore.GetContainerInterface(containerID)


Is it possible that this containerID is the workload container on Windows? For I didn't see the logics to find the infraContainerID before.

No, createOVSPort is false for workload container, it doesn't try to create ovs port for workload container whether the port exists or not.

tnqn · 2020-06-15T06:18:43Z

/test-all

tnqn · 2020-06-15T07:51:05Z

/test-networkpolicy

antoninbas

LGTM

antoninbas · 2020-06-15T18:07:37Z

Merging this for 0.7.2 release

jianjuns

Sorry for not reviewing in time. I have no extra comments for the change, but have a question - I remember we had another approach of using Pod UID to the identifier which might be simpler, but I forgot how we get Pod UID in CNI. Probably let us discuss offline on this.

antoninbas · 2020-06-15T18:52:40Z

@jianjuns related to this kubernetes/kubernetes#62900 ?

jianjuns · 2020-06-15T18:55:12Z

@jianjuns related to this kubernetes/kubernetes#62900 ?

Yes I know this issue, but remember we used to have another solution to get Pod UID. Maybe I am wrong.
BTW, my main concern for this change is that going forward many features and more code might need to know about the possibility of multiple Pods using the same name (though seems the code change is simpler than I thought).

tnqn · 2020-06-16T07:30:15Z

BTW, my main concern for this change is that going forward many features and more code might need to know about the possibility of multiple Pods using the same name (though seems the code change is simpler than I thought).

Even if we are able to get Pod UID from CNI requests in future, the fact that multiple Pods using the same name may coexist at some point is still valid and the code must be aware of it.

jianjuns · 2020-06-16T15:45:33Z

Even if we are able to get Pod UID from CNI requests in future, the fact that multiple Pods using the same name may coexist at some point is still valid and the code must be aware of it.

I meant if Agent knows about Pod UID, all inputs/code can use Pod UID as the key, and even Controller can add Pod UID to Agent messages. Then most code (except CLI/API) needs not to handle duplicated names.

tnqn · 2020-06-16T15:50:53Z

Even if we are able to get Pod UID from CNI requests in future, the fact that multiple Pods using the same name may coexist at some point is still valid and the code must be aware of it.

I meant if Agent knows about Pod UID, all inputs/code can use Pod UID as the key, and even Controller can add Pod UID to Agent messages. Then most code (except CLI/API) needs not to handle duplicated names.

Agree, but it doesn't affect the change made in this PR: not to compute network interface name from Pod name, not to use Pod name as the primary key of config cache?

jianjuns · 2020-06-16T15:53:22Z

Yes. I just meant the part of changing NetworkPolicyController etc. to update for multiple Pods with the same name. Anyway, as do not have a way to know Pod UID from CNI, probably let us keep the current approach.

vmwclabot added the cla-not-required label Jun 11, 2020

tnqn force-pushed the ss-race branch 5 times, most recently from 6533e72 to c970d3a Compare June 11, 2020 10:09

wenyingd reviewed Jun 11, 2020

View reviewed changes

pkg/agent/cniserver/interface_configuration_windows.go Outdated Show resolved Hide resolved

pkg/agent/cniserver/interface_configuration_windows.go Outdated Show resolved Hide resolved

tnqn force-pushed the ss-race branch from c970d3a to c8e51f2 Compare June 11, 2020 11:35

tnqn force-pushed the ss-race branch 3 times, most recently from b680239 to 5c36540 Compare June 11, 2020 13:51

tnqn force-pushed the ss-race branch 2 times, most recently from edbdc80 to c846ed5 Compare June 11, 2020 15:26

tnqn changed the title ~~WIP: Fix statefulset Pod recreate case~~ Fix statefulset Pod recreate case Jun 11, 2020

tnqn requested review from jianjuns, wenyingd and antoninbas June 11, 2020 15:27

tnqn force-pushed the ss-race branch from c846ed5 to a7a41c3 Compare June 11, 2020 15:36

jayunit100 mentioned this pull request Jun 11, 2020

[sig-apps] Conformance tests being skipped. #785

Closed

antoninbas reviewed Jun 11, 2020

View reviewed changes

tnqn mentioned this pull request Jun 15, 2020

Fix IP not released after restarting antrea-agent #828

Merged

tnqn force-pushed the ss-race branch 2 times, most recently from 46f4557 to 0f8c711 Compare June 15, 2020 04:30

tnqn commented Jun 15, 2020

View reviewed changes

tnqn force-pushed the ss-race branch from 0f8c711 to ab574bb Compare June 15, 2020 04:39

wenyingd reviewed Jun 15, 2020

View reviewed changes

antoninbas approved these changes Jun 15, 2020

View reviewed changes

antoninbas merged commit e77a28a into antrea-io:master Jun 15, 2020

jianjuns reviewed Jun 15, 2020

View reviewed changes

jianjuns mentioned this pull request Jun 15, 2020

Create and mainatain connection store for Antrea flow exporter #773

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix statefulset Pod recreate case #827

Fix statefulset Pod recreate case #827

tnqn commented Jun 11, 2020 •

edited

Loading

antrea-bot commented Jun 11, 2020

tnqn commented Jun 11, 2020

tnqn commented Jun 11, 2020

tnqn commented Jun 11, 2020

tnqn commented Jun 11, 2020

jayunit100 commented Jun 11, 2020

jayunit100 commented Jun 11, 2020

tnqn commented Jun 11, 2020

tnqn commented Jun 11, 2020

antoninbas left a comment

antoninbas Jun 11, 2020

tnqn Jun 15, 2020

tnqn left a comment •

edited

Loading

tnqn Jun 15, 2020

wenyingd Jun 15, 2020

tnqn Jun 15, 2020

tnqn commented Jun 15, 2020

tnqn commented Jun 15, 2020

antoninbas left a comment

antoninbas commented Jun 15, 2020

jianjuns left a comment

antoninbas commented Jun 15, 2020

jianjuns commented Jun 15, 2020

tnqn commented Jun 16, 2020

jianjuns commented Jun 16, 2020

tnqn commented Jun 16, 2020

jianjuns commented Jun 16, 2020

Fix statefulset Pod recreate case #827

Fix statefulset Pod recreate case #827

Conversation

tnqn commented Jun 11, 2020 • edited Loading

antrea-bot commented Jun 11, 2020

tnqn commented Jun 11, 2020

tnqn commented Jun 11, 2020

tnqn commented Jun 11, 2020

tnqn commented Jun 11, 2020

jayunit100 commented Jun 11, 2020

jayunit100 commented Jun 11, 2020

tnqn commented Jun 11, 2020

tnqn commented Jun 11, 2020

antoninbas left a comment

Choose a reason for hiding this comment

antoninbas Jun 11, 2020

Choose a reason for hiding this comment

tnqn Jun 15, 2020

Choose a reason for hiding this comment

tnqn left a comment • edited Loading

Choose a reason for hiding this comment

tnqn Jun 15, 2020

Choose a reason for hiding this comment

wenyingd Jun 15, 2020

Choose a reason for hiding this comment

tnqn Jun 15, 2020

Choose a reason for hiding this comment

tnqn commented Jun 15, 2020

tnqn commented Jun 15, 2020

antoninbas left a comment

Choose a reason for hiding this comment

antoninbas commented Jun 15, 2020

jianjuns left a comment

Choose a reason for hiding this comment

antoninbas commented Jun 15, 2020

jianjuns commented Jun 15, 2020

tnqn commented Jun 16, 2020

jianjuns commented Jun 16, 2020

tnqn commented Jun 16, 2020

jianjuns commented Jun 16, 2020

tnqn commented Jun 11, 2020 •

edited

Loading

tnqn left a comment •

edited

Loading