Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 30 sec cooling for IP address reassignment #62

Merged
merged 1 commit into from
May 8, 2018

Conversation

liwenwu-amazon
Copy link
Contributor

Issue #59:
Pods stuck in ContainerCreating due to CNI Failing to Assing IP to Container Until aws-node is deleted*

Description of changes:
When an IP is returned to IP WarmPool, keep it there for 30 sec before reassigning to a new Pod. This was planned in the original proposal.

Tests Performed:
performs back-to-back pods create and pods delete continuously. Without fix, the issue can be reproduced easily uses "heavy pods"(e.g. from StatefulSet). With the fix, the issue is no-longer reproducible

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@@ -213,7 +214,7 @@ func (ds *DataStore) assignPodIPv4AddressUnsafe(k8sPod *k8sapi.K8SPodInfo) (stri
addr.address, k8sPod.Name, k8sPod.Namespace)
return addr.address, eni.DeviceNumber, nil
}
if !addr.Assigned && k8sPod.IP == "" {
if !addr.Assigned && k8sPod.IP == "" && curTime.Sub(addr.unAssignedTime) > 30*time.Second {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this time related to the addressCoolingPeriod const?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on Matt's question. The values seem related, though the defined constant seems to be used to guard when it's ok to free an ENI. If they are coupled, reuse the defined constant. If not, make another named constant instead of the inline 30*time.Second.

I note that addr.unAssignedTime may be nil. In this case Sub appears to compute a duration from epoch, but this is accidental, not documented behavior. It would be safer for code to handle the nil case explicitly, to protect for this undocumented behavior changing. I also note the time package has a Since function that would simplify the expression to Since(addr.unAssignedTime) > addressCoolingPeriod and save having to set curTime above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon deeper look, time.Time is struct, only one element of which can be nil. The Zero value of time is defined as the unix epoch, so the behavior is not undocumented. I still advocate a named constant, commentary, and use of Since in preference to now.Sub(prior).

Copy link
Contributor

@dbenhur dbenhur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also be nice to explain in a comment why ipaddrs and enis need a cooling period, preferably near where you define the cooling duration constant(s). The comment can be brief and link to a design doc or issue for depth.

@@ -213,7 +214,7 @@ func (ds *DataStore) assignPodIPv4AddressUnsafe(k8sPod *k8sapi.K8SPodInfo) (stri
addr.address, k8sPod.Name, k8sPod.Namespace)
return addr.address, eni.DeviceNumber, nil
}
if !addr.Assigned && k8sPod.IP == "" {
if !addr.Assigned && k8sPod.IP == "" && curTime.Sub(addr.unAssignedTime) > 30*time.Second {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on Matt's question. The values seem related, though the defined constant seems to be used to guard when it's ok to free an ENI. If they are coupled, reuse the defined constant. If not, make another named constant instead of the inline 30*time.Second.

I note that addr.unAssignedTime may be nil. In this case Sub appears to compute a duration from epoch, but this is accidental, not documented behavior. It would be safer for code to handle the nil case explicitly, to protect for this undocumented behavior changing. I also note the time package has a Since function that would simplify the expression to Since(addr.unAssignedTime) > addressCoolingPeriod and save having to set curTime above.

@liwenwu-amazon
Copy link
Contributor Author

The cooling period (design-proposal ) is used to avoid an IP address being reassigned to a different Pod within cooling period.

Copy link
Contributor

@dbenhur dbenhur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to ship. But I'm confused why ENI cool period should be different from address cool period. I'd prefer the more concise time differential expression I suggested.


// addressCoolingPeriod is used to ensure an IP not get assigned to a Pod if this IP is used by a different Pod
// in addressCoolingPeriod
addressCoolingPeriod = 30 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these two cooling delays different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressCoolingPeriod is used to ensure an IP not get assigned to another Pod within addressCoolingPeriod. It impacts how soon the new Pods assigned to the node can re-use those IP addresses. For examples, a replica scale down which deletes all Pods on the node, and then immediately followed by a replica scale up that allocates all new Pods to the node. The addressCoolingPeriod can impact how soon these new Pods get their addresses. addressENICoolingPeriod is used to ensure ENI will not get freed back to EC2 control plane within addressENICoolingPeriod. ENI allocation/free are expensive operations. So, the timer is longer and we planning to add this a configurable parameter in future. For example, in some cases, ENI are never freed.

@@ -213,7 +220,7 @@ func (ds *DataStore) assignPodIPv4AddressUnsafe(k8sPod *k8sapi.K8SPodInfo) (stri
addr.address, k8sPod.Name, k8sPod.Namespace)
return addr.address, eni.DeviceNumber, nil
}
if !addr.Assigned && k8sPod.IP == "" {
if !addr.Assigned && k8sPod.IP == "" && curTime.Sub(addr.unAssignedTime) > addressCoolingPeriod {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since is more concise and simpler to understand than Sub(priorTime)

@@ -248,7 +255,7 @@ func (ds *DataStore) getDeletableENI() *ENIIPPool {
continue
}

if time.Now().Sub(eni.lastUnAssignedTime) < addressCoolingPeriod {
if time.Now().Sub(eni.lastUnAssignedTime) < addressENICoolingPeriod {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. Since is more concise and simpler to understand than Sub(priorTime)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants