Fix CNI crashing when there is no available IP addresses. #1499

Merged
merged 3 commits into from
Jun 9, 2021


@M00nF1sh M00nF1sh commented Jun 9, 2021

Currently, when the CNI is invoked for AddNetwork while IPAMD has no available IP address, the CNI crashes.
This is caused by the two bugs below:

  1. When IPAMD returns the "no ip available" error, the err variable is overwritten with nil when getting the VPCCIDRs. As a result, we return a success response with an empty IPv4Address. When the CNI tries to set up the veth pair with the empty IPv4Address, it fails and invokes DelNetwork, which triggers the second bug below.
  2. When DelNetwork returns an err (because the pod sandbox is not found, which is expected), accessing r.Success causes a nil-pointer exception and crashes the CNI.

What type of PR is this?

Which issue does this PR fix:

What does this PR do / Why do we need it:

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:

Testing done on this change:

Tested with 1.19 cluster:
{"level":"warn","ts":"2021-06-09T20:54:14.695Z","caller":"rpc/rpc.pb.go:501","msg":"Send AddNetworkReply: unable to assign IPv4 address for pod, err: assignPodIPv4AddressUnsafe: no available IP addresses"}

Automation added to e2e:

Will this break upgrades or downgrades. Has updating a running cluster been tested?:

Does this change require updates to the CNI daemonset config files to work?:

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

1. when IPAMD returns "no ip available error", the err variable get overwritten to nil when get VPCCIDRs
2. when DelNetwork returns err, the r.Success will cause nil-pointer exception and crash CNI
}

if !r.Success {
} else if !r.Success {
Contributor

nit: We will end up here only if delErr is nil, right? So I don't see any value in printing delErr in the error message below.

Contributor

Also, why not extend this to scenarios where add fails as well?

https://github.com/aws/amazon-vpc-cni-k8s/blob/master/cmd/routed-eni-cni-plugin/cni.go#L164

Contributor Author

@achevuru Because when add fails, we always return within the err != nil check.

Contributor Author

> nit: We will end up here only if delErr is nil, right? So I don't see any value in printing delErr in the error message below.

Makes sense.
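The `else if !r.Success` change discussed in this thread can be sketched as follows. This is a minimal illustration, assuming a hypothetical `reply` type and `delNetwork` helper in place of the generated RPC types; it only shows why the reply must not be dereferenced when the call itself returned an error, and why delErr adds nothing to the message on the second branch.

```go
package main

import (
	"errors"
	"fmt"
)

// reply is a hypothetical stand-in for the RPC reply type.
type reply struct{ Success bool }

// delNetwork mimics an RPC that returns a nil reply together with an error.
func delNetwork(fail bool) (*reply, error) {
	if fail {
		return nil, errors.New("pod sandbox not found")
	}
	return &reply{Success: false}, nil
}

// cleanup describes the outcome without ever dereferencing a nil reply.
func cleanup(fail bool) string {
	r, delErr := delNetwork(fail)

	var msg string
	if delErr != nil {
		// r is nil here, so r.Success must not be touched.
		msg = fmt.Sprintf("del failed: %v", delErr)
	} else if !r.Success {
		// delErr is nil on this branch, so it is left out of the message.
		msg = "del returned Success=false"
	} else {
		msg = "del ok"
	}
	return msg
}

func main() {
	fmt.Println(cleanup(true))
	fmt.Println(cleanup(false))
}
```

With two independent `if` statements instead of `if ... else if`, the `fail == true` case would reach `r.Success` with a nil `r` and panic.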

Contributor

I think at some point we need to simplify this, for instance by making the Add and Del functions return a single structure containing both the response and the error. Right now there are two variables, and anyone adding code has to remember to set the response when err is nil.

Contributor Author

Yeah, this whole piece needs to be restructured and handled better, for example how to handle err vs. r.Success (they seem to duplicate each other here).

Contributor

yeah agreed.
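One possible shape for the simplification suggested in this thread is sketched below. It is purely hypothetical: `addResult`, `OK`, and `addNetwork` are names invented for illustration, and nothing here reflects an actual or planned API in the repository. The idea is that callers check a single value instead of reconciling a separate err and Success flag.

```go
package main

import (
	"errors"
	"fmt"
)

// addResult is a hypothetical single return value bundling reply data and error.
type addResult struct {
	IPv4Addr string
	Err      error
}

// OK is the one success check: no error and a non-empty address.
func (r addResult) OK() bool { return r.Err == nil && r.IPv4Addr != "" }

// addNetwork is a stand-in that either assigns an address or reports why not.
func addNetwork(haveIP bool) addResult {
	if !haveIP {
		return addResult{Err: errors.New("no available IP addresses")}
	}
	return addResult{IPv4Addr: "10.0.0.12"}
}

func main() {
	for _, haveIP := range []bool{true, false} {
		res := addNetwork(haveIP)
		if !res.OK() {
			fmt.Println("add failed:", res.Err)
			continue
		}
		fmt.Println("assigned", res.IPv4Addr)
	}
}
```

Because `OK` also rejects an empty address, a reply that carries no error but no IP either is still treated as a failure.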


@jayanthvn jayanthvn left a comment


LGTM :)

@M00nF1sh M00nF1sh merged commit a2c966e into aws:master Jun 9, 2021
M00nF1sh added a commit to M00nF1sh/amazon-vpc-cni-k8s that referenced this pull request Jun 9, 2021
* Fix two bug in CNI/IPamd code path
1. when IPAMD returns "no ip available error", the err variable get overwritten to nil when get VPCCIDRs
2. when DelNetwork returns err, the r.Success will cause nil-pointer exception and crash CNI

* fix test cases

* address commits
M00nF1sh added a commit that referenced this pull request Jun 9, 2021