-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support allocating Egress IPs from ExternalIPPools #2237
Conversation
eee68d8
to
ee0754c
Compare
Codecov Report
@@ Coverage Diff @@
## main #2237 +/- ##
==========================================
+ Coverage 61.73% 65.02% +3.28%
==========================================
Files 276 277 +1
Lines 21306 21563 +257
==========================================
+ Hits 13153 14021 +868
+ Misses 6775 6101 -674
- Partials 1378 1441 +63
Flags with carried forward coverage won't be shown. Click here to find out more.
|
23cd8e3
to
48ca2dc
Compare
} | ||
|
||
// deleteEgress processes Egress DELETE events and deletes corresponding EgressGroup. | ||
func (c *EgressController) deleteExternalIPPool(obj interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we enqueue Egresses here to reclaim the Egress IPs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I was thinking whether we should have validationwebhook for externalIPPool so that one in use cannot be deleted, or just reclaim the IPs after the pool is deleted. Guess the latter one is more friendly to use. I could make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code to reclaim IPs when pool is deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think refusing to delete an externalIPPool that's in use may be a more user-friendly solution actually, as it could help prevent mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the scenario that user applying a manifest that has both egresses and externalIPPools. In current implementation, it's ok to create and delete egress and externalIPPool in any order. If we add the check, then externalIPPool can only be deleted after controller processes all egress delete events. But this may be not really important as pool is likely more persistent and your point of preventing mistakes makes sense to me. I'm ok with either strategy. @jianjuns do you have a strong opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no strong opinion, but slightly prefer to allow deletion without check. We can have more and more such relations between CRDs; feel we should categorize some as weak references and allow deletion with the references. Pods applied the Egress might lose egress connectivity after the pool is deleted, but seems not very bad to me.
I saw APIs using either strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I checked Tier resource already has the check that tier in use can not be deleted.
It's true, but we also have a check to prevent a policy from being created in a Tier that doesn't exist. Which we don't want to have for Egress.
I see merits to both approaches.
- Allowing deletion:
- symmetry with the creation case (one can create an Egress referencing a non-existing pool)
- may be convenient if the user's intent is indeed to release all IPs at once
- Preventing deletion:
- prevent errors if the user mistakenly thought that the pool was no longer used
- symmetry with what we would do for Pod IP pools, as in that case it wouldn't make sense IMO to allow deletion of a pool with IPs in use by Pods
What's the plan for supporting removing IP addresses from an EgressIPPool (e.g. removing an IP address range, which may be in use)? I think we should be consistent across both situations. If we don't plan to have a check for that case, I don't think we should have a check for deletion either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supporting removing or updating an existing IP address range requires more code to take care of cases. For example, changing 1.1.1.10-1.1.1.20 to 1.1.1.10-1.1.1.30 just means removing the former and adding the latter from code's perspective. We need to first know that the Egress IP's original ipRange is deleted (need to track which ipRange the IP is allocated from then) and then release the IP from the former pool and re-allocate them in the new pool. It can be done but I'm not sure if it's worth. One of my TODOs is to add a validatingwebhook to verify only new range can be added and existing ranges cannot be deleted. Guess to keep consistent experience, we propbably should have a check for deletion too.
@jianjuns @antoninbas how about let's start with simpler solution to get the core feature done first then consider optimizing the user experience, i.e. ipPool can expand and cannot shrink, ipPool can only be removed when it's no longer used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in my mind we can just support extending the CIDRs, which is also the strategy of many other IPAM solutions I know.
Again, I feel we should finally categorize some relations/references as weak relations, and allow object deletion with a weak relation (and IMO Egress - IPPool relation can be weak).
I am fine to do fix validation later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jianjuns @antoninbas do you have any remaining comments want me to address in this PR? I plan to merge this first to make the regular operation work and will have a separate PR for validation enhancement.
pkg/controller/egress/controller.go
Outdated
c.releaseEgressIP(egress.Name, prevIP, prevIPPool) | ||
} | ||
|
||
// Skip allocating EgressIP if ExternalIPPool is not specified and returns whatever user specifies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if user originally specified an Egress IP, and then updated the Spec to add an IPPool, while keeping the Egress IP in the Spec unchanged?
Will we resign the Egress IP to a Node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if user originally specified an Egress IP, and then updated the Spec to add an IPPool, while keeping the Egress IP in the Spec unchanged?
Currently controller will only log error that the requested IP is not in the Pool. We could improve it by adding a validationwebhook which rejects the update if the IP doesn't match the pool.
Will we resign the Egress IP to a Node?
Good question. Just thought about it after you raised it. I think it will have the problem if we don't add the validationwebhook. Or we have to check if the EgressIP match the Pool in agent too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added TODO for the validation webhook. Will add it with a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @jianjuns for the review.
pkg/controller/egress/controller.go
Outdated
c.releaseEgressIP(egress.Name, prevIP, prevIPPool) | ||
} | ||
|
||
// Skip allocating EgressIP if ExternalIPPool is not specified and returns whatever user specifies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if user originally specified an Egress IP, and then updated the Spec to add an IPPool, while keeping the Egress IP in the Spec unchanged?
Currently controller will only log error that the requested IP is not in the Pool. We could improve it by adding a validationwebhook which rejects the update if the IP doesn't match the pool.
Will we resign the Egress IP to a Node?
Good question. Just thought about it after you raised it. I think it will have the problem if we don't add the validationwebhook. Or we have to check if the EgressIP match the Pool in agent too.
} | ||
|
||
// deleteEgress processes Egress DELETE events and deletes corresponding EgressGroup. | ||
func (c *EgressController) deleteExternalIPPool(obj interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I was thinking whether we should have validationwebhook for externalIPPool so that one in use cannot be deleted, or just reclaim the IPs after the pool is deleted. Guess the latter one is more friendly to use. I could make the change.
f9e95af
to
69e0b33
Compare
} | ||
|
||
// deleteEgress processes Egress DELETE events and deletes corresponding EgressGroup. | ||
func (c *EgressController) deleteExternalIPPool(obj interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think refusing to delete an externalIPPool that's in use may be a more user-friendly solution actually, as it could help prevent mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoninbas thanks for the review and catching the locking error. I have addressed all comments except how to deal with externalIPPool removal. Want to achieve consensus first. I plan to have a separate PR for all validatingWebhook planned in the TODOs.
/test-all |
If users don't specify EgressIP of an Egress, antrea-controller will allocate one IP from the specified ExternalIPPool if it's available. Signed-off-by: Quan Tian <qtian@vmware.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
If users don't specify EgressIP of an Egress, antrea-controller will allocate one IP from the specified ExternalIPPool if it's available.
For #2128