Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support allocating Egress IPs from ExternalIPPools #2237

Merged
merged 1 commit into from
Jun 24, 2021

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Jun 3, 2021

If users don't specify EgressIP of an Egress, antrea-controller will allocate one IP from the specified ExternalIPPool if it's available.

For #2128

@tnqn tnqn force-pushed the egress-allocation branch 6 times, most recently from eee68d8 to ee0754c Compare June 3, 2021 15:28
@tnqn tnqn changed the title Support Allocating Egress IPs from ExternalIPPools Support allocating Egress IPs from ExternalIPPools Jun 3, 2021
@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2021

Codecov Report

Merging #2237 (f40895b) into main (5953984) will increase coverage by 3.28%.
The diff coverage is 83.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2237      +/-   ##
==========================================
+ Coverage   61.73%   65.02%   +3.28%     
==========================================
  Files         276      277       +1     
  Lines       21306    21563     +257     
==========================================
+ Hits        13153    14021     +868     
+ Misses       6775     6101     -674     
- Partials     1378     1441      +63     
Flag Coverage Δ
e2e-tests 50.35% <0.00%> (?)
kind-e2e-tests 52.29% <0.00%> (-0.64%) ⬇️
unit-tests 41.86% <83.78%> (+0.60%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/controller/egress/ipallocator/allocator.go 83.14% <83.14%> (ø)
pkg/controller/egress/controller.go 87.20% <84.11%> (+3.32%) ⬆️
...gent/controller/networkpolicy/status_controller.go 72.60% <0.00%> (-5.48%) ⬇️
pkg/util/k8s/client.go 46.34% <0.00%> (-2.68%) ⬇️
pkg/agent/controller/traceflow/packetin.go 64.97% <0.00%> (-1.69%) ⬇️
...ntroller/networkpolicy/networkpolicy_controller.go 69.45% <0.00%> (-0.65%) ⬇️
pkg/ovs/openflow/ofctrl_bridge.go 50.36% <0.00%> (+0.36%) ⬆️
pkg/ovs/openflow/ofctrl_action.go 50.42% <0.00%> (+0.42%) ⬆️
pkg/agent/openflow/network_policy.go 76.41% <0.00%> (+0.59%) ⬆️
... and 45 more

@tnqn tnqn marked this pull request as draft June 3, 2021 16:03
@tnqn tnqn force-pushed the egress-allocation branch 4 times, most recently from 23cd8e3 to 48ca2dc Compare June 15, 2021 16:44
@tnqn tnqn marked this pull request as ready for review June 15, 2021 16:44
@tnqn tnqn requested review from jianjuns, antoninbas and wenqiq June 15, 2021 16:44
@tnqn tnqn force-pushed the egress-allocation branch from 48ca2dc to b93de0b Compare June 15, 2021 16:57
pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
}

// deleteEgress processes Egress DELETE events and deletes corresponding EgressGroup.
func (c *EgressController) deleteExternalIPPool(obj interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we enqueue Egresses here to reclaim the Egress IPs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I was thinking whether we should have validationwebhook for externalIPPool so that one in use cannot be deleted, or just reclaim the IPs after the pool is deleted. Guess the latter one is more friendly to use. I could make the change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to reclaim IPs when pool is deleted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think refusing to delete an externalIPPool that's in use may be a more user-friendly solution actually, as it could help prevent mistakes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the scenario that user applying a manifest that has both egresses and externalIPPools. In current implementation, it's ok to create and delete egress and externalIPPool in any order. If we add the check, then externalIPPool can only be deleted after controller processes all egress delete events. But this may be not really important as pool is likely more persistent and your point of preventing mistakes makes sense to me. I'm ok with either strategy. @jianjuns do you have a strong opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinion, but slightly prefer to allow deletion without check. We can have more and more such relations between CRDs; feel we should categorize some as weak references and allow deletion with the references. Pods applied the Egress might lose egress connectivity after the pool is deleted, but seems not very bad to me.
I saw APIs using either strategy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I checked Tier resource already has the check that tier in use can not be deleted.

It's true, but we also have a check to prevent a policy from being created in a Tier that doesn't exist. Which we don't want to have for Egress.

I see merits to both approaches.

  • Allowing deletion:
    • symmetry with the creation case (one can create an Egress referencing a non-existing pool)
    • may be convenient if the user's intent is indeed to release all IPs at once
  • Preventing deletion:
    • prevent errors if the user mistakenly thought that the pool was no longer used
    • symmetry with what we would do for Pod IP pools, as in that case it wouldn't make sense IMO to allow deletion of a pool with IPs in use by Pods

What's the plan for supporting removing IP addresses from an EgressIPPool (e.g. removing an IP address range, which may be in use)? I think we should be consistent across both situations. If we don't plan to have a check for that case, I don't think we should have a check for deletion either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting removing or updating an existing IP address range requires more code to take care of cases. For example, changing 1.1.1.10-1.1.1.20 to 1.1.1.10-1.1.1.30 just means removing the former and adding the latter from code's perspective. We need to first know that the Egress IP's original ipRange is deleted (need to track which ipRange the IP is allocated from then) and then release the IP from the former pool and re-allocate them in the new pool. It can be done but I'm not sure if it's worth. One of my TODOs is to add a validatingwebhook to verify only new range can be added and existing ranges cannot be deleted. Guess to keep consistent experience, we propbably should have a check for deletion too.
@jianjuns @antoninbas how about let's start with simpler solution to get the core feature done first then consider optimizing the user experience, i.e. ipPool can expand and cannot shrink, ipPool can only be removed when it's no longer used.

Copy link
Contributor

@jianjuns jianjuns Jun 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in my mind we can just support extending the CIDRs, which is also the strategy of many other IPAM solutions I know.
Again, I feel we should finally categorize some relations/references as weak relations, and allow object deletion with a weak relation (and IMO Egress - IPPool relation can be weak).
I am fine to do fix validation later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns @antoninbas do you have any remaining comments want me to address in this PR? I plan to merge this first to make the regular operation work and will have a separate PR for validation enhancement.

pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
c.releaseEgressIP(egress.Name, prevIP, prevIPPool)
}

// Skip allocating EgressIP if ExternalIPPool is not specified and returns whatever user specifies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if user originally specified an Egress IP, and then updated the Spec to add an IPPool, while keeping the Egress IP in the Spec unchanged?
Will we resign the Egress IP to a Node?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if user originally specified an Egress IP, and then updated the Spec to add an IPPool, while keeping the Egress IP in the Spec unchanged?

Currently controller will only log error that the requested IP is not in the Pool. We could improve it by adding a validationwebhook which rejects the update if the IP doesn't match the pool.

Will we resign the Egress IP to a Node?

Good question. Just thought about it after you raised it. I think it will have the problem if we don't add the validationwebhook. Or we have to check if the EgressIP match the Pool in agent too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TODO for the validation webhook. Will add it with a separate PR.

Copy link
Member Author

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jianjuns for the review.

pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
c.releaseEgressIP(egress.Name, prevIP, prevIPPool)
}

// Skip allocating EgressIP if ExternalIPPool is not specified and returns whatever user specifies.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if user originally specified an Egress IP, and then updated the Spec to add an IPPool, while keeping the Egress IP in the Spec unchanged?

Currently controller will only log error that the requested IP is not in the Pool. We could improve it by adding a validationwebhook which rejects the update if the IP doesn't match the pool.

Will we resign the Egress IP to a Node?

Good question. Just thought about it after you raised it. I think it will have the problem if we don't add the validationwebhook. Or we have to check if the EgressIP match the Pool in agent too.

}

// deleteEgress processes Egress DELETE events and deletes corresponding EgressGroup.
func (c *EgressController) deleteExternalIPPool(obj interface{}) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I was thinking whether we should have validationwebhook for externalIPPool so that one in use cannot be deleted, or just reclaim the IPs after the pool is deleted. Guess the latter one is more friendly to use. I could make the change.

@tnqn tnqn force-pushed the egress-allocation branch 3 times, most recently from f9e95af to 69e0b33 Compare June 16, 2021 16:36
pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller.go Show resolved Hide resolved
}

// deleteEgress processes Egress DELETE events and deletes corresponding EgressGroup.
func (c *EgressController) deleteExternalIPPool(obj interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think refusing to delete an externalIPPool that's in use may be a more user-friendly solution actually, as it could help prevent mistakes.

pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller_test.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller_test.go Show resolved Hide resolved
Copy link
Member Author

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas thanks for the review and catching the locking error. I have addressed all comments except how to deal with externalIPPool removal. Want to achieve consensus first. I plan to have a separate PR for all validatingWebhook planned in the TODOs.

pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
pkg/controller/egress/controller_test.go Show resolved Hide resolved
pkg/controller/egress/controller_test.go Outdated Show resolved Hide resolved
@tnqn
Copy link
Member Author

tnqn commented Jun 23, 2021

/test-all

If users don't specify EgressIP of an Egress, antrea-controller will
allocate one IP from the specified ExternalIPPool if it's available.

Signed-off-by: Quan Tian <qtian@vmware.com>
@tnqn tnqn force-pushed the egress-allocation branch from f40895b to 22780dc Compare June 23, 2021 13:21
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member Author

tnqn commented Jun 23, 2021

/test-all

@tnqn tnqn merged commit ef14ad1 into antrea-io:main Jun 24, 2021
@tnqn tnqn mentioned this pull request Aug 24, 2021
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants