Egress IP across multi subnets #4385
Comments
@jianjuns - On our EKS clusters we use 3 subnets for the workers, 1 subnet per AZ. We ran into issues getting EgressIP to work: the Antrea configuration was correct, but whenever we validated traffic that should have been SNATed, the source IP was always the IP of the Node the Egress was assigned to rather than the Egress IP. We had a troubleshooting session with @tnqn last week and he confirmed that our issue is due to the workers not being in the same subnet, so ARP does not succeed. @tnqn does that sound correct? I'm sure there is some extra technical detail missing :)
@robbo10 The Egress IP should be in the same subnet as the Node's IP, so if the node selector selects all Nodes, they need to be in the same subnet. Reminded by @jianjuns's comment, I wonder if you could limit the node selector to one AZ only and use Egress IPs from the subnet of that AZ. You could even have 3 externalIPPools, each of which selects only Nodes of one AZ and contains IPs in that AZ's subnet.
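To make the per-AZ pool idea concrete, here is a minimal sketch that generates one ExternalIPPool manifest per AZ. The AZ names, subnet CIDRs, host offsets, and pool names are all made up for illustration, and the `apiVersion` shown may differ depending on the Antrea release, so treat this as a starting point rather than a tested configuration. It assumes the Nodes carry the standard `topology.kubernetes.io/zone` label.

```python
import ipaddress
import json

# Hypothetical AZ -> worker subnet mapping; replace with the cluster's real values.
AZ_SUBNETS = {
    "us-east-1a": "10.0.1.0/24",
    "us-east-1b": "10.0.2.0/24",
    "us-east-1c": "10.0.3.0/24",
}


def pool_for_az(az, subnet_cidr, first_host=200, last_host=210):
    """Build an ExternalIPPool manifest restricted to the Nodes of one AZ.

    The IP range is carved out of that AZ's own subnet, so every Egress IP
    allocated from the pool is in the same subnet as the Nodes it can land on.
    The host offsets are illustrative assumptions.
    """
    net = ipaddress.ip_network(subnet_cidr)
    start = net.network_address + first_host
    end = net.network_address + last_host
    if start not in net or end not in net:
        raise ValueError(f"range {start}-{end} falls outside {net}")
    return {
        "apiVersion": "crd.antrea.io/v1alpha2",  # assumption: depends on the Antrea release
        "kind": "ExternalIPPool",
        "metadata": {"name": f"egress-pool-{az}"},  # hypothetical naming scheme
        "spec": {
            "ipRanges": [{"start": str(start), "end": str(end)}],
            "nodeSelector": {
                "matchLabels": {"topology.kubernetes.io/zone": az},
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be saved and applied with kubectl.
    for az, cidr in AZ_SUBNETS.items():
        print(json.dumps(pool_for_az(az, cidr), indent=2))
```

Each Egress would then reference exactly one of these pools, which keeps its IP and its candidate Nodes in a single subnet.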
@tnqn - For HA purposes we need to ensure that none of our products has downtime. In the worst case, if AZ1 went down, every namespace whose Egress is tied to Nodes in that subnet would have an outage. Is it possible, as you mention, to have 3 externalIPPools and assign 3 Egress IPs to each namespace, one from each externalIPPool, so that if all Nodes within an AZ were to fail we would not bring down a bunch of applications? Would that make sense to ensure we have HA? Thanks for the support :)
I think I understand the requirement now, and I wonder whether two backup Egress IPs are really needed. One AZ may go down entirely, so one backup should be enough? If so, I'm considering a secondaryEgressIP field (and a corresponding secondaryExternalIPPool field), which would take over the Egress traffic when the primary Egress IP's Nodes are all unavailable. It may be helpful for static Egress as well, since it adds HA there too, tolerating the outage of one Egress Node. But I haven't thought through what it means from an implementation perspective. I would first like to hear whether the use case and the API change make sense. @robbo10 @jianjuns @antoninbas
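The intended takeover behaviour can be sketched roughly as follows. This is purely illustrative: nothing here is implemented in Antrea, and the `EgressCandidate` type, the `pick_active_egress_ip` function, and the per-zone ready-Node counts are hypothetical names chosen only to express the proposed primary/secondary semantics.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EgressCandidate:
    """Hypothetical pairing of an Egress IP with the AZ whose Nodes can host it."""
    ip: str
    zone: str


def pick_active_egress_ip(
    primary: EgressCandidate,
    secondary: Optional[EgressCandidate],
    ready_nodes_by_zone: Dict[str, int],
) -> Optional[str]:
    """Return the Egress IP that should carry traffic right now.

    The primary is preferred; the secondary takes over only when no Node
    eligible for the primary IP is available. Returns None when neither
    IP has an eligible Node.
    """
    for candidate in (primary, secondary):
        if candidate is None:
            continue
        if ready_nodes_by_zone.get(candidate.zone, 0) > 0:
            return candidate.ip
    return None


if __name__ == "__main__":
    primary = EgressCandidate(ip="10.0.1.200", zone="az1")
    secondary = EgressCandidate(ip="10.0.2.200", zone="az2")
    # AZ1 is entirely down, so the secondary IP in AZ2 takes over.
    print(pick_active_egress_ip(primary, secondary, {"az1": 0, "az2": 3}))
```

One consequence worth noting is that the source IP seen by external systems changes on failover, so any allowlist would need to include both IPs.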
I also feel that more than one Egress IP is the only way on AWS, where a subnet belongs to a single AZ.
@tnqn - With the approach you outlined above, could the Nodes in an EKS cluster be in two AZs, so that the workers are split across 2 subnets and EgressIP still works? Thanks
@tnqn - Just to clarify: as things stand, we can't assign multiple Egress IPs to a namespace to solve the multi-AZ subnet problem? Thanks
Yes, but it's not supported yet; it's just an idea for how to resolve the problem, and the implementation needs more evaluation.
It's not supported yet. As seen from the Egress API, only a single Egress IP and a single ExternalIPPool can be specified.
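For reference, the shape of an Egress object as it stands can be sketched like this. The namespace, pool name, and object name are hypothetical, and the `apiVersion` may differ by Antrea release. The point is that `egressIP` and `externalIPPool` are single-valued fields, so one Egress can only draw from one pool and therefore from one subnet.

```python
import json


def egress_for_namespace(namespace, pool_name, egress_ip=None):
    """Build an Egress manifest that applies to every Pod in one namespace.

    externalIPPool (and the optional egressIP) take a single value each,
    which is the limitation discussed in this thread.
    """
    spec = {
        "appliedTo": {
            "namespaceSelector": {
                # Label set automatically by Kubernetes on every namespace.
                "matchLabels": {"kubernetes.io/metadata.name": namespace},
            },
        },
        "externalIPPool": pool_name,
    }
    if egress_ip is not None:
        # When omitted, an IP is allocated from the pool.
        spec["egressIP"] = egress_ip
    return {
        "apiVersion": "crd.antrea.io/v1alpha2",  # assumption: depends on the Antrea release
        "kind": "Egress",
        "metadata": {"name": f"egress-{namespace}"},  # hypothetical naming scheme
        "spec": spec,
    }


if __name__ == "__main__":
    # Hypothetical namespace and pool names.
    print(json.dumps(egress_for_namespace("payments", "egress-pool-us-east-1a"), indent=2))
```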
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days.
We have Antrea running on EKS; however, when trying to use the EgressIP feature we are limited to having all the Nodes on the same subnet.
For availability purposes we have Nodes in 2 or 3 subnets across different AZs per cluster.
We would like to be able to use EgressIP in HA mode, with Antrea supporting Nodes that are spread across multiple subnets in a cluster.
Thanks for all the work done on the project thus far! Everyone has been super helpful :)