Basic high availability for auto egress IPs #19578
Conversation
(force-pushed from 2b044d1 to 473dcfd)
LGTM

(force-pushed from 473dcfd to b223227)
/lgtm
@openshift/networking PTAL
pkg/network/node/vxlan_monitor.go (outdated)
```go
}

const (
	defaultPollInterval = 5 * time.Second
```
Can we doc these constants please?
updated the egressVXLANMonitor docs to clarify how the constants get used
pkg/network/node/vxlan_monitor.go (outdated)
```go
// The fact that we (normally) use pod-to-egress traffic to do the monitoring rather than
// actively pinging the nodes means that if an egress node falls over while no one is
// using an egress IP on it, we won't notice the problem until someone does try to use it.
// So, eg, the first pod created in a namespace might spend 5 seconds trying to talk to a
```
It's (max) 7 seconds... right? Or am I missing something.
Every pollInterval (5s) we check the traffic to each node. If there's a mismatch we go into retry mode, which re-checks every 1s and calls it failed after 2 retries. So we will call it bad in the best case in 2s, worst case 7s, and on average 4.5s.
Yes. Changed it from "5" to "several"
```go
	nodeIP: nodeIP,
	sdnIP:  sdnIP,
}
if len(evm.monitorNodes) == 1 && evm.pollInterval != 0 {
```
Is it worth turning off polling in RemoveNode()? Maybe not, just curious.
Hm... I was doing it from poll() (it returned true if evm.monitorNodes was empty, causing the PollInfinite() to exit). But that creates a race condition if you delete the last node, then add a new node before poll() runs. So I redid it with a stop channel.
/lgtm
(force-pushed from b223227 to 2e101d1)

(force-pushed from 2e101d1 to 940d775)
If a namespace has multiple egress IPs, monitor egress traffic and switch to an alternate egress IP if the currently-selected one appears dead.
Most dump-flows calls are part of health checks and don't normally need to be logged about unless they fail.
(force-pushed from 9c76ad6 to 0439a83)
/hold cancel

/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship, dcbw, knobunc. The full list of commands accepted by this bot can be found here; the pull request process is described here.
/retest

/retest

/retest

/retest

/retest
```go
for _, ip := range eg.namespaces[0].requestedIPs {
	eg2 := eip.egressIPs[ip]
	if eg2 != eg && len(eg2.nodes) == 1 && eg2.nodes[0] == eg.nodes[0] {
		return false, fmt.Errorf("Multiple EgressIPs (%s, %s) for VNID %d on node %s", eg.ip, eg2.ip, eg.namespaces[0].vnid, eg.nodes[0].nodeIP)
```
@danwinship What is the reason for making the egress IP unavailable when all the egress IPs claimed by a namespace are held by the same node? IMO this condition will break HA, but it should be harmless if we just let the first element work.
@bmeng If you set multiple egress IPs on a NetNamespace, then you are expecting to get high availability. If the IPs are both on the same node though, then you aren't actually getting any HA. So, as in other cases, we break things to make the admin notice the misconfiguration.
Ok, that makes sense. Thanks.
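The misconfiguration check being discussed can be distilled into a standalone sketch. All names here are illustrative; the real code in the diff above walks `requestedIPs` and compares `egressIP` node lists, while this sketch flattens that to a simple IP-to-node map.

```go
package main

import "fmt"

// checkEgressHA models the validation discussed above: if a namespace
// requests multiple egress IPs for HA but two of them are hosted on the
// same (single) node, treat it as a misconfiguration so the admin
// notices, rather than silently providing no actual failover.
func checkEgressHA(requestedIPs []string, hostNode map[string]string) error {
	seen := map[string]string{} // node -> first egress IP found there
	for _, ip := range requestedIPs {
		node, ok := hostNode[ip]
		if !ok {
			continue // this IP is not currently hosted anywhere
		}
		if other, dup := seen[node]; dup {
			return fmt.Errorf("multiple egress IPs (%s, %s) on node %s", other, ip, node)
		}
		seen[node] = ip
	}
	return nil
}

func main() {
	// Both requested IPs land on node1: no real HA, so we error out.
	hosts := map[string]string{"10.0.0.5": "node1", "10.0.0.6": "node1"}
	fmt.Println(checkEgressHA([]string{"10.0.0.5", "10.0.0.6"}, hosts))
}
```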
OK, this changes the semantics of automatic egress IPs so that you can list multiple IPs in the `EgressIPs` field of a `NetNamespace`, and if so, the first one in the list that is currently hosted on a node will be used.

I tried to do this without changing the existing egressip code, but the existing code was based on the idea of processing each egress IP independently, and that doesn't work very well when changes to one egress IP can affect whether another egress IP is used. So I had to do some reorganization.
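The selection rule described above (use the first IP in the list that is currently hosted) amounts to a short scan; here is a minimal sketch under assumed names, not the PR's actual code:

```go
package main

import "fmt"

// activeEgressIP returns the first IP in the namespace's EgressIPs list
// that is currently hosted on some node, or "" if none is available.
func activeEgressIP(egressIPs []string, hosted map[string]bool) string {
	for _, ip := range egressIPs {
		if hosted[ip] {
			return ip
		}
	}
	return ""
}

func main() {
	// 10.0.0.5's node is down, so traffic fails over to 10.0.0.6.
	hosted := map[string]bool{"10.0.0.6": true}
	fmt.Println(activeEgressIP([]string{"10.0.0.5", "10.0.0.6"}, hosted)) // 10.0.0.6
}
```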
Some things to fix later (either before this gets merged, or in followup PRs, or in future patch releases):
@openshift/sig-networking PTAL