Support creation of GCE_VM_PRIMARY_IP NEGs for L4 ILB Services. #959
Conversation
pkg/neg/manager.go
Outdated
@@ -316,6 +338,9 @@ func getSyncerKey(namespace, name string, servicePortKey negtypes.PortInfoMapKey
if flags.F.EnableNonGCPMode {
networkEndpointType = negtypes.NonGCPPrivateEndpointType
}
if portInfo.PortTuple.Port == 0 && portInfo.PortTuple.Name == string(negtypes.VmPrimaryIpEndpointType) { |
check if the PortTuple is empty to determine if it is VM_PRIMARY_IP instead of using the name
done
pkg/neg/types/types.go
Outdated
NegName: namer.NEG(namespace, name, svcPortTuple.Port),
ReadinessGate: readinessGate,
PortTuple: svcPortTuple,
NegName: namer.NEG(namespace, name, svcPortTuple.Port), |
Create a new namer func for VM_PRIMARY_IP NEG type with format k8s1-clusterhash-namespace-name-hash
Create a new func similar to NewPortInfoMap to create PortInfoMap for this case.
done
pkg/neg/syncers/transaction.go
Outdated
recorder record.EventRecorder
cloud negtypes.NetworkEndpointGroupCloud
zoneGetter negtypes.ZoneGetter

// randomize indicates that the endpoints of the NEG can be picked at random, rather
// than following the endpoints of the service. This only applies in the GCE_VM_PRIMARY_IP NEG
Put "This only applies in the GCE_VM_PRIMARY_IP NEG" in front.
Done
pkg/neg/syncers/transaction.go
Outdated
}

// Merge the current state from cloud with the transaction table together
// The combined state represents the eventual result when all transactions completed
mergeTransactionIntoZoneEndpointMap(currentMap, s.transactions)
// Calculate the endpoints to add and delete to transform the current state to the desired state
addEndpoints, removeEndpoints := calculateNetworkEndpointDifference(targetMap, currentMap)
if s.randomize && len(removeEndpoints) > 0 {
// Minimize removals since the traffic will be abruptly stopped. Log removals
klog.Infof("Removing endpoints %+v from GCE_VM_PRIMARY_IP NEG %s", removeEndpoints, s.negName) |
This needs a log level (verbosity), e.g. klog.V(2).Infof.
Done
pkg/neg/syncers/transaction.go
Outdated
@@ -156,7 +177,9 @@ func (s *transactionSyncer) syncInternal() error {
// filter out the endpoints that are in transaction
filterEndpointByTransaction(committedEndpoints, s.transactions)

s.commitPods(committedEndpoints, endpointPodMap)
if s.NegSyncerKey.NegType != negtypes.VmPrimaryIpEndpointType { |
try not to skip it.
Just explore if it is okay to leave it.
Yes this is ok to leave as is without the if check. Will make the change.
Done
pkg/neg/syncers/transaction.go
Outdated

switch {
case s.NegSyncerKey.NegType == negtypes.VmPrimaryIpEndpointType:
nodeLister := listers.NewNodeLister(s.nodeLister) |
why do you need a NewNodeLister here?
This is to select nodes at random for the NEG backing an "ExternalTrafficPolicy: Cluster" ilb.
store NodeLister instead of the Cache.Indexer
The utils_test.go uses the Cache.Indexer in order to add nodes and test the transaction functions. Leaving it as is for now. Please let me know if you have ideas on how to continue testing this.
pkg/neg/syncers/utils.go
Outdated
@@ -169,6 +170,42 @@ func ensureNetworkEndpointGroup(svcNamespace, svcName, negName, zone, negService
return nil
}

func toZonePrimaryIPEndpointMap(endpoints *apiv1.Endpoints, nodeLister listers.NodeLister, zoneGetter negtypes.ZoneGetter, randomize bool, currentMap map[string]negtypes.NetworkEndpointSet, serviceKey string) (map[string]negtypes.NetworkEndpointSet, error) { |
add comment
done
pkg/neg/syncers/subsets.go
Outdated
return subset
}

func getSubsetPerZone(nodes []*v1.Node, zoneGetter negtypes.ZoneGetter, svcID string, currentMap map[string]negtypes.NetworkEndpointSet, newEpCount int, randomize bool) (map[string]negtypes.NetworkEndpointSet, error) { |
add comment
done
pkg/neg/syncers/subsets.go
Outdated
return result, nil
}

func getSubsetCount(currentCount, newCount, numZones int, randomize bool) int { |
add comment
done
pkg/neg/syncers/utils.go
Outdated
nodes, _ := nodeLister.ListWithPredicate(utils.GetNodeConditionPredicate())
return getSubsetPerZone(nodes, zoneGetter, serviceKey, currentMap, utils.NumEndpoints(endpoints), true)
}
nodes := []*v1.Node{} |
Add a comment here to describe the outcome: basically, pick the nodes where the endpoints are located.
done
pkg/neg/syncers/subsets.go
Outdated
// If the input list is smaller than the desired subset count, the entire list is returned. The hash salt
// is used so that a different subset is returned even when the same node list is passed in, for a different salt value.
// It also keeps the subset relatively stable for the same service.
func PickSubsetsNoRemovals(nodes []*v1.Node, salt string, count int, current []negtypes.NetworkEndpoint) []*v1.Node { |
no need to export this function, right?
Only used in this package
Also, the function name is a bit confusing. PickSubsetsNoRemovals seems to indicate no removals at all, but it actually does minimal removal, or minimal change.
That's correct, changed the name and not exported.
pkg/neg/syncers/subsets.go
Outdated
return hex.EncodeToString(hashSum[:])
}

// PickSubsetsNoRemovals ensures that there are no node removals from current subset unless the node no longer exists. |
add some examples in comment
done
}
}
}
if len(subset) >= count { |
Add a comment: remove excess nodes.
done
)

// NodeInfo stores node metadata used to sort nodes and pick a subset.
type NodeInfo struct { |
comment each field
done
pkg/neg/syncers/subsets.go
Outdated
// Pick all nodes from existing subset if still available.
for _, ep := range current {
for _, nodeInfo := range info {
curHashName := getHashedName(ep.Node, salt) |
Calculate this in the outer for loop?
good catch, done.
return info[i].hashedName < info[j].hashedName
})
// Pick all nodes from existing subset if still available.
for _, ep := range current { |
I am wondering if you can make this easier if you use maps (one map[hashedName]NetworkEndpoint, another map[hashedName]NodeInfo). After you pick the existing ones, the leftover ones can be sorted and picked. Not sure if this is good enough.
That can be done, but I am not sure it simplifies things that much. I can use a map for the current subset, and another map for the current nodes, map[nodeName]NodeInfo. Picking existing nodes becomes easier, and it gets rid of the "skip" boolean since nodes can be deleted from the NodeInfo map as they are picked. But we would need to translate the map to a slice before sorting, as well as translate the current subset slice to a map.
Force-pushed from dee780b to 202eba3.
@@ -130,21 +140,32 @@ func (s *transactionSyncer) syncInternal() error {
return nil
}

targetMap, endpointPodMap, err := toZoneNetworkEndpointMap(ep.(*apiv1.Endpoints), s.zoneGetter, s.PortTuple.Name, s.podLister, s.NegSyncerKey.SubsetLabels, s.NegSyncerKey.NegType)
currentMap, err := retrieveExistingZoneNetworkEndpointMap(s.negName, s.zoneGetter, s.cloud, s.NegSyncerKey.GetAPIVersion())
currentMap, err := retrieveExistingZoneNetworkEndpointMap(s.negName, s.zoneGetter, s.cloud, s.NegSyncerKey.GetAPIVersion()) |
Move mergeTransactionIntoZoneEndpointMap after this. That merges the currentMap with the ongoing transactions (NEG API calls in flight); the result is then the end state after all transactions are done.
I am using the end result of mergeTransactionIntoZoneEndpointMap to calculate the subset, so we use the result of all pending transactions being applied. Do we want to do something else?
Resolved offline, this is the correct behavior.
pkg/utils/utils.go
Outdated
}

// IsLegacyL4ILBService returns true if the given LoadBalancer service is managed by service controller.
func IsLegacyL4ILBService(g *gce.Cloud, svc *api_v1.Service) bool { |
gce.Cloud is not used, correct?
correct, fixed.
pkg/neg/controller.go
Outdated
@@ -67,6 +74,7 @@ type Controller struct {
ingressLister cache.Indexer
serviceLister cache.Indexer
client kubernetes.Interface
gceCloud *gce.Cloud |
I do not think this is needed.
correct, fixed.
pkg/neg/types/types.go
Outdated
@@ -177,6 +206,7 @@ func (p1 PortInfoMap) Merge(p2 PortInfoMap) error {
mergedInfo.NegName = portInfo.NegName
// Turn on the readiness gate if one of them is on
mergedInfo.ReadinessGate = mergedInfo.ReadinessGate || portInfo.ReadinessGate
mergedInfo.RandomizeEndpoints = mergedInfo.RandomizeEndpoints || portInfo.RandomizeEndpoints |
I think this should be an error instead of a merge. I do not think a service should ever have these 2 states.
done
// Max number of subsets in ExternalTrafficPolicy:Local
maxSubsetSizeLocal = 250
// Max number of subsets in ExternalTrafficPolicy:Cluster, which is the default mode.
maxSubsetSizeDefault = 25 |
Just a recommendation: consider 24.
I agree, 24 is a better number for equally splitting between 2, 3, 4 zones. We will have the same split for 25 too, since we do integer division.

// LocalL4ILBEndpointsCalculator implements methods to calculate network endpoints for VM_PRIMARY_IP NEGs when the service
// uses "ExternalTrafficPolicy: Local" mode.
type LocalL4ILBEndpointsCalculator struct { |
Add a comment to say which interface it implements, and describe the corresponding algorithm with an example.
done
// CalculateEndpoints determines the endpoints in the NEGs based on the current service endpoints and the current NEGs.
func (l *LocalL4ILBEndpointsCalculator) CalculateEndpoints(ep *v1.Endpoints, currentMap map[string]types.NetworkEndpointSet) (map[string]types.NetworkEndpointSet, types.EndpointPodMap, error) {
// List all nodes where the service endpoints are running. Get a subset of the desired count.
nodeZoneMap := make(map[string][]*v1.Node) |
nit: zoneNodeMap?
done
numZones := len(nodeZoneMap)
perZoneCount := l.getPerZoneSubsetCount(numZones, numEndpoints)
// Compute the networkEndpoints, with endpointSet size in each zone being at most `perZoneCount` in size
subsetMap, err := getSubsetPerZone(nodeZoneMap, perZoneCount, l.svcId, currentMap) |
I am wondering if the following case would happen?
OnlyLocal service with endpoints in 3 zone. Zone a has 240 endpoints, zone b/c each has 1 endpoint.
Will it put 80 endpoints in NEG in zone a and 1 endpoint in zone b and c?
I think getSubsetPerZone and pickSubsetsMinRemovals need some reorganizing to solve this problem. Just a thought, given these inputs:
- MAX_ENDPOINT_IN_SUBSET, either 25 or 250
- candidate nodes
- current map
Correct, this can happen. I had added a TODO to fix this in a follow-up; I will add it back. It can be fixed by picking up to limit/n nodes per zone, but selecting more than that if other zones have fewer nodes. I can try to include that change in this same PR or in a follow-up.
pkg/neg/syncers/transaction.go
Outdated
@@ -88,6 +100,19 @@ func NewTransactionSyncer(negSyncerKey negtypes.NegSyncerKey, networkEndpointGro
return syncer
}

func getEndpointsCalculator(syncer *transactionSyncer) negtypes.NetworkEndpointsCalculator { |
Maybe consider initializing the EndpointsCalculator in the SyncerManager, so that randomize is no longer needed.
Done
@@ -463,6 +463,14 @@ func (n *Namer) NEGWithSubset(namespace, name, subset string, port int32) string
return fmt.Sprintf("%s-%s-%s-%s-%s-%s", n.negPrefix(), truncNamespace, truncName, truncPort, truncSubset, negSuffix(n.shortUID(), namespace, name, portStr, subset))
}

func (n *Namer) PrimaryIPNEG(namespace, name string) string { |
add comment.
done
Force-pushed from 40eec7f to d0a8d85.
LoadBalancer services will be processed only if the runL4 flag is true. This is false by default. The neg controller also watches Nodes if runL4 is enabled. The subsetting logic minimizes node removals from Endpoint Group to avoid disruption to existing LB sessions.
Generated using "go mod vendor"
Added a controller test to ensure syncers are created. Also added unit tests for the newly added utils.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: freehan, prameshj.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.
This change adds support in transaction syncer to create GCE_VM_PRIMARY_IP NEGs.
The neg controller has 2 new flags - one for watching ingress and one for services.
LoadBalancer services will be processed only if the second flag is enabled. This is disabled by default and can be enabled by adding "ILBSubsets" Alpha feature gate.
/assign @freehan @bowei