Add byobject filter on nodes #2888
Conversation
Closing per the discussion here - #2887
A couple more datapoints in favor of this change. We have a few large node-count clusters with a higher rate of churn for these nodes. The number of Node events pushed to the daemonset was causing excessive bandwidth usage. Adding the filter drastically reduced the number of events distributed to the system, and the number of bytes pushed fell similarly (sharing a screenshot without absolute numbers; happy to share in private, but you can see the relative drop in bytes processed).
@GnatorX - The PR is still in draft mode. Please feel free to move it to review. Also, can you run
Ya, I can do that.
Ran it.
Updated the branch and started the test workflow.
Thanks
@@ -37,7 +37,11 @@ func getIPAMDCacheFilters() map[client.Object]cache.ByObject {
 	return map[client.Object]cache.ByObject{
 		&corev1.Pod{}: {
 			Field: fields.Set{"spec.nodeName": nodeName}.AsSelector(),
-		}}
+		},
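For reference, here is a minimal sketch of what the updated filter map likely looks like after this change. The Node entry, its `metadata.name` field key, and the `MY_NODE_NAME` environment variable are assumptions drawn from the discussion, not copied from the actual diff:

```go
package k8sapi

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// getIPAMDCacheFilters sketches the filter map after this PR: Pods are still
// filtered by spec.nodeName, and Nodes are now filtered down to the single
// node this aws-node pod runs on.
func getIPAMDCacheFilters() map[client.Object]cache.ByObject {
	// Assumption: the node name is available via an environment variable.
	nodeName := os.Getenv("MY_NODE_NAME")
	return map[client.Object]cache.ByObject{
		// Cache only Pods scheduled on this node.
		&corev1.Pod{}: {
			Field: fields.Set{"spec.nodeName": nodeName}.AsSelector(),
		},
		// Cache only the Node object for this node, so the informer ignores
		// events for every other node in the cluster.
		&corev1.Node{}: {
			Field: fields.Set{"metadata.name": nodeName}.AsSelector(),
		},
	}
}
```

Because the field selector is applied server-side, the API server only streams events for the matching objects, which is what drives the bandwidth and memory drops reported above.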
The previous two changes related to the cache improvements were these:
- https://github.com/aws/amazon-vpc-cni-k8s/pull/1419/files#diff-f48443f100e2f9072cbb1924ed0a9090831ade5986d10535641ad5c0a63f51b4
- ea10123#diff-f48443f100e2f9072cbb1924ed0a9090831ade5986d10535641ad5c0a63f51b4
I am trying to understand why only Pod{} was added to the filter here in the first place.
I suspect it wasn't as obvious because it requires large clusters to show the increase in memory and networking bandwidth. We only saw this because our cluster has around 4k nodes.
@dl-stripe - What is the LHS value in the first graph here - #2888 (comment)?
It's the number of Node events pushed through informers (based off
When we were using node labels to signal the Security Groups for Pods feature - https://docs.aws.amazon.com/eks/latest/userguide/security-groups-pods-deployment.html - this change, which lets the Kubernetes client cache the node calls, could have had an impact, depending on how the client invalidates the cache when labels change. Since we no longer depend on the CNI node labels, this change looks good to me. @dl-stripe / @GnatorX - Do you have Security Groups for Pods in your cluster under test, and did you see any impact from this change?
This filters down the calls for Nodes when the informer performs the list + watch. It shouldn't affect any label changes. It narrows the informer cache to watch only the node the VPC CNI is running on and to ignore all other nodes' updates and events.
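To illustrate the mechanism described above, here is a generic client-go sketch (not the VPC CNI code; the package, function, and variable names are hypothetical). A field selector restricts both the initial LIST and the WATCH to a single Node, so label changes on that node still arrive while other nodes' events never reach the daemonset:

```go
package nodewatch

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// newSelfNodeInformerFactory returns an informer factory whose Node informer
// only lists and watches the named node.
func newSelfNodeInformerFactory(cs kubernetes.Interface, nodeName string) informers.SharedInformerFactory {
	return informers.NewSharedInformerFactoryWithOptions(cs, 30*time.Second,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			// The API server filters server-side, so only events for this
			// node are sent over the watch connection.
			opts.FieldSelector = fields.OneTermEqualSelector("metadata.name", nodeName).String()
		}))
}
```

With this factory, `Core().V1().Nodes().Informer()` behaves as usual, but its cache holds a single object instead of every Node in the cluster.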
We do run with Security Groups for Pods and this change has no effect on it.
Got it. This makes sense, and the change looks correct and helpful too.
This change LGTM.
The filter map used here - amazon-vpc-cni-k8s/pkg/k8sapi/k8sutils.go, line 68 in d49c8a3 (`ByObject: filterMap,`) - is correct.
I understand that adding an additional cache field can increase VPC CNI's memory usage, but it did not, per #2887.
We will verify VPC CNI memory usage after this change before it is released.
The reason I initially wasn't able to see the memory improvement from this change is that our internal setup vendored the code, so when I made the change it didn't actually take effect.
Some docs I found that explain what these ByObject filters do: https://github.com/kubernetes-sigs/controller-runtime/blob/main/designs/use-selectors-at-cache.md
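For context, here is a minimal sketch, assuming controller-runtime v0.15 or newer, of how a ByObject field selector plugs into a manager's cache options as the design doc describes; the wiring and names are illustrative rather than the exact VPC CNI setup:

```go
package nodewatch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newFilteredManager builds a manager whose cache only ever holds the one
// Node object named nodeName; all other Node events are filtered out
// server-side before they reach the informer.
func newFilteredManager(nodeName string) (ctrl.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			ByObject: map[client.Object]cache.ByObject{
				&corev1.Node{}: {
					Field: fields.OneTermEqualSelector("metadata.name", nodeName),
				},
			},
		},
	})
}
```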
Thank you for the explanation and the links to the docs on this behavior. This change looks good and helpful.
* Add byobject filter on nodes
---------
Co-authored-by: Jayanth Varavani <1111446+jayanthvn@users.noreply.github.com>
Co-authored-by: Garvin Pang <garvinp@stripe.com>
This is a great change; any other calls to the API server or AWS APIs that could use filtering should use it. This cut the memory usage of a 4-500 node cluster in half, from 40GB+ to ~20GB across all CNI pods.
What type of PR is this?
improvement
Which issue does this PR fix?:
#2887
What does this PR do / Why do we need it?:
This PR adds a filter so that the VPC CNI watches only the Node object for the node it is managing, rather than every Node in the cluster.
Testing done on this change:
Tested on our cluster of 3-4k nodes by @dl-stripe
Will this PR introduce any new dependencies?:
No
Will this break upgrades or downgrades? Has updating a running cluster been tested?:
No.
Does this change require updates to the CNI daemonset config files to work?:
No
Does this PR introduce any user-facing change?:
No
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.