-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: update metric when there are zero disruption candidates #1187
fix: update metric when there are zero disruption candidates #1187
Conversation
Pull Request Test Coverage Report for Build 8980552943Details
💛 - Coveralls |
c7b8845
to
91f95b4
Compare
a7ee1a2
to
ae89213
Compare
e9446b6
to
ff1b84b
Compare
ff1b84b
to
db934c3
Compare
BeforeEach(func() { | ||
eligibleNodesMetric = ExpectFullyQualifiedNameFromCollector(disruption.EligibleNodesGauge) | ||
}) | ||
It("should correctly report eligible nodes", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand making this its own separate test is nice to test individual things, but I'd rather not add more tests (when we're already having to increase the timeouts). Can you just add this eligible metric check into the existing disruption tests? If you can do this for each of the disruption tests, it'll make sure that the metric is working properly in all the different ways we're testing the codepaths too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a pretty large set of updates for the disruption suite in general in my original consolidation race condition fix PR that I'm going to incorporate into a new PR. The main change was a rework of how we handle faking the clock which significantly sped up the test suite (~5x speed improvement IIRC). I think with that coming as well we can justify a standalone test, but I could also see incorporating this elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced offline, opting for followups to solve this problem
Nice work tracking this down! This is a great simplifying change and some solid testing added! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jmdeal, njtran The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes #N/A
Description
Moves the eligible node metric update to the top-level disruption controller from the individual consolidation implementations. This both reduces code duplication and, more importantly, fixes a bug where the karpenter_disruption_eligible_nodes metric is not updated if the number of candidates is zero. This results in the last non-zero number of candidates being reported indefinitely.
How was this change tested?
Tested in a personal cluster via the Karpenter AWS provider
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.