proposal: add limit-aware scheduling #1495
Conversation
Codecov Report: patch coverage has no change; project coverage change:
@@ Coverage Diff @@
## main #1495 +/- ##
==========================================
+ Coverage 64.09% 64.76% +0.67%
==========================================
Files 341 347 +6
Lines 34943 35376 +433
==========================================
+ Hits 22396 22913 +517
+ Misses 10899 10780 -119
- Partials 1648 1683 +35
To schedule a new pod pod5 (request: 1, limit: 4): although node1's total request/allocatable ratio is lower than node2's, the `LimitAware` plugin considers node1's resource limits already oversubscribed and therefore places pod5 on node2 instead.
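The Story 1 example above can be sketched in Go. This is a minimal, self-contained illustration of the scoring idea, not the plugin's actual algorithm: the `Node` struct, the `limitAwareScore` function, the 2.0x over-subscription cap, and the resource numbers are all hypothetical.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a node's CPU accounting:
// Allocatable is the node's capacity; Limited is the sum of the limits
// of pods already placed on the node.
type Node struct {
	Name        string
	Allocatable int64
	Limited     int64
}

// limitAwareScore prefers nodes whose (total limit + pod limit) / allocatable
// ratio stays low after placing the pod. Higher is better. The formula and
// the 2.0 cap are illustrative assumptions.
func limitAwareScore(n Node, podLimit int64, maxScore int64) int64 {
	ratio := float64(n.Limited+podLimit) / float64(n.Allocatable)
	if ratio >= 2.0 { // hypothetical over-subscription cap
		return 0
	}
	return int64(float64(maxScore) * (1 - ratio/2.0))
}

func main() {
	// Story 1's shape: node1 has a much higher total limit than node2,
	// and pod5 carries limit 4.
	node1 := Node{Name: "node1", Allocatable: 8, Limited: 14}
	node2 := Node{Name: "node2", Allocatable: 8, Limited: 6}
	for _, n := range []Node{node1, node2} {
		fmt.Printf("%s score: %d\n", n.Name, limitAwareScore(n, 4, 100))
	}
}
```

With these (made-up) numbers, node1's post-placement limit ratio hits the cap and scores 0, so pod5 lands on node2, matching the example.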
#### Story 2: Use the limit-aware Filter plugin to filter out nodes with a high risk of resource over-subscription
I'd say we should be extra cautious about introducing a Filter plugin, for several reasons:
- users, in my experience, may prefer that their pods get scheduled (and possibly get OOM-killed / CPU-throttled afterwards) rather than not being schedulable in the first place
- it would introduce new semantics on top of KRM, which has a large blast radius: the Cluster Autoscaler has to be aware of this, and so do other integrators
@Huang-Wei Thanks for reviewing this PR.
Indeed, as you said, introducing a new Filter would cause many problems.
Specific to the limit-aware scheduling scenario: considering the runtime stability of the node, it is risky if the total limit on a node is too high. IMO there are two ways to control this risk:
- lower the score of such nodes via the scoring algorithm;
- introduce a Filter that strictly prevents the total limit from exceeding a threshold.
The first method is a must. The second can be offered to users as an option, so that they can enable it when necessary.
WDYT? @Huang-Wei
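The optional Filter discussed in this thread could look roughly like the sketch below. Everything here is hypothetical: the function name, the error, and the 1.5x default ratio are illustrative assumptions, not the proposal's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// errLimitExceeded signals the (optional) Filter rejection discussed above.
var errLimitExceeded = errors.New("node total limit would exceed threshold")

// filterByLimitRatio rejects a node when placing the pod would push
// total limit / allocatable past a user-configured ratio. Returning an
// error here is exactly the behavior the review warns about: integrators
// such as the Cluster Autoscaler would not understand this rejection.
func filterByLimitRatio(allocatable, limited, podLimit int64, maxRatio float64) error {
	if float64(limited+podLimit) > maxRatio*float64(allocatable) {
		return errLimitExceeded
	}
	return nil
}

func main() {
	// With a hypothetical 1.5x over-subscription threshold on an 8-CPU node:
	fmt.Println(filterByLimitRatio(8, 6, 4, 1.5))  // 10 <= 12: accepted
	fmt.Println(filterByLimitRatio(8, 14, 4, 1.5)) // 18 > 12: rejected
}
```

Keeping the threshold opt-in, as proposed, confines this KRM-breaking behavior to users who explicitly ask for it.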
The 2nd option is not necessarily a big no; it depends on your use case. It's just that when offering it, it would break KRM (the Kubernetes resource model), so you have to document the restrictions clearly, e.g. that the vanilla Cluster Autoscaler is unable to handle this kind of Filter error (unless it gets recompiled), etc.
This issue has been automatically marked as stale because it has not had recent activity.
This issue has been automatically closed because it has not had recent activity.
Ⅰ. Describe what this PR does
This PR proposes a resource-limit-aware scheduling plugin to mitigate the risk of resource overcommitment.
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
Ⅴ. Checklist
make test