Coscheduilng. #639

Merged
merged 1 commit into from
Jan 7, 2019

Conversation

k82cn
Member

@k82cn k82cn commented Dec 2, 2018

Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>

Moved kubernetes/community#2337 to k/enhancements.

/cc @bsalamat, @vishh, @mindprince, @jlewi

/cc @kubernetes/sig-scheduling-feature-requests

@k8s-ci-robot added the sig/scheduling, kind/feature, cncf-cla: yes, kind/kep, sig/architecture, sig/pm, and size/L labels Dec 2, 2018
@k82cn
Member Author

k82cn commented Dec 2, 2018

xref #583

@k82cn
Member Author

k82cn commented Dec 2, 2018

/assign bsalamat

@yastij
Member

yastij commented Dec 2, 2018

/cc

adding this one to my backlog

@bsalamat
Member

bsalamat commented Dec 4, 2018

I think this proposal is a good starting point for introducing gang scheduling. Given that our current plan is to build an early version in kube-batch (which is an incubator, not a core component), this looks good to me. We should expand the proposal in the future with more in-depth information about life-cycle and quota management of gangs.

@MaciekPytel

I would like a section added to the doc on how this will interact with cluster autoscaling. I don't expect it to work with CA in the initial version, but we should make sure the design does not fundamentally conflict with how CA works if we want to promote it to core Kubernetes later on.
Cluster Autoscaler works by importing and internally running scheduler predicates to simulate how the scheduler would behave if nodes were added to the cluster or removed from it. Ideally any scheduling logic should be in the form of predicates that we can import. Otherwise we should make sure it's something that can be duplicated in CA.
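
To make the simulation idea above concrete, here is a minimal Go sketch of a scale-up check driven by predicates. It is not Cluster Autoscaler's actual code: the `Node`, `Pod`, and `Predicate` types, the `FitsCPU` predicate, and `PodsHelpedByNewNode` are all hypothetical simplifications introduced only for illustration.

```go
package main

// Node and Pod are simplified, hypothetical stand-ins for Kubernetes objects.
type Node struct {
	Name         string
	FreeMilliCPU int64
}

type Pod struct {
	Name            string
	RequestMilliCPU int64
}

// Predicate reports whether pod could run on node.
type Predicate func(pod Pod, node Node) bool

// FitsCPU is a toy predicate: the pod's CPU request must fit in the node's free CPU.
func FitsCPU(pod Pod, node Node) bool {
	return pod.RequestMilliCPU <= node.FreeMilliCPU
}

// PodsHelpedByNewNode simulates adding one node shaped like template and
// returns the names of the pending pods that would fit on it, packed in order.
func PodsHelpedByNewNode(pending []Pod, template Node, predicates []Predicate) []string {
	node := template // work on a copy so the caller's template is not modified
	var helped []string
	for _, pod := range pending {
		fits := true
		for _, predicate := range predicates {
			if !predicate(pod, node) {
				fits = false
				break
			}
		}
		if fits {
			// Reserve the capacity so later pods see the reduced free CPU.
			node.FreeMilliCPU -= pod.RequestMilliCPU
			helped = append(helped, pod.Name)
		}
	}
	return helped
}
```

Each predicate here judges a single pod against a single node. An all-or-nothing constraint over a group of pods does not fit that shape directly, which is why logic outside the predicates would have to be duplicated in the simulation.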

@k82cn
Member Author

k82cn commented Dec 28, 2018

Ideally any scheduling logic should be in the form of predicates that we can import.

Coscheduling is not a predicate in the scheduler; if necessary, CA needs to check PodGroup.Status and the pending pods to determine how many additional resources are required.

Cluster Autoscaler works by importing and internally running scheduler predicates ...

How does CA handle scheduler extenders?
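
As a rough illustration of the PodGroup.Status check suggested above, the Go sketch below computes how many more pods a group needs and how much CPU those pods request. The `MinMember` and `Running` fields, the `PendingPod` type, and the `Shortfall` function are assumptions made for this example, not the API defined by the proposal.

```go
package main

// PodGroup is a hypothetical, flattened view of a pod group.
type PodGroup struct {
	Name      string
	MinMember int // assumed: minimum number of pods that must run together
	Running   int // assumed: number of member pods currently running
}

// PendingPod is a hypothetical pending member of the group.
type PendingPod struct {
	Name            string
	RequestMilliCPU int64
}

// Shortfall returns how many more pods the group needs to reach MinMember and
// the total CPU requested by that many pending pods, taken in order.
func Shortfall(group PodGroup, pending []PendingPod) (missingPods int, milliCPU int64) {
	missingPods = group.MinMember - group.Running
	if missingPods <= 0 {
		return 0, 0 // the group already satisfies its minimum
	}
	for i, pod := range pending {
		if i >= missingPods {
			break
		}
		milliCPU += pod.RequestMilliCPU
	}
	return missingPods, milliCPU
}
```

For a group with `MinMember` 3 and one running pod, `Shortfall` counts the requests of the first two pending pods. An autoscaler could compare that total with the free capacity of the nodes it is considering adding.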

@k82cn
Member Author

k82cn commented Jan 3, 2019

@MaciekPytel

While discussing CA + kube-batch at kubernetes-retired/kube-batch#526 (comment), I was thinking of adding more info to PodGroupStatus so that CA can scale out.

@MaciekPytel

I replied on kubernetes-retired/kube-batch#533 (review). Let's continue the discussion in a single thread.

@k8s-ci-robot added the approved label Jan 3, 2019
@k82cn
Member Author

k82cn commented Jan 3, 2019

I replied on kubernetes-retired/kube-batch#533 (review). Let's continue the discussion in a single thread.

Thanks for your reply. I marked Cluster Autoscaler as out of scope in this version; I'll open other PRs when we have an alternative solution for it.

@MaciekPytel

Thanks for your reply. I marked Cluster Autoscaler as out of scope in this version; I'll open other PRs when we have an alternative solution for it.

I'm perfectly fine with that as long as the feature lives in kube-batch (which is incompatible with CA anyway). We need to figure out how to make it work with CA if/when we want to move the feature to the default scheduler.

Contributor

@mattfarina mattfarina left a comment

Any chance you could format this following the template?

For example, what would the graduation criteria for this look like?

@k82cn k82cn mentioned this pull request Jan 4, 2019
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
@k82cn
Member Author

k82cn commented Jan 4, 2019

Any chance you could format this following the template?

Done :)

@k8s-ci-robot added the lgtm label Jan 7, 2019
Member

@bsalamat bsalamat left a comment

/lgtm

More improvements to this design are needed in the area of life-cycle management of pod groups, but those can be added later, given that this design is going to an incubator project for now and won't be integrated into standard components.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, k82cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/feature - Categorizes issue or PR as related to a new feature.
kind/kep - Categorizes KEP tracking issues and PRs modifying the KEP directory.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
sig/architecture - Categorizes an issue or PR as relevant to SIG Architecture.
sig/scheduling - Categorizes an issue or PR as relevant to SIG Scheduling.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
7 participants