
feat: scheduler (12/): add more scheduler logic #418

Merged: 5 commits into Azure:main on Jul 10, 2023

Conversation

michaelawyu (Contributor)

Description of your changes

This PR is part of a series of PRs that implement Fleet workload scheduling.

It adds more scheduling logic for PickAll-type CRPs (ClusterResourcePlacements).

I have:

  • Run `make reviewable` to ensure this PR is ready for review.

How has this code been tested?

  • Unit tests

Special notes for your reviewer

To control the size of the PR, certain unit tests are not checked in; they will be sent in a separate PR.
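For readers new to the series: PickAll, per the Fleet placement API, selects every joined cluster that passes the scheduler's filter step, with no target count. Below is a loose sketch of that flow; the names (`Cluster`, `runSchedulingCycleForPickAll`, `passesFilters`) are hypothetical illustrations, not code from this PR:

```go
package main

import "fmt"

// Cluster is a hypothetical stand-in for a candidate member cluster.
type Cluster struct {
	Name string
}

// runSchedulingCycleForPickAll sketches the PickAll flow: run every
// candidate through the filter step and select all that pass, since a
// PickAll policy has no numOfClusters target to satisfy.
func runSchedulingCycleForPickAll(candidates []Cluster, passesFilters func(Cluster) bool) []Cluster {
	picked := make([]Cluster, 0, len(candidates))
	for _, c := range candidates {
		if passesFilters(c) {
			picked = append(picked, c) // every feasible cluster is selected
		}
	}
	return picked
}

func main() {
	clusters := []Cluster{{Name: "member-1"}, {Name: "member-2"}}
	all := runSchedulingCycleForPickAll(clusters, func(Cluster) bool { return true })
	fmt.Println(len(all)) // 2
}
```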

@michaelawyu (Contributor, Author)

A special note:

Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?

@ryanzhang-oss (Contributor)

> A special note:
>
> Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?

I wonder where the 100 limit comes from. I think the k8s scheduler has a parameter for how many nodes to consider. Do we need to add that?
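For reference, the kube-scheduler parameter in question is likely percentageOfNodesToScore in its component config (an assumption about which knob is meant; it caps how many feasible nodes are scored per cycle). A minimal sketch, assuming the upstream v1 component-config types:

```go
package main

import (
	"fmt"

	configv1 "k8s.io/kube-scheduler/config/v1"
)

func main() {
	// percentageOfNodesToScore caps how many feasible nodes the
	// kube-scheduler scores per cycle — an analogous knob to what is
	// being discussed for clusters here.
	pct := int32(50) // score at most 50% of feasible nodes
	cfg := configv1.KubeSchedulerConfiguration{
		PercentageOfNodesToScore: &pct,
	}
	fmt.Println(*cfg.PercentageOfNodesToScore)
}
```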

(Four review threads on pkg/scheduler/framework/frameworkutils.go: all resolved, two marked outdated.)
@zhiying-lin (Contributor)

> A special note:
>
> Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?

We cannot limit numOfClusters for the PickAll (select-all) type, so we still need to handle the 100-item limit.

> > A special note:
> > Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?
>
> I wonder where the 100 limit comes from. I think the k8s scheduler has a parameter for how many nodes to consider. Do we need to add that?

The 100 limit comes from https://github.com/Azure/fleet/blob/main/apis/placement/v1beta1/policysnapshot_types.go#L77
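That linked line is a kubebuilder validation marker on the policy snapshot status. A sketch of the shape (abridged; field and tag names approximate the linked source rather than copy it):

```go
package main

// ClusterDecision is abridged for illustration; the real type also
// carries a score and a reason for the decision.
type ClusterDecision struct {
	ClusterName string `json:"clusterName"`
	Selected    bool   `json:"selected"`
}

// SchedulingPolicySnapshotStatus shows where the cap lives: the
// kubebuilder marker below is what enforces maxItems=100 in the CRD
// schema when the manifest is generated.
type SchedulingPolicySnapshotStatus struct {
	// +kubebuilder:validation:MaxItems=100
	ClusterDecisions []ClusterDecision `json:"clusterDecisions,omitempty"`
}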

@michaelawyu (Contributor, Author)

> > A special note:
> > Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?
>
> I wonder where the 100 limit comes from. I think the k8s scheduler has a parameter for how many nodes to consider. Do we need to add that?

Hi Ryan! The API has a maxItems = 100 limit on ClusterDecisions, so it may conflict with the status-updating process if the customer picks over 100 clusters. I don't think this concerns the node threshold much; IIRC, that threshold does not kick in unless the cluster has over 100 nodes. I guess it's reasonably safe for now to assume that most fleets will not have over 100 member clusters (the design target was 500?).
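To make the trade-off concrete, here is a hypothetical sketch (not code from this PR; `truncateDecisions` and the constant name are invented, and `ClusterDecision` is abridged) that keeps selected-cluster decisions first and drops unselected ones once the cap is hit:

```go
package main

const maxClusterDecisionCount = 100 // mirrors the API's maxItems cap

// ClusterDecision is abridged from the placement API for illustration.
type ClusterDecision struct {
	ClusterName string
	Selected    bool
}

// truncateDecisions keeps decisions for selected clusters first and
// backfills any remaining slots with unselected decisions. If more than
// maxClusterDecisionCount clusters are selected, the API contract ("keep
// all decisions from selected clusters") cannot be honored, and this
// sketch simply truncates.
func truncateDecisions(decisions []ClusterDecision) []ClusterDecision {
	selected := make([]ClusterDecision, 0, len(decisions))
	unselected := make([]ClusterDecision, 0, len(decisions))
	for _, d := range decisions {
		if d.Selected {
			selected = append(selected, d)
		} else {
			unselected = append(unselected, d)
		}
	}
	if len(selected) >= maxClusterDecisionCount {
		return selected[:maxClusterDecisionCount]
	}
	out := selected
	for _, d := range unselected {
		if len(out) >= maxClusterDecisionCount {
			break
		}
		out = append(out, d)
	}
	return out
}
```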

@zhiying-lin (Contributor)

zhiying-lin previously approved these changes on Jul 10, 2023, with the comment:

LGTM.

Nit: I feel maxClusterDecisionCount is a better name than the current one; leaving the decision to you :)

@michaelawyu (Contributor, Author) commented on Jul 10, 2023

Rebased to solve conflicts.

I will keep the old name for now and see if there's a better name.

@michaelawyu merged commit b99ee6b into Azure:main on Jul 10, 2023. 10 checks passed.