
feat: scheduler (12/): add more scheduler logic #418

Merged: 5 commits into Azure:main on Jul 10, 2023

Conversation

michaelawyu (Contributor)

Description of your changes

This PR is part of a series of PRs that implement Fleet workload scheduling.

It adds more scheduling logic for PickAll-type CRPs (ClusterResourcePlacements).

I have:

  • Run `make reviewable` to ensure this PR is ready for review.

How has this code been tested?

  • Unit tests

Special notes for your reviewer

To control the size of the PR, certain unit tests are not checked in; they will be sent in a separate PR.
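For readers new to the series: PickAll, per the Fleet placement API, selects every joined cluster that passes the scheduler's filter step, with no target count. Below is a loose sketch of that flow; the names (`Cluster`, `runSchedulingCycleForPickAll`, `passesFilters`) are hypothetical illustrations, not code from this PR:

```go
package main

import "fmt"

// Cluster is a hypothetical stand-in for a candidate member cluster.
type Cluster struct {
	Name string
}

// runSchedulingCycleForPickAll sketches the PickAll flow: run every
// candidate through the filter step and select all that pass, since a
// PickAll policy has no numOfClusters target to satisfy.
func runSchedulingCycleForPickAll(candidates []Cluster, passesFilters func(Cluster) bool) []Cluster {
	picked := make([]Cluster, 0, len(candidates))
	for _, c := range candidates {
		if passesFilters(c) {
			picked = append(picked, c) // every feasible cluster is selected
		}
	}
	return picked
}

func main() {
	clusters := []Cluster{{Name: "member-1"}, {Name: "member-2"}}
	all := runSchedulingCycleForPickAll(clusters, func(Cluster) bool { return true })
	fmt.Println(len(all)) // 2
}
```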

@michaelawyu (Contributor, Author)

A special note:

Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?

@ryanzhang-oss (Contributor)

> A special note:
>
> Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?

I wonder where the 100 limit comes from. I think the k8s scheduler has a parameter for how many nodes to consider. Do we need to add that?
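For reference, the kube-scheduler parameter in question is likely percentageOfNodesToScore in its component config (an assumption about which knob is meant; it caps how many feasible nodes are scored per cycle). A minimal sketch, assuming the upstream v1 component-config types:

```go
package main

import (
	"fmt"

	configv1 "k8s.io/kube-scheduler/config/v1"
)

func main() {
	// percentageOfNodesToScore caps how many feasible nodes the
	// kube-scheduler scores per cycle — an analogous knob to what is
	// being discussed for clusters here.
	pct := int32(50) // score at most 50% of feasible nodes
	cfg := configv1.KubeSchedulerConfiguration{
		PercentageOfNodesToScore: &pct,
	}
	fmt.Println(*cfg.PercentageOfNodesToScore)
}
```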

(Four review threads on pkg/scheduler/framework/frameworkutils.go: all resolved, two marked outdated.)
@zhiying-lin (Contributor)

> A special note:
>
> Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?

We cannot limit numOfClusters for the PickAll (select-all) type, so we still need to handle the 100-item limit.

> > A special note:
> > Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?
>
> I wonder where the 100 limit comes from. I think the k8s scheduler has a parameter for how many nodes to consider. Do we need to add that?

The 100 limit comes from https://github.com/Azure/fleet/blob/main/apis/placement/v1beta1/policysnapshot_types.go#L77
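That linked line is a kubebuilder validation marker on the policy snapshot status. A sketch of the shape (abridged; field and tag names approximate the linked source rather than copy it):

```go
package main

// ClusterDecision is abridged for illustration; the real type also
// carries a score and a reason for the decision.
type ClusterDecision struct {
	ClusterName string `json:"clusterName"`
	Selected    bool   `json:"selected"`
}

// SchedulingPolicySnapshotStatus shows where the cap lives: the
// kubebuilder marker below is what enforces maxItems=100 in the CRD
// schema when the manifest is generated.
type SchedulingPolicySnapshotStatus struct {
	// +kubebuilder:validation:MaxItems=100
	ClusterDecisions []ClusterDecision `json:"clusterDecisions,omitempty"`
}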

@michaelawyu (Contributor, Author)

> > A special note:
> > Currently the API has a limit of 100 on cluster decisions, but it also dictates that we keep all decisions from selected clusters. Since we do not have a schema-side range set for numOfClusters, do we drop decisions when there are over 100 selected clusters, or do we set a limit of 100 on numOfClusters as well?
>
> I wonder where the 100 limit comes from. I think the k8s scheduler has a parameter for how many nodes to consider. Do we need to add that?

Hi Ryan! The API has a maxItems = 100 limit on ClusterDecisions, so it may conflict with the status-updating process if the customer picks over 100 clusters. I don't think this concerns the node threshold much; IIRC, that threshold does not kick in unless the cluster has over 100 nodes. I guess it's reasonably safe for now to assume that most fleets will not have over 100 member clusters (the design target was 500?).
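To make the trade-off concrete, here is a hypothetical sketch (not code from this PR; `truncateDecisions` and the constant name are invented, and `ClusterDecision` is abridged) that keeps selected-cluster decisions first and drops unselected ones once the cap is hit:

```go
package main

const maxClusterDecisionCount = 100 // mirrors the API's maxItems cap

// ClusterDecision is abridged from the placement API for illustration.
type ClusterDecision struct {
	ClusterName string
	Selected    bool
}

// truncateDecisions keeps decisions for selected clusters first and
// backfills any remaining slots with unselected decisions. If more than
// maxClusterDecisionCount clusters are selected, the API contract ("keep
// all decisions from selected clusters") cannot be honored, and this
// sketch simply truncates.
func truncateDecisions(decisions []ClusterDecision) []ClusterDecision {
	selected := make([]ClusterDecision, 0, len(decisions))
	unselected := make([]ClusterDecision, 0, len(decisions))
	for _, d := range decisions {
		if d.Selected {
			selected = append(selected, d)
		} else {
			unselected = append(unselected, d)
		}
	}
	if len(selected) >= maxClusterDecisionCount {
		return selected[:maxClusterDecisionCount]
	}
	out := selected
	for _, d := range unselected {
		if len(out) >= maxClusterDecisionCount {
			break
		}
		out = append(out, d)
	}
	return out
}
```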

@zhiying-lin (Contributor)

zhiying-lin previously approved these changes on Jul 10, 2023, with the comment:

LGTM.

Nit: I feel maxClusterDecisionCount is a better name than the current one; leaving the decision to you :)

@michaelawyu (Contributor, Author) commented on Jul 10, 2023

Rebased to solve conflicts.

I will keep the old name for now and see if there's a better name.

@michaelawyu merged commit b99ee6b into Azure:main on Jul 10, 2023. 10 checks passed.