Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support custom scheduler config (without extenders) #5708

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions cluster-autoscaler/config/autoscaling_options.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@ package config

import (
"time"

scheduler_config "k8s.io/kubernetes/pkg/scheduler/apis/config"
)

// GpuLimits define lower and upper bound on GPU instances of given type in cluster
Expand Down Expand Up @@ -166,6 +168,9 @@ type AutoscalingOptions struct {
// ScaleDownSimulationTimeout defines the maximum time that can be
// spent on scale down simulation.
ScaleDownSimulationTimeout time.Duration
// SchedulerConfig allows changing configuration of in-tree
// scheduler plugins acting on PreFilter and Filter extension points
SchedulerConfig *scheduler_config.KubeSchedulerConfiguration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think SchedulerConfig belongs in options. SchedulerConfigPath probably would, though - the types here should stay simple (ints/strings/time.Durations).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me think about this 🙏

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the reason we want to keep it simple is because these are AutoscalingOptions and they should represent options instead of result of options.

// NodeDeletionDelayTimeout is maximum time CA waits for removing delay-deletion.cluster-autoscaler.kubernetes.io/ annotations before deleting the node.
NodeDeletionDelayTimeout time.Duration
// WriteStatusConfigMap tells if the status information should be written to a ConfigMap
Expand Down
4 changes: 4 additions & 0 deletions cluster-autoscaler/config/const.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,10 @@ package config
import "time"

const (
// SchedulerConfigFileFlag is the name of the flag
// for passing in custom scheduler config for in-tree scheduelr plugins
SchedulerConfigFileFlag = "scheduler-config-file"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for putting this here instead of inline with the existing flags?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that it is easier to refer again here.


// DefaultMaxClusterCores is the default maximum number of cores in the cluster.
DefaultMaxClusterCores = 5000 * 64
// DefaultMaxClusterMemory is the default maximum number of gigabytes of memory in cluster.
Expand Down
68 changes: 68 additions & 0 deletions cluster-autoscaler/config/test/config.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
/*
Copyright 2023 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package test

const (
// Custom scheduler configs for testing

// SchedulerConfigNodeResourcesFitDisabled is scheduler config
// with `NodeResourcesFit` plugin disabled
SchedulerConfigNodeResourcesFitDisabled = `
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
plugins:
multiPoint:
disabled:
- name: NodeResourcesFit
weight: 1
schedulerName: custom-scheduler`

// SchedulerConfigTaintTolerationDisabled is scheduler config
// with `TaintToleration` plugin disabled
SchedulerConfigTaintTolerationDisabled = `
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
plugins:
multiPoint:
disabled:
- name: TaintToleration
weight: 1
schedulerName: custom-scheduler`

// SchedulerConfigMinimalCorrect is the minimal
// correct scheduler config
SchedulerConfigMinimalCorrect = `
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration`

// SchedulerConfigDecodeErr is the scheduler config
// which throws decoding error when we try to load it
SchedulerConfigDecodeErr = `
kind: KubeSchedulerConfiguration`

// SchedulerConfigInvalid is invalid scheduler config
// because we specify percentageOfNodesToScore > 100
SchedulerConfigInvalid = `
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
# percentageOfNodesToScore has to be between 0 and 100
percentageOfNodesToScore: 130`
)
16 changes: 15 additions & 1 deletion cluster-autoscaler/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ import (
"k8s.io/autoscaler/cluster-autoscaler/simulator/clustersnapshot"
"k8s.io/autoscaler/cluster-autoscaler/utils/errors"
kube_util "k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes"
scheduler_util "k8s.io/autoscaler/cluster-autoscaler/utils/scheduler"
"k8s.io/autoscaler/cluster-autoscaler/utils/units"
"k8s.io/autoscaler/cluster-autoscaler/version"
kube_client "k8s.io/client-go/kubernetes"
Expand All @@ -68,6 +69,7 @@ import (
"k8s.io/component-base/config/options"
"k8s.io/component-base/metrics/legacyregistry"
"k8s.io/klog/v2"
scheduler_config "k8s.io/kubernetes/pkg/scheduler/apis/config"
)

// MultiStringFlag is a flag for passing multiple parameters using same flag
Expand Down Expand Up @@ -133,6 +135,7 @@ var (
"for scale down when some candidates from previous iteration are no longer valid."+
"When calculating the pool size for additional candidates we take"+
"max(#nodes * scale-down-candidates-pool-ratio, scale-down-candidates-pool-min-count).")
schedulerConfigFile = flag.String(config.SchedulerConfigFileFlag, "", "scheduler-config allows changing configuration of in-tree scheduler plugins acting on PreFilter and Filter extension points")
nodeDeletionDelayTimeout = flag.Duration("node-deletion-delay-timeout", 2*time.Minute, "Maximum time CA waits for removing delay-deletion.cluster-autoscaler.kubernetes.io/ annotations before deleting the node.")
nodeDeletionBatcherInterval = flag.Duration("node-deletion-batcher-interval", 0*time.Second, "How long CA ScaleDown gather nodes to delete them in batch.")
scanInterval = flag.Duration("scan-interval", 10*time.Second, "How often cluster is reevaluated for scale up or down")
Expand Down Expand Up @@ -274,6 +277,15 @@ func createAutoscalingOptions() config.AutoscalingOptions {
*maxEmptyBulkDeleteFlag = *maxScaleDownParallelismFlag
}

var parsedSchedConfig *scheduler_config.KubeSchedulerConfiguration
// if scheduler config flag was set by the user
if pflag.CommandLine.Changed(config.SchedulerConfigFileFlag) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that if the --scheduler-config-file flag is not used, we don't try to look for the scheduler config in the default path set by the flag i.e., "". We want to use the default scheduler config. Setting the scheduler config to nil ensures that the default scheduler config is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it better than just comparing with ""? If someone specifies an empty value for the flag, shouldn't they get the default scheduler config, rather than an error from trying to open a filename of empty string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two lines of thought here:

  1. User has set --scheduler-config-file="" (note that this is a file path) which means they have made a mistake. Hence, fail.
  2. User has set --scheduler-config-file="" which means they don't want to set a scheduler config by themselves. Fall back to using a default one.

Falling back to default in case of 2 doesn't seem as valuable to me because if the user wants to use the default config, they can just skip the flag altogether because the flag specifies path to a config file (and not scheduler config itself) and if they have set it to "", I think it's more likely that they have made a mistake (which can happen if say this value comes from a helm chart file and it's set to empty there by mistake) because they can just skip specifying the flag if they want to use the default config.

A counter-argument could be, if the user sets only one or two fields in the custom scheduler config we anyway fill rest of the required fields by default (in this PR). It can be argued that 2 is inline with this behavior.

I think failing here makes more sense than using default config but I am open to change it if you think there's a strong reason for not doing so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about someone templating the manifest and using empty string to mean "default", but I guess such templating can also just skip the flag in such cases, so maybe it is fine to just leave as is.

parsedSchedConfig, err = scheduler_util.ConfigFromPath(*schedulerConfigFile)
}
if err != nil {
klog.Fatalf("Failed to get scheduler config: %v", err)
}

return config.AutoscalingOptions{
NodeGroupDefaults: config.NodeGroupAutoscalingOptions{
ScaleDownUtilizationThreshold: *scaleDownUtilizationThreshold,
Expand Down Expand Up @@ -316,6 +328,7 @@ func createAutoscalingOptions() config.AutoscalingOptions {
ScaleDownNonEmptyCandidatesCount: *scaleDownNonEmptyCandidatesCount,
ScaleDownCandidatesPoolRatio: *scaleDownCandidatesPoolRatio,
ScaleDownCandidatesPoolMinCount: *scaleDownCandidatesPoolMinCount,
SchedulerConfig: parsedSchedConfig,
WriteStatusConfigMap: *writeStatusConfigMapFlag,
StatusConfigMapName: *statusConfigMapName,
BalanceSimilarNodeGroups: *balanceSimilarNodeGroupsFlag,
Expand Down Expand Up @@ -422,7 +435,8 @@ func buildAutoscaler(debuggingSnapshotter debuggingsnapshot.DebuggingSnapshotter

eventsKubeClient := createKubeClient(getKubeConfig())

predicateChecker, err := predicatechecker.NewSchedulerBasedPredicateChecker(kubeClient, make(chan struct{}))
predicateChecker, err := predicatechecker.NewSchedulerBasedPredicateChecker(kubeClient,
autoscalingOptions.SchedulerConfig, make(chan struct{}))
if err != nil {
return nil, err
}
Expand Down
21 changes: 14 additions & 7 deletions cluster-autoscaler/simulator/predicatechecker/schedulerbased.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ import (
kube_client "k8s.io/client-go/kubernetes"
v1listers "k8s.io/client-go/listers/core/v1"
klog "k8s.io/klog/v2"
"k8s.io/kubernetes/pkg/scheduler/apis/config"
scheduler_config "k8s.io/kubernetes/pkg/scheduler/apis/config/latest"
schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
scheduler_plugins "k8s.io/kubernetes/pkg/scheduler/framework/plugins"
Expand All @@ -44,21 +45,26 @@ type SchedulerBasedPredicateChecker struct {
}

// NewSchedulerBasedPredicateChecker builds scheduler based PredicateChecker.
func NewSchedulerBasedPredicateChecker(kubeClient kube_client.Interface, stop <-chan struct{}) (*SchedulerBasedPredicateChecker, error) {
func NewSchedulerBasedPredicateChecker(kubeClient kube_client.Interface, schedConfig *config.KubeSchedulerConfiguration, stop <-chan struct{}) (*SchedulerBasedPredicateChecker, error) {
informerFactory := informers.NewSharedInformerFactory(kubeClient, 0)
config, err := scheduler_config.Default()
if err != nil {
return nil, fmt.Errorf("couldn't create scheduler config: %v", err)

if schedConfig == nil {
var err error
schedConfig, err = scheduler_config.Default()
if err != nil {
return nil, fmt.Errorf("couldn't create scheduler config: %v", err)
}
}
if len(config.Profiles) != 1 || config.Profiles[0].SchedulerName != apiv1.DefaultSchedulerName {
return nil, fmt.Errorf("unexpected scheduler config: expected default scheduler profile only (found %d profiles)", len(config.Profiles))

if len(schedConfig.Profiles) != 1 {
return nil, fmt.Errorf("unexpected scheduler config: expected one scheduler profile only (found %d profiles)", len(schedConfig.Profiles))
}
sharedLister := NewDelegatingSchedulerSharedLister()

framework, err := schedulerframeworkruntime.NewFramework(
context.TODO(),
scheduler_plugins.NewInTreeRegistry(),
&config.Profiles[0],
&schedConfig.Profiles[0],
schedulerframeworkruntime.WithInformerFactory(informerFactory),
schedulerframeworkruntime.WithSnapshotSharedLister(sharedLister),
)
Expand Down Expand Up @@ -181,6 +187,7 @@ func (p *SchedulerBasedPredicateChecker) CheckPredicates(clusterSnapshot cluster
filterReasons,
p.buildDebugInfo(filterName, nodeInfo))
}

return nil
}

Expand Down
Loading