-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support custom scheduler config (without extenders) #5708
feat: support custom scheduler config (without extenders) #5708
Conversation
cluster-autoscaler/main.go
Outdated
return &schedconfig.KubeSchedulerConfiguration{} | ||
} | ||
|
||
obj, gvk, err := schedscheme.Codecs.UniversalDecoder().Decode(data, nil, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line creates a default scheduler config if none is specified.
TODO: return output of this call as default. If it errors, fall back to the old behavior of getting a default config.
Not doing the TODO above leads to a problem where the scheduler config ends up as empty and it shows up in the error log like this:
$ ./cluster-autoscaler --kubeconfig=/home/suraj/.kube/config --namespace=default
I0428 19:48:18.545718 1865631 leaderelection.go:245] attempting to acquire leader lease default/cluster-autoscaler...
I0428 19:48:40.443736 1865631 leaderelection.go:255] successfully acquired lease default/cluster-autoscaler
E0428 19:48:40.444001 1865631 main.go:385] couldn't load scheduler config (falling back to default scheduler config): open : no such file or directory
F0428 19:48:40.446465 1865631 main.go:504] Failed to create autoscaler: unexpected scheduler config: expected one scheduler profile only (found 0 profiles)
Deleted my comment around test results because the screenshots have sensitive info. Will post another comment with sensitive details redacted. |
Note that Go tests are probably failing because of #5708 (comment) Testing for scheduler extenders
Custom scheduler manifestsapiVersion: v1
kind: ServiceAccount
metadata:
name: custom-scheduler
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: custom-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
name: custom-scheduler
namespace: kube-system
roleRef:
kind: ClusterRole
name: cluster-admin # only for testing
apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
name: custom-scheduler-config
namespace: kube-system
data:
custom-scheduler-config.yaml: |
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: custom-scheduler
leaderElection:
leaderElect: false
# I want scheduler to call an external http extender to
# filter schedulable nodes
extenders:
- urlPrefix: http://sample-scheduler-extender-service.default
filterVerb: apiv1/filter
nodeCacheCapable: false
ignorable: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
component: scheduler
tier: control-plane
name: custom-scheduler
namespace: kube-system
spec:
selector:
matchLabels:
component: scheduler
tier: control-plane
replicas: 1
template:
metadata:
labels:
component: scheduler
tier: control-plane
version: second
spec:
serviceAccountName: custom-scheduler
volumes:
- name: config-volume
configMap:
name: custom-scheduler-config
containers:
- command:
- /usr/local/bin/kube-scheduler
- --config=/etc/kubernetes/custom-scheduler/custom-scheduler-config.yaml
- --v=5
image: k8s.gcr.io/kube-scheduler:v1.25.3
volumeMounts:
- name: config-volume
mountPath: /etc/kubernetes/custom-scheduler
livenessProbe:
httpGet:
path: /healthz
port: 10259
scheme: HTTPS
initialDelaySeconds: 15
name: kube-second-scheduler
readinessProbe:
httpGet:
path: /healthz
port: 10259
scheme: HTTPS
resources:
requests:
cpu: '0.1'
securityContext:
privileged: false
hostNetwork: false
hostPID: false
When nodes don't have
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/assign
cluster-autoscaler/main.go
Outdated
@@ -368,6 +376,36 @@ func getKubeConfig() *rest.Config { | |||
return kubeConfig | |||
} | |||
|
|||
func getSchedulerConfig(path string) *schedconfig.KubeSchedulerConfiguration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function represents non-trivial logic which I think should live in a separate package, not main. (Same goes for a lot of the existing code, but I think least we can do is not making things worse.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea. Thanks for the feedback.
P.S.:
I created this PR as a PoC to evaluate the idea and get an OK from the community before proceeding ahead (one of the items on my agenda for today i.e., 8th May 2023's SIG meeting since we didn't have 2 meetings before). I had planned on refactoring this function and the PR as a whole (+ tests) but stopped myself so that I don't end up spending too much time in case this feature doesn't sound like a good idea to the community.
func (p *SchedulerBasedPredicateChecker) runExtendersForNode(pod *apiv1.Pod, nodeInfo *schedulerframework.NodeInfo) *PredicateError { | ||
for _, extender := range p.framework.Extenders() { | ||
if extender.IsInterested(pod) { | ||
filteredNodes, failedNodesMap, unresolvableNodesMap, err := extender.Filter(pod, []*apiv1.Node{nodeInfo.Node()}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Looks like extender.Filter
accepts a list of nodes - wouldn't it make more sense to run this once after PreFilter
, but before Filter
and pass all the nodes that came out of PreFilter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing all nodes extender.Filter
at once so that we don't call it multiple times makes sense to me but same could be said of running filter plugins (even though cost of calling in-tree filter plugins per node won't be as high). I just took a safe bet here and went with the same approach as running filter plugins (might not be the best approach as you rightfully pointed out). I guess the question here would be, why didn't we do the same for running filter plugins?
extender.Filter
is called in the upstream scheduler code after all PreFilter
and Filter
plugins are called. I don't see much benefit in changing the order.
File: pkg/scheduler/schedule_one.go
385: // Filters the nodes to find the ones that fit the pod based on the framework
386: // filter plugins and filter extenders.
387: func (sched *Scheduler) findNodesThatFitPod(ctx context.Context, fwk framework.Framework, state *framework.CycleState, pod *v1.Pod) ([]*v1.Node, framework.Diagnosis, error) {
...
393: allNodes, err := sched.nodeInfoSnapshot.NodeInfos().List()
394: if err != nil {
395: return nil, diagnosis, err
396: }
397: // Run "prefilter" plugins.
398: preRes, s := fwk.RunPreFilterPlugins(ctx, state, pod)
399: ...
413:
414: ...
438: feasibleNodes, err := sched.findNodesThatPassFilters(ctx, fwk, state, pod, diagnosis, nodes)
...
446:
447: feasibleNodes, err = findNodesThatPassExtenders(sched.Extenders, pod, feasibleNodes, diagnosis.NodeToStatusMap)
448: if err != nil {
449: return nil, diagnosis, err
450: }
File: pkg/scheduler/schedule_one.go
576: func findNodesThatPassExtenders(extenders []framework.Extender, pod *v1.Pod, feasibleNodes []*v1.Node, statuses framework.NodeToStatusMap) ([]*v1.Node, error) {
577: // Extenders are called sequentially.
578: // Nodes in original feasibleNodes can be excluded in one extender, and pass on to the next
579: // extender in a decreasing manner.
...
587:
588: // Status of failed nodes in failedAndUnresolvableMap will be added or overwritten in <statuses>,
589: // so that the scheduler framework can respect the UnschedulableAndUnresolvable status for
590: // particular nodes, and this may eventually improve preemption efficiency.
591: // Note: users are recommended to configure the extenders that may return UnschedulableAndUnresolvable
592: // status ahead of others.
593: feasibleList, failedMap, failedAndUnresolvableMap, err := extender.Filter(pod, feasibleNodes)
...
} | ||
|
||
if err := p.runExtendersForNode(pod, nodeInfo); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not ignore err
here? I think it has some useful information when not nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I will think something around this. Thanks for the comment.
Supporting configuration of existing PreFilters/Filters seems non-controversial to me. I'm worried about extenders though. The 2 specific problems I see are:
The problem with those 2 issues is that they are subtle and a simple enough test (an extender that only looks at the pending pod and is independent from any other pods in a small cluster) will work just fine. Even a more complex extender will not fail every time and the probability of problems likely depends on scale and complexity of the cluster (ie. my tests may pass just fine and it will only blow up once I hit prod). |
Thank you for the comment.
I had a discussion around 2 with @mwielgus in yesterday's SIG meeting (8th May 2023) and he had similar concerns (summary of the discussion). There are concerns around performance of HTTP extender in scheduler as well (ref). That being said,
I understand your concern. Personally, I think we can add some checks to disable the extender when the cluster size becomes too big but there are problems with that approach too since it would surprise our users (e.g., I will drop the idea of supporting external extenders and proceed ahead with supporting custom scheduler config for in-tree scheduler plugins. Thanks for taking the time to write all of this. Really helps understand the |
@@ -27,7 +27,7 @@ import ( | |||
kube_client "k8s.io/client-go/kubernetes" | |||
v1listers "k8s.io/client-go/listers/core/v1" | |||
klog "k8s.io/klog/v2" | |||
scheduler_config "k8s.io/kubernetes/pkg/scheduler/apis/config/latest" | |||
config "k8s.io/kubernetes/pkg/scheduler/apis/config" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This alias is a no-op, the package is called config
already.
cluster-autoscaler/main.go
Outdated
@@ -132,6 +133,7 @@ var ( | |||
"for scale down when some candidates from previous iteration are no longer valid."+ | |||
"When calculating the pool size for additional candidates we take"+ | |||
"max(#nodes * scale-down-candidates-pool-ratio, scale-down-candidates-pool-min-count).") | |||
schedulerConfig = flag.String("scheduler-config-file", "", "scheduler-config allows adding extenders and changing configuration of default scheduler plugins acting on PreFilter and Filter extension points") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment is not accurate anymore as extenders won't be used.
// GetSchedulerConfig loads scheduler config from a path | ||
// (falls back to default config in case of error) | ||
func GetSchedulerConfig(path string) (*schedconfig.KubeSchedulerConfiguration, error) { | ||
defaultFallbackMsg := "(falling back to default scheduler config)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the fallback really the desired behavior here? This can hide some errors - e.g. user making a typo in config path will expect custom config to be used, but CA instead of crashing right away will silently fall back to the default configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using default config instead of failing fast might surprise our users. I will change this. Thanks for the suggestion.
|
||
// GetSchedulerConfig loads scheduler config from a path | ||
// (falls back to default config in case of error) | ||
func GetSchedulerConfig(path string) (*schedconfig.KubeSchedulerConfiguration, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is actually the main code change in this PR now. Maybe worth adding some unit tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have added a comment after #5708 (comment). Since we have moved on from the PoC phase, I was planning to do another pass on PR (cleanup, add unit tests etc.,) and the change the PR from draft to Ready for Review. Maybe I can mention you again here once I am done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, sounds good!
clientsetfake "k8s.io/client-go/kubernetes/fake" | ||
) | ||
|
||
// NewTestPredicateChecker builds test version of PredicateChecker. | ||
func NewTestPredicateChecker() (PredicateChecker, error) { | ||
|
||
cfg, err := scheduler.GetSchedulerConfig("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why not use the default config directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me think about this 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been updated to use the default scheduler config in case no scheduler config is passed through the function parameter: https://github.com/kubernetes/autoscaler/pull/5708/files#diff-5818325464670a5cd54280fe253b0841a163da5de2764491a3065805785ffb1bR51-R56
|
||
// this needs to be set explicitly because config's api version is empty after decoding | ||
// check kubernetes/cmd/kube-scheduler/app/options/configfile.go for more info | ||
cfgObj.TypeMeta.APIVersion = gvk.GroupVersion().String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is based on upstream scheduler's private function called loadConfigFromFile
File: cmd/kube-scheduler/app/options/configfile.go
34: func loadConfigFromFile(logger klog.Logger, file string) (*config.KubeSchedulerConfiguration, error) {
35: data, err := os.ReadFile(file)
36: if err != nil {
37: return nil, err
38: }
39:
40: return loadConfig(logger, data)
41: }
42:
43: func loadConfig(logger klog.Logger, data []byte) (*config.KubeSchedulerConfiguration, error) {
44: // The UniversalDecoder runs defaulting and returns the internal type by default.
45: obj, gvk, err := scheme.Codecs.UniversalDecoder().Decode(data, nil, nil)
46: if err != nil {
47: return nil, err
48: }
49: if cfgObj, ok := obj.(*config.KubeSchedulerConfiguration); ok {
50: // We don't set this field in pkg/scheduler/apis/config/{version}/conversion.go
51: // because the field will be cleared later by API machinery during
52: // conversion. See KubeSchedulerConfiguration internal type definition for
53: // more details.
54: cfgObj.TypeMeta.APIVersion = gvk.GroupVersion().String()
55: switch cfgObj.TypeMeta.APIVersion {
56: case configv1beta2.SchemeGroupVersion.String():
57: logger.Info("KubeSchedulerConfiguration v1beta2 is deprecated in v1.25, will be removed in v1.28")
58: case configv1beta3.SchemeGroupVersion.String():
59: logger.Info("KubeSchedulerConfiguration v1beta3 is deprecated in v1.26, will be removed in v1.29")
60: }
61: return cfgObj, nil
62: }
63: return nil, fmt.Errorf("couldn't decode as KubeSchedulerConfiguration, got %s: ", gvk)
64: }
Note that I haven't included the deprecated log messages because they might change and we'd have to remember updating them every time. I don't see a lot of value in printing these messages because we'd have to support the scheduler config even if it is deprecated anyway (open to other opinions here).
Note that if the user uses an old scheduler api version which isn't supported anymore like say v1alpha1
, UniversalDecoder().Decode
throws an error like this:
panic: no kind "KubeSchedulerConfiguration" is registered for version "kubescheduler.config.k8s.io/v1alpha1" in scheme "pkg/runtime/scheme.go:100"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make this function public in scheduler codebase instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment. I thought about this but I was also on the fence because scheduler has a similar function to write the config to a file
File: cmd/kube-scheduler/app/options/configfile.go
091: // LogOrWriteConfig logs the completed component config and writes it into the given file name as YAML, if either is enabled
092: func LogOrWriteConfig(logger klog.Logger, fileName string, cfg *config.KubeSchedulerConfiguration, completedProfiles []config.KubeSchedulerProfile) error {
093: loggerV := logger.V(2)
094: if !loggerV.Enabled() && len(fileName) == 0 {
095: return nil
096: }
097: cfg.Profiles = completedProfiles
098:
099: buf, err := encodeConfig(cfg)
100: if err != nil {
101: return err
102: }
103:
104: if loggerV.Enabled() {
105: loggerV.Info("Using component config", "config", buf.String())
106: }
107:
108: if len(fileName) > 0 {
109: configFile, err := os.Create(fileName)
110: if err != nil {
111: return err
112: }
113: defer configFile.Close()
114: if _, err := io.Copy(configFile, buf); err != nil {
115: return err
116: }
117: logger.Info("Wrote configuration", "file", fileName)
118: os.Exit(0)
119: }
120: return nil
121: }
Notice how it is calling os.Exit(0)
on line 118. If you import and use this function in your code, the program is going to exit when you call this function. My concern is, if we don't implement the function to read config file on our own we might make ourselves dependent on the implementation of LoadConfigFromFile
(if loadConfigFromFile
is converted to a public function) whose implementation is dependent on use-case of scheduler and not CA. If there's a change in the LoadConfigFromFile
in the future, it will break things for us. That being said, same thing can be argued for Filter
and PostFilter
extension points (public functions which scheduler framework exposes) but I think there's an important difference between Filter
/PostFilter
extension points and LoadConfigFromFile
or LogOrWriteConfig
. Former is similar to a public API and it is unlikely to change without a strong deprecation policy or notice to users but that might not be the case with latter.
If we decide to go with LoadConfigFromFile
way, we have 2 possible options:
Option 1:
- Get this PR in to CA codebase
- Talk with sig-scheduling about setting expectation about
LoadConfigFromFile
- Create another PR to replace our implementation with sig-scheduling's implementation (might have to wait until they release 1.28 so that we can import their code OR ask for a patch to 1.27)
Option 2:
- Put this PR on hold
- Talk with sig-scheduling about setting expectation about
LoadConfigFromFile
- Create another PR to replace our implementation with sig-scheduling's implementation (might have to wait until they release 1.28 so that we can import their code or ask for a patch to 1.27)
If we decide to go with asking for a public implementation of LoadConfigFromFile
and we want to go with Option 1, I will create a separate issue to co-ordinate with sig-scheduling and create another PR to use the public implementation. If we go with Option 2, we will have to put this PR on hold until we get LoadConfigFromFile
.
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried changes in scheduler will never happen if we merge here first. OTOH, waiting for k8s release is a bit too long, so I'd:
- Send a PR to update scheduler code
- Add TODO here to switch to the upstream function and merge as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR to update scheduler code: kubernetes/kubernetes#119057
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added TODO here
// Note that even if we are passing minimal config like below | ||
// `GetSchedulerConfig` will set the rest of the default fields | ||
// on its own (including default profile and default plugins) | ||
correctConfigFile := filepath.Join(tmpDir, "correct_config.yaml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a simple program to test this. Just put the scheduler config in test-config.yaml
import (
"fmt"
"os"
"gopkg.in/yaml.v2"
config "k8s.io/kubernetes/pkg/scheduler/apis/config"
"k8s.io/kubernetes/pkg/scheduler/apis/config/scheme"
"k8s.io/kubernetes/pkg/scheduler/apis/config/validation"
)
func main() {
file := "test-config.yaml"
data, err := os.ReadFile(file)
if err != nil {
panic(err)
}
obj, gvk, err := scheme.Codecs.UniversalDecoder().Decode(data, nil, nil)
if err != nil {
panic(err)
}
cfgObj, ok := obj.(*config.KubeSchedulerConfiguration)
if !ok {
panic(fmt.Errorf("couldn't decode as KubeSchedulerConfiguration, got %s: ", gvk))
}
cfgObj.TypeMeta.APIVersion = gvk.GroupVersion().String()
if err := validation.ValidateKubeSchedulerConfiguration(cfgObj); err != nil {
panic(err)
}
data, err = yaml.Marshal(cfgObj)
if err != nil {
panic(err)
}
fmt.Println(string(data))
}
test-config.yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
Click here to see the output of the program
$ go run main.go
typemeta:
kind: ""
apiversion: kubescheduler.config.k8s.io/v1
parallelism: 16
leaderelection:
leaderelect: true
leaseduration:
duration: 15s
renewdeadline:
duration: 10s
retryperiod:
duration: 2s
resourcelock: leases
resourcename: kube-scheduler
resourcenamespace: kube-system
clientconnection:
kubeconfig: ""
acceptcontenttypes: ""
contenttype: application/vnd.kubernetes.protobuf
qps: 50
burst: 100
healthzbindaddress: ""
metricsbindaddress: ""
debuggingconfiguration:
enableprofiling: true
enablecontentionprofiling: true
percentageofnodestoscore: 0
podinitialbackoffseconds: 1
podmaxbackoffseconds: 10
profiles:
- schedulername: default-scheduler
percentageofnodestoscore: null
plugins:
preenqueue:
enabled: []
disabled: []
queuesort:
enabled: []
disabled: []
prefilter:
enabled: []
disabled: []
filter:
enabled: []
disabled: []
postfilter:
enabled: []
disabled: []
prescore:
enabled: []
disabled: []
score:
enabled: []
disabled: []
reserve:
enabled: []
disabled: []
permit:
enabled: []
disabled: []
prebind:
enabled: []
disabled: []
bind:
enabled: []
disabled: []
postbind:
enabled: []
disabled: []
multipoint:
enabled:
- name: PrioritySort
weight: 0
- name: NodeUnschedulable
weight: 0
- name: NodeName
weight: 0
- name: TaintToleration
weight: 3
- name: NodeAffinity
weight: 2
- name: NodePorts
weight: 0
- name: NodeResourcesFit
weight: 1
- name: VolumeRestrictions
weight: 0
- name: EBSLimits
weight: 0
- name: GCEPDLimits
weight: 0
- name: NodeVolumeLimits
weight: 0
- name: AzureDiskLimits
weight: 0
- name: VolumeBinding
weight: 0
- name: VolumeZone
weight: 0
- name: PodTopologySpread
weight: 2
- name: InterPodAffinity
weight: 2
- name: DefaultPreemption
weight: 0
- name: NodeResourcesBalancedAllocation
weight: 1
- name: ImageLocality
weight: 1
- name: DefaultBinder
weight: 0
disabled: []
pluginconfig:
- name: DefaultPreemption
args:
typemeta:
kind: ""
apiversion: ""
mincandidatenodespercentage: 10
mincandidatenodesabsolute: 100
- name: InterPodAffinity
args:
typemeta:
kind: ""
apiversion: ""
hardpodaffinityweight: 1
ignorepreferredtermsofexistingpods: false
- name: NodeAffinity
args:
typemeta:
kind: ""
apiversion: ""
addedaffinity: null
- name: NodeResourcesBalancedAllocation
args:
typemeta:
kind: ""
apiversion: ""
resources:
- name: cpu
weight: 1
- name: memory
weight: 1
- name: NodeResourcesFit
args:
typemeta:
kind: ""
apiversion: ""
ignoredresources: []
ignoredresourcegroups: []
scoringstrategy:
type: LeastAllocated
resources:
- name: cpu
weight: 1
- name: memory
weight: 1
requestedtocapacityratio: null
- name: PodTopologySpread
args:
typemeta:
kind: ""
apiversion: ""
defaultconstraints: []
defaultingtype: System
- name: VolumeBinding
args:
typemeta:
kind: ""
apiversion: ""
bindtimeoutseconds: 600
shape: []
extenders: []
Note that direct marshalling to yaml is not the right way to serialize the scheduler config but I have used to for the sake of simplicity.
07313a7
to
db8507e
Compare
|
||
var parsedSchedConfig *scheduler_config.KubeSchedulerConfiguration | ||
// if scheduler config flag was set by the user | ||
if pflag.CommandLine.Changed(config.SchedulerConfigFileFlag) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so that if the --scheduler-config-file
flag is not used, we don't try to look for the scheduler config in the default path set by the flag i.e., ""
. We want to use the default scheduler config. Setting the scheduler config to nil
ensures that the default scheduler config is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it better than just comparing with ""
? If someone specifies an empty value for the flag, shouldn't they get the default scheduler config, rather than an error from trying to open a filename of empty string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are two lines of thought here:
- User has set
--scheduler-config-file=""
(note that this is a file path) which means they have made a mistake. Hence, fail. - User has set
--scheduler-config-file=""
which means they don't want to set a scheduler config by themselves. Fall back to using a default one.
Falling back to default in case of 2 doesn't seem as valuable to me because if the user wants to use the default config, they can just skip the flag altogether because the flag specifies path to a config file (and not scheduler config itself) and if they have set it to ""
, I think it's more likely that they have made a mistake (which can happen if say this value comes from a helm chart file and it's set to empty there by mistake) because they can just skip specifying the flag if they want to use the default config.
A counter-argument could be, if the user sets only one or two fields in the custom scheduler config we anyway fill rest of the required fields by default (in this PR). It can be argued that 2 is inline with this behavior.
I think failing here makes more sense than using default config but I am open to change it if you think there's a strong reason for not doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about someone templating the manifest and using empty string to mean "default", but I guess such templating can also just skip the flag in such cases, so maybe it is fine to just leave as is.
scheduledPods: []*apiv1.Pod{p450}, | ||
testPod: p500, | ||
expectError: false, | ||
name: "custom - other pod - ok", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test cases for custom predicate checker are a copy of the default predicate checker 's test cases except we don't expect an error for any of the custom predicate checker test cases.
assert.True(t, nodeName == "n1000" || nodeName == "n2000") | ||
// custom predicate checker test cases | ||
{ | ||
name: "custom - small pod - no error", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test cases for custom predicate checker are a copy of the default predicate checker's test cases except we don't expect an error for any of the custom predicate checker test cases.
cluster-autoscaler/config/const.go
Outdated
@@ -32,4 +36,53 @@ const ( | |||
DefaultScaleDownUnreadyTimeKey = "scaledownunreadytime" | |||
// DefaultMaxNodeProvisionTimeKey identifies MaxNodeProvisionTime autoscaling option | |||
DefaultMaxNodeProvisionTimeKey = "maxnodeprovisiontime" | |||
|
|||
// Custom scheduler configs for testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this file is a good home for test data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess putting test consts in const.go
feels a little out of place. But I don't have a strong opinion on where this should go. I will take another look at this (P.S.: open to suggestions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved the test config here to a separate package under at cluster-autoscaler/config/test/config.go
This test config is used at two different places:
- simulator/predicatechecker/schedulerbased_test.go
- utils/scheduler/scheduler_test.go
Other place this can go into is the existing cluster-autoscaler/utils/test/test_utils.go
. Since this is config for testing, putting it under a separate test
package under cluster-autoscaler/config
seems like a better place.
Let me know what you think.
|
||
// this needs to be set explicitly because config's api version is empty after decoding | ||
// check kubernetes/cmd/kube-scheduler/app/options/configfile.go for more info | ||
cfgObj.TypeMeta.APIVersion = gvk.GroupVersion().String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried changes in scheduler will never happen if we merge here first. OTOH, waiting for k8s release is a bit too long, so I'd:
- Send a PR to update scheduler code
- Add TODO here to switch to the upstream function and merge as is
@@ -106,3 +117,31 @@ func ResourceToResourceList(r *schedulerframework.Resource) apiv1.ResourceList { | |||
} | |||
return result | |||
} | |||
|
|||
// GetSchedulerConfig loads scheduler config from a path. | |||
func GetSchedulerConfig(path string) (*scheduler_config.KubeSchedulerConfiguration, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ConfigFromPath
? (So it becomes scheduler.ConfigFromPath(path)
outside of this package.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I have changed the name to ConfigFromPath
. scheduler.ConfigFromPath
feels less vague than scheduler.GetSchedulerConfig
.
08d8af8
to
ea715bb
Compare
@x13n I have addressed all the review comments. Waiting for your approval. |
Apologies for the delay. The change LGTM, but there's a ton of commits here. Can you squash them before merging? |
0966967
to
aba1df8
Compare
@x13n I have squashed the commits. |
…ithout extenders) Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `--scheduler-config` -> `--scheduler-config-file` to avoid confusion Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: `goto` causing infinite loop - abstract out running extenders in a separate function Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove code around extenders - we decided not to use scheduler extenders for checking if a pod would fit on a node Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: move scheduler config to a `utils/scheduler` package` - use default config as a fallback Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix static_autoscaler test Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: `GetSchedulerConfiguration` fn - remove falling back - add mechanism to detect if the scheduler config file flag was set - Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: wip add tests for `GetSchedulerConfig` - tests are failing now Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add tests for `GetSchedulerConfig` - abstract error messages so that we can use them in the tests - set api version explicitly (this is what upstream does as well) Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: do a round of cleanup to make PR ready for review - make import names consistent Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: use `pflag` to check if the `--scheduler-config-file` flag was set Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add comments for exported error constants Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: don't export error messages - exporting is not needed Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: add underscore in test file name Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix test failing because of no comment on exported `SchedulerConfigFileFlag` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refacotr: change name of flag variable `schedulerConfig` -> `schedulerConfigFile` - avoids confusion Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add extra test cases for predicate checker - where the predicate checker uses custom scheduler config Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove `setFlags` variable - not needed anymore Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: abstract custom scheduler configs into `conifg` package - make them constants Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix linting error Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: introduce a new custom test predicate checker - instead of adding a param to the current one - this is so that we don't have to pass `nil` to the existing test predicate checker in many places Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `NewCustomPredicateChecker` -> `NewTestPredicateCheckerWithCustomConfig` - latter narrows down meaning of the function better than former Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `GetSchedulerConfig` -> `ConfigFromPath` - `scheduler.ConfigFromPath` is shorter and feels less vague than `scheduler.GetSchedulerConfig` - move test config to a new package `test` under `config` package Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add `TODO` for replacing code to parse scheduler config - with upstream function Signed-off-by: vadasambar <surajrbanakar@gmail.com>
aba1df8
to
23f03e1
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vadasambar, x13n The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: vadasambar surajrbanakar@gmail.com
What type of PR is this?
/kind feature
What this PR does / why we need it:
Problem
PreFilter
andFilter
extension points (which is used in predicate checker).This PR introduces
--scheduler-config-file
flag to2. add support for scheduler extenders so that the predicate checker can call an external filter URL.DECIDED AGAINST GOING FORWARD WITH EXTENDERS. Check feat: support custom scheduler config (without extenders) #5708 (comment) and feat: support custom scheduler config (without extenders) #5708 (comment)Which issue(s) this PR fixes:
Fixes #5106
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: