Skip to content

Commit

Permalink
Add dynamic percentage of node scoring to user docs
Browse files Browse the repository at this point in the history
  • Loading branch information
bsalamat committed Jan 16, 2019
1 parent 551489f commit 8ca52cc
Showing 1 changed file with 35 additions and 33 deletions.
68 changes: 35 additions & 33 deletions content/en/docs/concepts/configuration/scheduler-perf-tuning.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,31 +8,37 @@ weight: 70

{{% capture overview %}}

{{< feature-state for_k8s_version="1.12" >}}
{{< feature-state for_k8s_version="1.14" >}}

Kube-scheduler is the Kubernetes default scheduler. It is responsible for
placement of Pods on Nodes in a cluster. Nodes in a cluster that meet the
scheduling requirements of a Pod are called "feasible" Nodes for the Pod. The
scheduler finds feasible Nodes for a Pod and then runs a set of functions to
score the feasible Nodes and picks a Node with the highest score among the
feasible ones to run the Pod. The scheduler then notifies the API server about this
decision in a process called "Binding".
feasible ones to run the Pod. The scheduler then notifies the API server about
this decision in a process called "Binding".

{{% /capture %}}

{{% capture body %}}

## Percentage of Nodes to Score

Before Kubernetes 1.12, Kube-scheduler used to check the feasibility of all the
nodes in a cluster and then scored the feasible ones. Kubernetes 1.12 has a new
feature that allows the scheduler to stop looking for more feasible nodes once
it finds a certain number of them. This improves the scheduler's performance in
large clusters. The number is specified as a percentage of the cluster size and
is controlled by a configuration option called `percentageOfNodesToScore`. The
range should be between 1 and 100. Other values are considered as 100%. The
default value of this option is 50%. A cluster administrator can change this value by providing a
different value in the scheduler configuration. However, it may not be necessary to change this value.
Before Kubernetes 1.12, Kube-scheduler used to check the feasibility of all
nodes in a cluster and then scored the feasible ones. Kubernetes 1.12 added a
new feature that allows the scheduler to stop looking for more feasible nodes
once it finds a certain number of them. This improves the scheduler's
performance in large clusters. The number is specified as a percentage of the
cluster size. The percentage can be controlled by a configuration option called
`percentageOfNodesToScore`. The range should be between 1 and 100. Larger values
are considered as 100%. Kubernetes 1.14 has logic to find the percentage of
nodes to score based on the size of the cluster if it is not specified in the
configuration. It uses a linear formula which yields 50% for a 100-node cluster.
The formula yields 10% for a 5000-node cluster. The lower bound is 5%. In other
words, the scheduler will always scores at least 5% of the cluster no matter how
large the cluster is.

Below is an example configuration that sets `percentageOfNodesToScore` to 50%.

```yaml
apiVersion: componentconfig/v1alpha1
Expand All @@ -45,41 +51,37 @@ algorithmSource:
percentageOfNodesToScore: 50
```
{{< note >}}
In clusters with zero or less than 50 feasible nodes, the
scheduler still checks all the nodes, simply because there are not enough
feasible nodes to stop the scheduler's search early.
{{< /note >}}
{{< note >}} In clusters with less than 50 feasible nodes, the scheduler still
checks all the nodes, simply because there are not enough feasible nodes to stop
the scheduler's search early. {{< /note >}}
**To disable this feature**, you can set `percentageOfNodesToScore` to 100.

### Tuning percentageOfNodesToScore

`percentageOfNodesToScore` must be a value between 1 and 100
with the default value of 50. There is also a hardcoded minimum value of 50
nodes which is applied internally. The scheduler tries to find at
least 50 nodes regardless of the value of `percentageOfNodesToScore`. This means
that changing this option to lower values in clusters with several hundred nodes
will not have much impact on the number of feasible nodes that the scheduler
tries to find. This is intentional as this option is unlikely to improve
performance noticeably in smaller clusters. In large clusters with over a 1000
nodes setting this value to lower numbers may show a noticeable performance
improvement.
`percentageOfNodesToScore` must be a value between 1 and 100 with the default
value being calculated based on the cluster size. There is also a hardcoded
minimum value of 50 nodes or 5% whichever is larger. This means that changing
this option to lower values in clusters with several hundred nodes will not have
much impact on the number of feasible nodes that the scheduler tries to find.
This is intentional as this option is unlikely to improve performance noticeably
in smaller clusters. In large clusters with over a 1000 nodes setting this value
to lower numbers may show a noticeable performance improvement.

An important note to consider when setting this value is that when a smaller
number of nodes in a cluster are checked for feasibility, some nodes are not
sent to be scored for a given Pod. As a result, a Node which could possibly
score a higher value for running the given Pod might not even be passed to the
scoring phase. This would result in a less than ideal placement of the Pod. For
this reason, the value should not be set to very low percentages. A general rule
of thumb is to never set the value to anything lower than 30. Lower values
of thumb is to never set the value to anything lower than 10. Lower values
should be used only when the scheduler's throughput is critical for your
application and the score of nodes is not important. In other words, you prefer
to run the Pod on any Node as long as it is feasible.

It is not recommended to lower this value from its default if your cluster has
only several hundred Nodes. It is unlikely to improve the scheduler's
performance significantly.
If your cluster has several hundred Nodes or fewer, we do not recommend lowering
the default value of this configuration option. It is unlikely to improve the
scheduler's performance significantly.

### How the scheduler iterates over Nodes

Expand All @@ -91,8 +93,8 @@ for running Pods, the scheduler iterates over the nodes in a round robin
fashion. You can imagine that Nodes are in an array. The scheduler starts from
the start of the array and checks feasibility of the nodes until it finds enough
Nodes as specified by `percentageOfNodesToScore`. For the next Pod, the
scheduler continues from the point in the Node array that it stopped at when checking
feasibility of Nodes for the previous Pod.
scheduler continues from the point in the Node array that it stopped at when
checking feasibility of Nodes for the previous Pod.

If Nodes are in multiple zones, the scheduler iterates over Nodes in various
zones to ensure that Nodes from different zones are considered in the
Expand Down

0 comments on commit 8ca52cc

Please sign in to comment.