Commit
Make using sysctls a task instead of a concept
tengqm committed Jan 2, 2018
1 parent 53e0535 commit ac04f8b
Showing 4 changed files with 47 additions and 40 deletions.
1 change: 0 additions & 1 deletion _data/concepts.yml
@@ -33,7 +33,6 @@ toc:
section:
- docs/concepts/cluster-administration/network-plugins.md
- docs/concepts/cluster-administration/device-plugins.md
- docs/concepts/cluster-administration/sysctl-cluster.md
- docs/concepts/service-catalog/index.md

- title: Containers
1 change: 1 addition & 0 deletions _data/tasks.yml
@@ -133,6 +133,7 @@ toc:
- docs/tasks/administer-cluster/access-cluster-api.md
- docs/tasks/administer-cluster/access-cluster-services.md
- docs/tasks/administer-cluster/securing-a-cluster.md
- docs/tasks/administer-cluster/sysctl-cluster.md
- docs/tasks/administer-cluster/encrypt-data.md
- docs/tasks/administer-cluster/configure-upgrade-etcd.md
- docs/tasks/administer-cluster/static-pod.md
3 changes: 2 additions & 1 deletion _redirects
@@ -50,7 +50,7 @@
/docs/admin/resourcequota/limitstorageconsumption/ /docs/tasks/administer-cluster/limit-storage-consumption/ 301
/docs/admin/resourcequota/walkthrough/ /docs/tasks/administer-cluster/quota-api-object/ 301
/docs/admin/static-pods/ /docs/tasks/administer-cluster/static-pod/ 301
/docs/admin/sysctls/ /docs/concepts/cluster-administration/sysctl-cluster/ 301
/docs/admin/sysctls/ /docs/tasks/administer-cluster/sysctl-cluster/ 301
/docs/admin/upgrade-1-6/ /docs/tasks/administer-cluster/upgrade-1-6/ 301
/docs/admin/resource-quota/ /docs/concepts/policy/resource-quotas/ 301

@@ -99,6 +99,7 @@
/docs/concepts/cluster-administration/multiple-clusters/ /docs/concepts/cluster-administration/federation/ 301
/docs/concepts/cluster-administration/out-of-resource/ /docs/tasks/administer-cluster/out-of-resource/ 301
/docs/concepts/cluster-administration/resource-usage-monitoring /docs/tasks/debug-application-cluster/resource-usage-monitoring/ 301
/docs/concepts/cluster-administration/sysctl-cluster/ /docs/tasks/administer-cluster/sysctl-cluster/ 301
/docs/concepts/cluster-administration/static-pod/ /docs/tasks/administer-cluster/static-pod/ 301
/docs/concepts/clusters/logging/ /docs/concepts/cluster-administration/logging/ 301
/docs/concepts/configuration/container-command-arg/ /docs/tasks/inject-data-application/define-command-argument-container/ 301
@@ -1,15 +1,16 @@
---
title: Using Sysctls in a Kubernetes Cluster
approvers:
- sttts
title: Using Sysctls in a Kubernetes Cluster
---

* TOC
{:toc}
{% capture overview %}

This document describes how sysctls are used within a Kubernetes cluster.

## What is a Sysctl?
{% endcapture %}

## Listing all Sysctl Parameters?

In Linux, the sysctl interface allows an administrator to modify kernel
parameters at runtime. Parameters are available via the `/proc/sys/` virtual
@@ -27,31 +28,7 @@ To get a list of all parameters, you can run
$ sudo sysctl -a
```
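
As a minimal sketch of how a parameter name relates to the `/proc/sys/` virtual filesystem mentioned above: the dots in a sysctl name correspond to directory separators under `/proc/sys/`. The helper name `sysctl_path` is hypothetical and not part of any tool. The sketch also assumes that no component of the name itself contains a dot, which may not hold for some network interface names.

```shell
# Hypothetical helper: map a dotted sysctl name to its file under /proc/sys.
# Assumes no name component contains a literal dot.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Print the path for one of the parameters named in this document.
sysctl_path net.ipv4.ip_local_port_range
```

On a Linux node, running `cat` on the printed path should show the same value that `sysctl` reports for that parameter.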

## Namespaced vs. Node-Level Sysctls

A number of sysctls are _namespaced_ in today's Linux kernels. This means that
they can be set independently for each pod on a node. Being namespaced is a
requirement for sysctls to be accessible in a pod context within Kubernetes.

The following sysctls are known to be _namespaced_:

- `kernel.shm*`,
- `kernel.msg*`,
- `kernel.sem`,
- `fs.mqueue.*`,
- `net.*`.

Sysctls which are not namespaced are called _node-level_ and must be set
manually by the cluster admin, either by means of the underlying Linux
distribution of the nodes (e.g. via `/etc/sysctls.conf`) or using a DaemonSet
with privileged containers.

**Note**: it is good practice to consider nodes with special sysctl settings as
_tainted_ within a cluster, and only schedule pods onto them which need those
sysctl settings. It is suggested to use the Kubernetes [_taints and toleration_
feature](/docs/user-guide/kubectl/{{page.version}}/#taint) to implement this.

## Safe vs. Unsafe Sysctls
## Enabling Unsafe Sysctls

Sysctls are grouped into _safe_ and _unsafe_ sysctls. In addition to proper
namespacing a _safe_ sysctl must be properly _isolated_ between pods on the same
@@ -63,8 +40,7 @@ node. This means that setting a _safe_ sysctl for one pod
of a pod.

By far, most of the _namespaced_ sysctls are not necessarily considered _safe_.

For Kubernetes 1.4, the following sysctls are supported in the _safe_ set:
The following sysctls are supported in the _safe_ set:

- `kernel.shm_rmid_forced`,
- `net.ipv4.ip_local_port_range`,
@@ -82,28 +58,54 @@ scheduled, but will fail to launch.
**Warning**: Due to their nature of being _unsafe_, the use of _unsafe_ sysctls
is at-your-own-risk and can lead to severe problems like wrong behavior of
containers, resource shortage or complete breakage of a node.

## Enabling Unsafe Sysctls
{: .warning}

With the warning above in mind, the cluster admin can allow certain _unsafe_
sysctls for very special situations like e.g. high-performance or real-time
application tuning. _Unsafe_ sysctls are enabled on a node-by-node basis with a
flag of the kubelet, e.g.:

```shell
$ kubelet --experimental-allowed-unsafe-sysctls 'kernel.msg*,net.ipv4.route.min_pmtu' ...
$ kubelet --experimental-allowed-unsafe-sysctls \
'kernel.msg*,net.ipv4.route.min_pmtu' ...
```

For minikube, this can be done via the `extra-config` flag:

```shell
$ minikube start --extra-config="kubelet.AllowedUnsafeSysctls=kernel.msg*,net.ipv4.route.min_pmtu"...
```

Only _namespaced_ sysctls can be enabled this way.


## Setting Sysctls for a Pod

The sysctl feature is an alpha API in Kubernetes 1.4. Therefore, sysctls are set
using annotations on pods. They apply to all containers in the same pod.
A number of sysctls are _namespaced_ in today's Linux kernels. This means that
they can be set independently for each pod on a node. Being namespaced is a
requirement for sysctls to be accessible in a pod context within Kubernetes.

The following sysctls are known to be _namespaced_:

- `kernel.shm*`,
- `kernel.msg*`,
- `kernel.sem`,
- `fs.mqueue.*`,
- `net.*`.
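
The list above can be turned into a quick check, sketched here under stated assumptions: the patterns are copied from this list and may not be exhaustive for a given kernel, the function name `is_namespaced` is hypothetical, and `vm.swappiness` is used only as an illustrative name that matches none of the patterns.

```shell
# Hypothetical helper: succeed if the sysctl name matches one of the
# namespaced patterns listed in this document (not an authoritative list).
is_namespaced() {
  case "$1" in
    kernel.shm*|kernel.msg*|kernel.sem|fs.mqueue.*|net.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Classify a few example names; vm.swappiness is an illustrative
# name that matches none of the patterns above.
for name in net.ipv4.route.min_pmtu kernel.msgmax vm.swappiness; do
  if is_namespaced "$name"; then
    echo "$name: namespaced"
  else
    echo "$name: node-level"
  fi
done
```

A name that falls through to the node-level branch would have to be set by the cluster admin on the node itself rather than per pod.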

Sysctls which are not namespaced are called _node-level_ and must be set
manually by the cluster admin, either by means of the underlying Linux
distribution of the nodes (e.g. via `/etc/sysctls.conf`) or using a DaemonSet
with privileged containers.

**Note**: It is good practice to consider nodes with special sysctl settings as
_tainted_ within a cluster, and only schedule pods onto them which need those
sysctl settings. It is suggested to use the Kubernetes [_taints and toleration_
feature](/docs/user-guide/kubectl/{{page.version}}/#taint) to implement this.
{: .note}

The sysctl feature is an alpha API. Therefore, sysctls are set using annotations
on pods. They apply to all containers in the same pod.

Here is an example, with different annotations for _safe_ and _unsafe_ sysctls:

@@ -121,6 +123,10 @@ spec:
**Note**: a pod with the _unsafe_ sysctls specified above will fail to launch on
any node which has not enabled those two _unsafe_ sysctls explicitly. As with
_node-level_ sysctls it is recommended to use [_taints and toleration_
feature](/docs/user-guide/kubectl/{{page.version}}/#taint) or [taints on nodes](/docs/concepts/configuration/taint-and-toleration/)
_node-level_ sysctls it is recommended to use
[_taints and toleration_ feature](/docs/user-guide/kubectl/{{page.version}}/#taint) or
[taints on nodes](/docs/concepts/configuration/taint-and-toleration/)
to schedule those pods onto the right nodes.
{: .note}
{% include templates/task.md %}