docs: add control plane in-depth guide
Add FAQ on initial time sync.

Add 0.9 new videos.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
smira authored and talos-bot committed Mar 17, 2021
1 parent ecf0344 commit 8810440
Showing 6 changed files with 87 additions and 4 deletions.
6 changes: 6 additions & 0 deletions website/content/docs/v0.9/Guides/converting-control-plane.md
@@ -9,6 +9,12 @@ After Talos OS upgrade to version 0.9 Kubernetes control plane should be convert

This guide describes the automated conversion script and also shows the detailed manual conversion process.

## Video Walkthrough

For a live demo of this writeup, see the video below:

<iframe width="560" height="315" src="https://www.youtube.com/embed/nUuFYLEp7wQ" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Automated Conversion

First, make sure all nodes are updated to Talos 0.9:
4 changes: 1 addition & 3 deletions website/content/docs/v0.9/Guides/upgrading-kubernetes.md
@@ -10,9 +10,7 @@ refer to 0.8 docs.

For a live demo of this writeup, see the video below:

-<!-- TODO: update the video for 0.9 -->
-
-<iframe width="560" height="315" src="https://www.youtube.com/embed/sw78qS8vBGc" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+<iframe width="560" height="315" src="https://www.youtube.com/embed/_N_vhB_ZI2c" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Automated Kubernetes Upgrade

6 changes: 6 additions & 0 deletions website/content/docs/v0.9/Guides/vip.md
@@ -25,6 +25,12 @@ Talos has (as of version 0.9) built-in support for this form of shared IP addres
and it can utilize this for both the Kubernetes API server and the Talos endpoint set.
Talos uses `etcd` for elections and leadership (control) of the IP address.

## Video Walkthrough

For a live demo of this writeup, see the video below:

<iframe width="560" height="315" src="https://www.youtube.com/embed/BfMGInHtFBc" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Choose your Shared IP

To begin with, you should choose your shared IP address.
66 changes: 66 additions & 0 deletions website/content/docs/v0.9/Learn More/control-plane.md
@@ -0,0 +1,66 @@
---
title: "Control Plane"
weight: 8
---

This guide provides details on how Talos runs and bootstraps the Kubernetes control plane.

### High-level Overview

Talos cluster bootstrap flow:

1. The `etcd` service is started on control plane nodes. Instances of `etcd` on control plane nodes build the `etcd` cluster.
2. The `kubelet` service is started.
3. Control plane components are started as static pods via the `kubelet`, and the `kube-apiserver` component connects to the local (running on the same node) `etcd` instance.
4. The `kubelet` requests a client certificate with the bootstrap token through the control plane endpoint (via `kube-apiserver` and `kube-controller-manager`).
5. The `kubelet` registers the node in the API server.
6. The Kubernetes control plane schedules pods on the nodes.
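
The ordering in this list can be read as a small dependency graph. The sketch below is only an illustration and not Talos code: the step names and the dependency edges are assumptions derived from the list, and Python's standard `graphlib` module computes one valid start order.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each step maps to the steps it waits for.
# The edges are an interpretation of the numbered list, not Talos internals.
BOOTSTRAP_DEPENDENCIES = {
    "etcd": set(),
    "kubelet": set(),
    "static_pods": {"etcd", "kubelet"},         # kube-apiserver needs the local etcd
    "client_certificate": {"static_pods"},      # requested via the control plane endpoint
    "node_registration": {"client_certificate"},
    "pod_scheduling": {"node_registration"},
}


def bootstrap_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one start order in which every step follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(bootstrap_order(BOOTSTRAP_DEPENDENCIES)))
```

Steps without a dependency between them (here `etcd` and `kubelet`) may appear in either order, which matches the fact that both services are started before the static pods can run.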

### Cluster Bootstrapping

All nodes start the `kubelet` service.
The `kubelet` tries to contact the control plane endpoint, but as it is not up yet, it keeps retrying.
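
The "keeps retrying" behaviour can be pictured as a simple polling loop. This is a minimal sketch, assuming a fixed retry interval and an attempt limit; the real `kubelet` retry timing is not described here, and `wait_for_endpoint` and `FakeEndpoint` are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def wait_for_endpoint(probe: Callable[[], bool],
                      interval: float = 0.01,
                      max_attempts: int = 50) -> int:
    """Call `probe` until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if probe():
            return attempt
        time.sleep(interval)  # endpoint not up yet, try again later
    raise TimeoutError(f"endpoint not ready after {max_attempts} attempts")


class FakeEndpoint:
    """Stand-in for the control plane endpoint: fails the first N probes."""

    def __init__(self, ready_after: int) -> None:
        self.calls = 0
        self.ready_after = ready_after

    def probe(self) -> bool:
        self.calls += 1
        return self.calls > self.ready_after


if __name__ == "__main__":
    endpoint = FakeEndpoint(ready_after=3)
    print("attempts needed:", wait_for_endpoint(endpoint.probe))
```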

One of the control plane nodes is chosen as the bootstrap node.
The node's type can be either `init` or `controlplane`; a `controlplane` node is promoted to the bootstrap node using the bootstrap API (`talosctl bootstrap`).
The bootstrap node initiates the `etcd` bootstrap process by initializing `etcd` as the first member of the cluster.

> Note: there should be only one bootstrap node for the cluster lifetime.
> Once `etcd` is bootstrapped, the bootstrap node has no special role and acts the same way as other control plane nodes.

The `etcd` services on non-bootstrap nodes try to get the `Endpoints` resource via the control plane endpoint, but that request fails as the control plane endpoint is not up yet.

As soon as `etcd` is up on the bootstrap node, static pod definitions for the Kubernetes control plane components (`kube-apiserver`, `kube-controller-manager`, `kube-scheduler`) are rendered to disk.
The `kubelet` service on the bootstrap node picks up the static pod definitions and starts the Kubernetes control plane components.
As soon as `kube-apiserver` is launched, the control plane endpoint comes up.

The bootstrap node acquires an `etcd` mutex and injects the bootstrap manifests into the API server.
The bootstrap manifests specify the Kubernetes join token and kubelet CSR auto-approval.
The `kubelet` services on all nodes are now able to obtain client certificates for themselves and register their nodes in the API server.

Other bootstrap manifests specify additional resources critical for Kubernetes operations (e.g. CNI, PSP).

The `etcd` service on non-bootstrap nodes is now able to discover other members of the `etcd` cluster via the Kubernetes `Endpoints` resource.
The `etcd` cluster is now formed and consists of all control plane nodes.

All control plane nodes render static pod manifests for the control plane components.
Each node now runs a full set of components to make the control plane HA.

The `kubelet` service on worker nodes is now able to obtain its client certificate and register itself with the API server.

### Scaling Up the Control Plane

When new nodes are added to the control plane, the process is the same as the bootstrap process above: the `etcd` service discovers existing members of the control plane via the
control plane endpoint, joins the `etcd` cluster, and the control plane components are scheduled on the node.

### Scaling Down the Control Plane

Scaling down the control plane involves removing a node from the cluster.
The most critical part is making sure that the node which is being removed leaves the `etcd` cluster.
When the `talosctl reset` command is used, the targeted control plane node leaves the `etcd` cluster as part of the reset sequence.
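
Why leaving `etcd` matters follows from its majority quorum rule. The sketch below is plain arithmetic, not a Talos or `etcd` API, and the function names are hypothetical. It compares replacing one node of a three-node control plane with and without the removed node leaving `etcd` first.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def spare_failures(members: int, healthy: int) -> int:
    """How many more healthy members can fail before quorum is lost.

    A negative result means quorum is already lost.
    """
    return healthy - quorum(members)


if __name__ == "__main__":
    # Clean replacement: the old node left etcd, then a new node joined.
    # 3 registered members, all 3 healthy.
    print("clean:", spare_failures(members=3, healthy=3))

    # Stale member: the old node was removed without leaving etcd,
    # then a new node joined. 4 registered members, only 3 healthy.
    print("stale:", spare_failures(members=4, healthy=3))
```

With the stale member still registered, the quorum size grows to three while only three members are healthy, so the cluster can no longer tolerate a single further failure.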

### Upgrading Control Plane Nodes

When a control plane node is upgraded, Talos leaves `etcd`, wipes the system disk, installs a new version of itself, and reboots.
The upgraded node then joins the `etcd` cluster on reboot.
So upgrading a control plane node is equivalent to scaling down the control plane followed by scaling it up with a new version of Talos.
@@ -1,6 +1,6 @@
---
title: "Controllers and Resources"
-weight: 8
+weight: 9
---

<!-- markdownlint-disable MD038 -->
7 changes: 7 additions & 0 deletions website/content/docs/v0.9/Learn More/faqs.md
@@ -26,3 +26,10 @@ We envision Talos being a great place for the application of [control theory](ht

Talos was an automaton created by the Greek God of the forge to protect the island of Crete.
He would patrol the coast and enforce laws throughout the land. We felt it was a fitting name for a security focused operating system designed to run Kubernetes.

## Why does Talos query `pool.ntp.org` on boot even if configured to use a different time server?

When Talos boots, before the config is loaded, Talos performs a non-blocking attempt to sync the time with the default time server (`pool.ntp.org`).
This initial time sync is required if the node doesn't have an RTC or the RTC is out of sync, because TLS (e.g. HTTPS) requires time to be in sync for certificate validation.
As soon as the config is available, Talos starts syncing the time with the configured time server.
Time sync errors on initial boot can be safely ignored.
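
The two stages can be summarized in a few lines. This is a sketch under stated assumptions, not Talos code: the `time_servers` dictionary key and the `time.example.com` server are hypothetical, and falling back to the default when no server is configured is an assumption.

```python
from typing import Optional

DEFAULT_TIME_SERVER = "pool.ntp.org"


def time_servers(config: Optional[dict]) -> list[str]:
    """Pick the servers to sync with at the current boot stage."""
    if config is None:
        # Config not loaded yet: only the built-in default is known.
        return [DEFAULT_TIME_SERVER]
    # Hypothetical key; assumed fallback to the default if nothing is configured.
    configured = config.get("time_servers", [])
    return list(configured) or [DEFAULT_TIME_SERVER]


if __name__ == "__main__":
    print("before config:", time_servers(None))
    print("after config: ", time_servers({"time_servers": ["time.example.com"]}))
```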
