Refine backup and restore documentation (#518)
* Refine backup and restore documentation

Signed-off-by: Aylei <rayingecho@gmail.com>

* Fix dead link

Signed-off-by: Aylei <rayingecho@gmail.com>

* Update docs/backup-restore.md

Co-Authored-By: Lilian Lee <lilin@pingcap.com>

* Refine backup documents

Signed-off-by: Aylei <rayingecho@gmail.com>

* Fix code block indention

Signed-off-by: Aylei <rayingecho@gmail.com>

* Add namespace parameter in tidb-backup configuration document
aylei authored and tennix committed Jun 2, 2019
1 parent 46695be commit 8a110da
Showing 7 changed files with 204 additions and 66 deletions.
2 changes: 1 addition & 1 deletion charts/tidb-backup/templates/backup-job.yaml
@@ -2,7 +2,7 @@
apiVersion: batch/v1
kind: Job
metadata:
name: {{ .Values.clusterName }}-{{ .Values.name }}
name: {{ .Values.clusterName }}-{{ tpl .Values.name . }}
labels:
app.kubernetes.io/name: {{ template "chart.name" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
3 changes: 2 additions & 1 deletion charts/tidb-backup/templates/backup-pvc.yaml
@@ -2,12 +2,13 @@
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: {{ .Values.name }}
name: {{ tpl .Values.name . }}
labels:
app.kubernetes.io/name: {{ template "chart.name" . }}
app.kubernetes.io/managed-by: tidb-operator
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: backup
pingcap.com/backup-cluster-name: {{ .Values.clusterName }}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
spec:
accessModes:
2 changes: 1 addition & 1 deletion charts/tidb-backup/values.yaml
@@ -7,7 +7,7 @@ clusterName: demo

mode: backup # backup | restore
# name is the backup name
name: fullbackup-20190306
name: fullbackup-{{ date "200601021504" .Release.Time }}
image:
pullPolicy: IfNotPresent
binlog: pingcap/tidb-binlog:v3.0.0-rc.1
1 change: 1 addition & 0 deletions charts/tidb-cluster/templates/scheduled-backup-pvc.yaml
@@ -8,6 +8,7 @@ metadata:
app.kubernetes.io/managed-by: tidb-operator
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: scheduled-backup
pingcap.com/backup-cluster-name: {{ template "cluster.name" . }}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
spec:
accessModes:
116 changes: 116 additions & 0 deletions docs/backup-restore.md
@@ -0,0 +1,116 @@
# Backup and Restore a TiDB Cluster

## Overview

TiDB Operator supports two kinds of backup:

* [Full backup](#full-backup) (scheduled or ad-hoc) via [`mydumper`](https://www.pingcap.com/docs/dev/reference/tools/mydumper/), which helps you logically back up a TiDB cluster.
* [Incremental backup](#incremental-backup) via [`TiDB-Binlog`](https://www.pingcap.com/docs/dev/reference/tools/tidb-binlog/overview/), which helps you replicate the data in a TiDB cluster to other databases or back up the data in real time.

Currently, TiDB Operator only supports automatic [restore operations](#restore) for full backups taken by `mydumper`. Restoring the backup data captured by TiDB Binlog requires manual intervention.

## Full backup

Full backup uses `mydumper` to make a logical backup of the TiDB cluster. The backup job creates a PVC ([PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims), the same below) to store the backup data.

By default, the backup uses a PV ([Persistent Volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes)) to store the backup data. You can also store the backup data in a [Google Cloud Storage](https://cloud.google.com/storage/) bucket or [Ceph Object Storage](https://ceph.com/ceph-storage/object-storage/) by changing the configuration; in that case, the PV only temporarily stores the backup data before it is uploaded to object storage. Refer to [TiDB cluster backup configuration](./references/tidb-backup-configuration.md) for the full configuration guide of backup and restore.

You can either set up a scheduled full backup or take a full backup in an ad-hoc manner.

### Scheduled full backup

Scheduled full backup is created alongside the TiDB cluster and runs periodically like a crontab job.

To configure a scheduled full backup, modify the `scheduledBackup` section in the `charts/tidb-cluster/values.yaml` file of the TiDB cluster:

* Set `scheduledBackup.create` to `true`
* Set `scheduledBackup.storageClassName` to the PV storage class name used for backup data

> **Note:**
>
> You must set the scheduled full backup PV's [reclaim policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy) to `Retain` to keep your backup data safe.

* Configure `scheduledBackup.schedule` in the [Cron](https://en.wikipedia.org/wiki/Cron) format to define the scheduling.
* Create a Kubernetes [Secret](https://kubernetes.io/docs/concepts/configuration/secret/) containing the username and password that has the privilege to back up the database:

```shell
$ kubectl create secret generic backup-secret -n ${namespace} --from-literal=user=<user> --from-literal=password=<password>
```
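
Putting the settings above together, the `scheduledBackup` section of `charts/tidb-cluster/values.yaml` might look like the following sketch. The storage class, schedule, and secret field are illustrative assumptions rather than chart defaults — check the comments in `values.yaml` for the authoritative field names:

```yaml
scheduledBackup:
  create: true
  # PV storage class used for the backup data; assumed to exist in your cluster
  storageClassName: local-storage
  # Cron format: run a full backup every day at 02:00
  schedule: "0 2 * * *"
  # reference to the backup-secret created above (field name assumed)
  secretName: backup-secret
```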

Then, create a new cluster with the scheduled full backup configured via `helm install`, or enable scheduled full backup for an existing cluster via `helm upgrade`:

```shell
$ helm upgrade ${RELEASE_NAME} charts/tidb-cluster -f charts/tidb-cluster/values.yaml
```

### Ad-Hoc full backup

Ad-hoc backup is encapsulated in another helm chart, `charts/tidb-backup`. Depending on the `mode` in `charts/tidb-backup/values.yaml`, this chart performs either a full backup or a restore. The restore operation is covered in the [restore section](#restore) of this document.

To create an ad-hoc full backup job, modify the `charts/tidb-backup/values.yaml` file:

* Set `clusterName` to the target TiDB cluster name
* Set `mode` to `backup`
* Set `storage.className` to the PV storage class name used for backup data
* Adjust the `storage.size` according to your database size

> **Note:**
>
> You must set the ad-hoc full backup PV's [reclaim policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy) to `Retain` to keep your backup data safe.
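
With those fields set, the relevant part of `charts/tidb-backup/values.yaml` might look like the following sketch (the values shown are placeholders based on the chart defaults documented in the [configuration reference](./references/tidb-backup-configuration.md)):

```yaml
clusterName: demo           # the TiDB cluster to back up
mode: backup                # backup | restore
secretName: backup-secret   # the Secret created in the next step
storage:
  className: local-storage  # PV storage class used for the backup data
  size: 100Gi               # adjust according to your database size
```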

Create a Kubernetes [Secret](https://kubernetes.io/docs/concepts/configuration/secret/) containing the username and password that has the privilege to back up the database:

```shell
$ kubectl create secret generic backup-secret -n ${namespace} --from-literal=user=<user> --from-literal=password=<password>
```

Then run the following command to create an ad-hoc backup job:

```shell
$ helm install charts/tidb-backup --name=<backup-name> --namespace=${namespace}
```

### View backups

For backups stored in a PV, you can view the backup PVCs by using the following command:

```shell
$ kubectl get pvc -n ${namespace} -l app.kubernetes.io/component=backup,pingcap.com/backup-cluster-name=${cluster_name}
```

If you store the backup data in [Google Cloud Storage](https://cloud.google.com/storage/) or [Ceph Object Storage](https://ceph.com/ceph-storage/object-storage/), you can view the backups by using the corresponding GUI or CLI tool.

## Restore

The helm chart `charts/tidb-backup` helps restore a TiDB cluster using backup data. To perform a restore operation, modify the `charts/tidb-backup/values.yaml` file:

* Set `clusterName` to the target TiDB cluster name
* Set `mode` to `restore`
* Set `name` to the name of the backup you want to restore ([view backups](#view-backups) helps you list all the available backups). If the backup is stored in `Google Cloud Storage` or `Ceph Object Storage`, you must also configure the corresponding section (you can keep using the configuration you set in the [ad-hoc full backup](#ad-hoc-full-backup)).
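
For example, restoring the ad-hoc backup taken above might use a values file along these lines (the backup name below is a placeholder and must match an existing backup):

```yaml
clusterName: demo             # the cluster to restore into
mode: restore
name: fullbackup-201906021130 # placeholder; use the name of an existing backup
secretName: backup-secret
```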

Create a Kubernetes [Secret](https://kubernetes.io/docs/concepts/configuration/secret/) containing the user and password that has the privilege to restore the database (skip this if you have already created one in the [ad-hoc full backup](#ad-hoc-full-backup) section):

```shell
$ kubectl create secret generic backup-secret -n ${namespace} --from-literal=user=<user> --from-literal=password=<password>
```

Then, restore the backup:

```shell
$ helm install charts/tidb-backup --namespace=${namespace}
```

## Incremental backup

Incremental backup leverages the [TiDB Binlog](https://www.pingcap.com/docs/dev/reference/tools/tidb-binlog/overview/) tool to collect binlog data from TiDB and provides real-time backup and replication to downstream platforms.

Incremental backup is disabled in the TiDB cluster by default. To create a TiDB cluster with incremental backup enabled, or to enable incremental backup in an existing TiDB cluster, modify the `charts/tidb-cluster/values.yaml` file:

* Set `binlog.pump.create` to `true`
* Set `binlog.drainer.create` to `true`
* Set `binlog.pump.storageClassName` and `binlog.drainer.storageClassName` to a proper `storageClass` available in your Kubernetes cluster
* Set `binlog.drainer.destDBType` to your desired downstream, explained in detail below

Three types of downstream platforms are available for incremental backup:

* PersistentVolume: the default downstream. Consider configuring a large PV for `drainer` (the `binlog.drainer.storage` variable) in this case.
* MySQL compatible database: enabled by setting `binlog.drainer.destDBType` to `mysql`. You must also configure the target address and credential in the `binlog.drainer.mysql` section.
* Kafka: enabled by setting `binlog.drainer.destDBType` to `kafka`. You must also configure the ZooKeeper address and Kafka address in the `binlog.drainer.kafka` section.
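
As an orientation, a `binlog` section that replicates to a MySQL compatible database might look like the following sketch. The sub-fields under `binlog.drainer.mysql` (host, port, user, password) are assumptions here — check the comments in `charts/tidb-cluster/values.yaml` for the authoritative names:

```yaml
binlog:
  pump:
    create: true
    storageClassName: local-storage   # a storageClass available in your Kubernetes cluster
  drainer:
    create: true
    storageClassName: local-storage
    destDBType: mysql                 # pb (PersistentVolume) | mysql | kafka
    mysql:
      # target address and credentials; field names below are illustrative
      host: downstream-mysql.example.com
      port: 3306
      user: root
      password: ""
```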
66 changes: 3 additions & 63 deletions docs/operation-guide.md
@@ -222,68 +222,8 @@ To retrieve logs from multiple pods, [`stern`](https://github.com/wercker/stern)
$ stern -n ${namespace} tidb -c slowlog
```

## Backup
## Backup and restore

Currently, TiDB Operator supports two kinds of backup: incremental backup via binlog and full backup(scheduled or ad-hoc) via [Mydumper](https://github.com/maxbube/mydumper).
TiDB Operator provides highly automated backup and recovery operations for a TiDB cluster. You can easily take a full backup or set up incremental backup of a TiDB cluster, and restore the TiDB cluster when the cluster fails.

### Incremental backup

To enable incremental backup, set `binlog.pump.create` and `binlog.drainer.create` to `true`. By default the incremental backup data is stored in protobuffer format in a PV. You can change `binlog.drainer.destDBType` from `pb` to `mysql` or `kafka` and configure the corresponding downstream.

### Full backup

Currently, full backup requires a PersistentVolume. The backup job will create a PVC to store backup data.

By default, the backup uses PV to store the backup data.
> **Note:** You must set the ad-hoc full backup PV's [reclaim policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy) to `Retain` to keep your backup data safe.
You can also store the backup data to [Google Cloud Storage](https://cloud.google.com/storage/) bucket or [Ceph object storage](https://ceph.com/ceph-storage/object-storage/) by configuring the corresponding section in `values.yaml`. This way the PV temporarily stores backup data before it is placed in object storage.
The comments in `values.yaml` is self-explanatory for both GCP backup and Ceph backup.
### Scheduled full backup
Scheduled full backup can be ran periodically just like crontab job.
To create a scheduled full backup job, modify `scheduledBackup` section in `values.yaml` file.
* `create` must be set to `true`
* Set `storageClassName` to the PV storage class name used for backup data
* `schedule` takes the [Cron](https://en.wikipedia.org/wiki/Cron) format
* `user` and `password` must be set to the correct user which has the permission to read the database to be backuped.
> **Note:** You must set the scheduled full backup PV's [reclaim policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy) to `Retain` to keep your backup data safe.


### Ad-Hoc full backup

> **Note:** The rest of the document will use `values.yaml` to reference `charts/tidb-backup/values.yaml`

Ad-Hoc full backup can be done once just like job.

To create an ad-hoc full backup job, modify `backup` section in `values.yaml` file.

* `mode` must be set to `backup`
* Set `storage.className` to the PV storage class name used for backup data

Create a secret containing the user and password that has the permission to backup the database:

```shell
$ kubectl create secret generic backup-secret -n ${namespace} --from-literal=user=<user> --from-literal=password=<password>
```

Then run the following command to create an ad-hoc backup job:

```shell
$ helm install charts/tidb-backup --name=<backup-name> --namespace=${namespace}
```

## Restore

Restore is similar to backup. See the `values.yaml` file for details.

Modified the variables in `values.yaml` and then create restore job using the following command:

```shell
$ helm install charts/tidb-backup --name=<backup-name> --namespace=${namespace}
```
For detailed operation guides of backup and restore, refer to [Backup and Restore a TiDB Cluster](./backup-restore.md).
80 changes: 80 additions & 0 deletions docs/references/tidb-backup-configuration.md
@@ -0,0 +1,80 @@
# TiDB Backup Configuration Reference

`TiDB-Backup` is a helm chart for backing up and restoring a TiDB cluster via [`mydumper`](https://www.pingcap.com/docs/dev/reference/tools/mydumper/) and [`loader`](https://www.pingcap.com/docs-cn/tools/loader/). This document explains the `TiDB-Backup` configuration. Refer to [Backup and Restore a TiDB Cluster](../backup-restore.md) for the user guide with examples.

## Configuration

### `mode`

- The operation to perform, either `backup` or `restore`, required
- Default: "backup"

### `clusterName`

- The name of the TiDB cluster that data is backed up from or restored to, required
- Default: "demo"

### `name`

- The backup name
- Default: "fullbackup-${date}", date is the start time of backup, accurate to minute

### `secretName`

- The name of the secret which stores user and password used for backup/restore
- Default: "backup-secret"
- You can create the secret by `kubectl create secret generic backup-secret -n ${namespace} --from-literal=user=root --from-literal=password=<password>`

### `storage.className`

- The storageClass used to store the backup data
- Default: "local-storage"

### `storage.size`

- The storage size of the PersistentVolume
- Default: "100Gi"

### `backupOptions`

- The options that are passed to [`mydumper`](https://github.com/maxbube/mydumper/blob/master/docs/mydumper_usage.rst#options)
- Default: "--chunk-filesize=100"

### `restoreOptions`

- The options that are passed to [`loader`](https://www.pingcap.com/docs-cn/tools/loader/)
- Default: "-t 16"

### `gcp.bucket`

- The name of the GCP bucket used to store backup data

> **Note:**
>
> Once you set any variables under the `gcp` section, the backup data will be uploaded to Google Cloud Storage; that is, you must keep the whole `gcp` configuration intact.

### `gcp.secretName`

- The name of the secret which stores the GCP service account credentials JSON file
- You can create the secret by `kubectl create secret generic gcp-backup-secret -n ${namespace} --from-file=./credentials.json`. To download the credentials JSON file, refer to the [Google Cloud documentation](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually)

### `ceph.endpoint`

- The endpoint of the Ceph object storage

> **Note:**
>
> Once you set any variables under the `ceph` section, the backup data will be uploaded to Ceph object storage; that is, you must keep the whole `ceph` configuration intact.

### `ceph.bucket`

- The bucket name of the Ceph object storage

### `ceph.secretName`

- The name of the secret which stores the Ceph object storage access key and secret key
- You can create the secret by:

```shell
$ kubectl create secret generic ceph-backup-secret -n ${namespace} --from-literal=access_key=<access-key> --from-literal=secret_key=<secret-key>
```
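
To illustrate how these fields combine, a backup that uploads to Google Cloud Storage might use a values file like the following sketch (the bucket name is a placeholder; all field names are taken from the reference above):

```yaml
mode: backup
clusterName: demo
secretName: backup-secret
storage:
  className: local-storage      # the PV only stages the data before it is uploaded
  size: 100Gi
backupOptions: "--chunk-filesize=100"
gcp:
  bucket: my-tidb-backups       # placeholder bucket name
  secretName: gcp-backup-secret # Secret created from credentials.json as shown above
```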
