backup: support split big region into small backup files (tikv#9283) (tikv#9448)

cherry-pick tikv#9283 to release-4.0
---


### What problem does this PR solve?

Issue Number: close tikv#9144

Problem Summary: BR reads all the data of a region and feeds it into an SST writer, which keeps the data in memory. If a region is huge, TiKV may crash with OOM because the whole region's data is held in memory at once.

### What is changed and how it works?

What's Changed: Record the size of the txn entries written so far. When it reaches the configured limit (`sst-max-size`), save the data cached in RocksDB to an SST file and then switch to the next file.
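
The splitting rule described above can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: `SplitFile` and `split_by_size` are hypothetical names, the entry size is approximated as key length plus value length, and the threshold is checked per entry, whereas the diff checks it once per scanned batch through `writer.need_split_keys()`.

```rust
/// Hypothetical stand-in for the metadata of one output file.
/// This is not TiKV's `File` type.
#[derive(Debug, PartialEq)]
struct SplitFile {
    start_key: Vec<u8>,
    end_key: Vec<u8>,
    entries: usize,
}

/// Splits sorted `(key, value)` entries into files. A new file is started
/// once the bytes accumulated in the current file reach `max_size`.
fn split_by_size(
    entries: &[(Vec<u8>, Vec<u8>)],
    range_start: &[u8],
    range_end: &[u8],
    max_size: usize,
) -> Vec<SplitFile> {
    let mut files = Vec::new();
    let mut last_key = range_start.to_vec();
    let mut written = 0usize; // bytes written to the current file
    let mut count = 0usize; // entries written to the current file

    for (key, value) in entries {
        // Check the limit before writing, so the entry that triggers the
        // split becomes the first entry of the next file.
        if count > 0 && written >= max_size {
            files.push(SplitFile {
                start_key: last_key.clone(),
                end_key: key.clone(),
                entries: count,
            });
            last_key = key.clone();
            written = 0;
            count = 0;
        }
        written += key.len() + value.len();
        count += 1;
    }

    // Flush the remainder; the last file ends at the end of the range.
    if count > 0 {
        files.push(SplitFile {
            start_key: last_key,
            end_key: range_end.to_vec(),
            entries: count,
        });
    }
    files
}
```

With four 5-byte entries keyed `a` to `d` and a 10-byte limit, the sketch yields two files covering `[range_start, "c")` and `["c", range_end)`, which mirrors how the diff assigns `last_key` and `cur_key` to each saved file.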

### Related changes

- Need to cherry-pick to the release branch

### Check List

Tests

- Unit test
- Integration test
- Manual test (add detailed scripts or steps below)
1. Set `sst-max-size` to 15MiB.
```
mysql> select * from CLUSTER_CONFIG where `TYPE`="tikv";
+------+-----------------+---------------------------------------------------------------+------------------------------------------------------+
| TYPE | INSTANCE        | KEY                                                           | VALUE                                                |
+------+-----------------+---------------------------------------------------------------+------------------------------------------------------+
| tikv | 127.0.0.1:20160 | backup.batch-size                                             | 8                                                    |
| tikv | 127.0.0.1:20160 | backup.num-threads                                            | 9                                                    |
| tikv | 127.0.0.1:20160 | backup.sst-max-size                                           | 15MiB                                                |
...
```
2. Back up around 100MB of data (without compaction) successfully.
```
$ ./br backup full -s ./backup --pd http://127.0.0.1:2379 
Full backup <--------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
Checksum <-----------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
[2020/12/31 14:39:12.534 +08:00] [INFO] [collector.go:60] ["Full backup Success summary: total backup ranges: 2, total success: 2, total failed: 0, total take(Full backup time): 4.273097395s, total take(real time): 8.133315406s, total kv: 8000000, total size(MB): 361.27, avg speed(MB/s): 84.55"] ["backup checksum"=901.754111ms] ["backup fast checksum"=6.09384ms] ["backup total regions"=10] [BackupTS=421893700168974340] [Size=48023090]
```
3. The big region can be split into several files:
```
-rw-r--r-- 1 * * 1.5M Dec 31 14:39 1_60_28_74219326eeb0a4ae3a0f5190f7784132bb0e44791391547ef66862aaeb668579_1609396745730_write.sst
-rw-r--r-- 1 * * 1.2M Dec 31 14:39 1_60_28_b7a5509d9912c66a21589d614cfc8828acd4051a7eeea3f24f5a7b337b5a389e_1609396746062_write.sst
-rw-r--r-- 1 * * 1.5M Dec 31 14:39 1_60_28_cdcc2ce1c18a30a2b779b574f64de9f0e3be81c2d8720d5af0a9ef9633f8fbb7_1609396745429_write.sst
-rw-r--r-- 1 * * 2.4M Dec 31 14:39 1_62_28_4259e616a6e7b70c33ee64af60230f3e4160af9ac7aac723f033cddf6681826a_1609396747038_write.sst
-rw-r--r-- 1 * * 2.4M Dec 31 14:39 1_62_28_5d0de44b65fb805e45c93278661edd39792308c8ce90855b54118c4959ec9f16_1609396746731_write.sst
-rw-r--r-- 1 * * 2.4M Dec 31 14:39 1_62_28_ef7ab4b5471b088ee909870e316d926f31f4f6ec771754690eac61af76e8782c_1609396747374_write.sst
-rw-r--r-- 1 * * 1.5M Dec 31 14:39 1_64_29_74211aae8215fe9cde8bd7ceb8494afdcc18e5c6a8c5830292a577a9859d38e1_1609396746671_write.sst
-rw-r--r-- 1 * * 1.2M Dec 31 14:39 1_64_29_81e152c98742938c1662241fac1c841319029e800da6881d799a16723cb42888_1609396747010_write.sst
-rw-r--r-- 1 * * 1.5M Dec 31 14:39 1_64_29_ce0dde9826aee9e5ccac0a516f18b9871d3897effd559ff7450b8e56ac449bbd_1609396746349_write.sst
-rw-r--r-- 1 * *   78 Dec 31 14:39 backup.lock
-rw-r--r-- 1 * * 229K Dec 31 14:39 backupmeta
```
4. Restore the backed-up data. The restore succeeds and passes the manual check.
```
./br restore full -s ./backup --pd http://127.0.0.1:2379
Full restore <-------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
[2020/12/31 14:42:49.983 +08:00] [INFO] [collector.go:60] ["Full restore Success summary: total restore files: 27, total success: 27, total failed: 0, total take(Full restore time): 5.063048828s, total take(real time): 7.84620924s, total kv: 8000000, total size(MB): 361.27, avg speed(MB/s): 71.36"] ["split region"=26.217737ms] ["restore checksum"=4.10792638s] ["restore ranges"=26] [Size=48023090]
```
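
The file names listed in step 3 can be taken apart as sketched below. The doc comment on `backup_file_name` in this diff names five parts (store id, region id, epoch version, hash of the range start key, timestamp). Treating the trailing component before `.sst` as the column family name, and the timestamp as a plain integer, are assumptions read off the listing. `BackupFileName` and `parse_backup_file_name` are hypothetical helpers and not part of TiKV or BR.

```rust
/// Hypothetical parsed form of a backup SST file name.
#[derive(Debug, PartialEq)]
struct BackupFileName {
    store_id: u64,
    region_id: u64,
    epoch_version: u64,
    key_hash: String,
    timestamp: u64,
    cf: String, // assumed to be the column family, e.g. "write"
}

/// Parses `<store>_<region>_<epoch>_<hash>_<timestamp>_<cf>.sst`.
/// Returns `None` for anything that does not match this shape.
fn parse_backup_file_name(name: &str) -> Option<BackupFileName> {
    let stem = name.strip_suffix(".sst")?;
    let parts: Vec<&str> = stem.split('_').collect();
    if parts.len() != 6 {
        return None;
    }
    Some(BackupFileName {
        store_id: parts[0].parse().ok()?,
        region_id: parts[1].parse().ok()?,
        epoch_version: parts[2].parse().ok()?,
        key_hash: parts[3].to_string(),
        timestamp: parts[4].parse().ok()?,
        cf: parts[5].to_string(),
    })
}
```

Grouping the parsed names by `region_id` shows three files each for regions 60, 62, and 64 in the listing, which is the split this PR introduces.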

### Release note

- Fix the problem that TiKV runs out of memory (OOM) when backing up a huge region.
ti-srebot authored and gengliqi committed Feb 19, 2021
1 parent 3eaa05b commit 1dca687
Showing 9 changed files with 358 additions and 70 deletions.
155 changes: 100 additions & 55 deletions components/backup/src/endpoint.rs
@@ -30,6 +30,8 @@ use yatp::task::callback::{Handle, TaskCell};
use yatp::ThreadPool;

use crate::metrics::*;
use crate::writer::BackupWriterBuilder;
use crate::Error;
use crate::*;

const BACKUP_BATCH_LIMIT: usize = 1024;
@@ -142,11 +144,12 @@ impl BackupRange {
/// Get entries from the scanner and save them to storage
fn backup<E: Engine>(
&self,
writer: &mut BackupWriter,
writer_builder: BackupWriterBuilder,
engine: &E,
backup_ts: TimeStamp,
begin_ts: TimeStamp,
) -> Result<Statistics> {
storage: &LimitedStorage,
) -> Result<(Vec<File>, Statistics)> {
assert!(!self.is_raw_kv);

let mut ctx = Context::default();
@@ -181,7 +184,17 @@ impl BackupRange {
.unwrap();

let start_scan = Instant::now();
let mut files: Vec<File> = Vec::with_capacity(2);
let mut batch = EntryBatch::with_capacity(BACKUP_BATCH_LIMIT);
let mut last_key = self
.start_key
.clone()
.map_or_else(Vec::new, |k| k.into_raw().unwrap());
let mut cur_key = self
.end_key
.clone()
.map_or_else(Vec::new, |k| k.into_raw().unwrap());
let mut writer = writer_builder.build(last_key.clone())?;
loop {
if let Err(e) = scanner.scan_entries(&mut batch) {
error!(?e; "backup scan entries failed");
@@ -191,6 +204,48 @@ impl BackupRange {
break;
}
debug!("backup scan entries"; "len" => batch.len());

if writer.need_split_keys() {
let res = {
batch.iter().next().map_or_else(
|| Err(Error::Other(box_err!("get entry error"))),
|x| match x.to_key() {
Ok(k) => {
cur_key = k.into_raw().unwrap();
writer_builder.build(cur_key.clone())
}
Err(e) => {
error!(?e; "backup save file failed");
Err(Error::Other(box_err!("Decode error: {:?}", e)))
}
},
)
};
match writer.save(&storage.storage) {
Ok(mut split_files) => {
for file in split_files.iter_mut() {
file.set_start_key(last_key.clone());
file.set_end_key(cur_key.clone());
}
last_key = cur_key.clone();
files.append(&mut split_files);
}
Err(e) => {
error!(?e; "backup save file failed");
return Err(e);
}
}
match res {
Ok(w) => {
writer = w;
}
Err(e) => {
error!(?e; "backup writer failed");
return Err(e);
}
}
}

// Build sst files.
if let Err(e) = writer.write(batch.drain(), true) {
error!(?e; "backup build sst failed");
@@ -200,8 +255,29 @@ impl BackupRange {
BACKUP_RANGE_HISTOGRAM_VEC
.with_label_values(&["scan"])
.observe(start_scan.elapsed().as_secs_f64());

if writer.need_flush_keys() {
match writer.save(&storage.storage) {
Ok(mut split_files) => {
cur_key = self
.end_key
.clone()
.map_or_else(Vec::new, |k| k.into_raw().unwrap());
for file in split_files.iter_mut() {
file.set_start_key(last_key.clone());
file.set_end_key(cur_key.clone());
}
files.append(&mut split_files);
}
Err(e) => {
error!(?e; "backup save file failed");
return Err(e);
}
}
}

let stat = scanner.take_statistics();
Ok(stat)
Ok((files, stat))
}

fn backup_raw<E: Engine>(
@@ -264,44 +340,6 @@ impl BackupRange {
Ok(statistics)
}

fn backup_to_file<E: Engine>(
&self,
engine: &E,
db: Arc<DB>,
storage: &LimitedStorage,
file_name: String,
backup_ts: TimeStamp,
start_ts: TimeStamp,
compression_type: Option<SstCompressionType>,
compression_level: i32,
) -> Result<(Vec<File>, Statistics)> {
let mut writer = match BackupWriter::new(
db,
&file_name,
storage.limiter.clone(),
compression_type,
compression_level,
) {
Ok(w) => w,
Err(e) => {
error!(?e; "backup writer failed");
return Err(e);
}
};
let stat = match self.backup(&mut writer, engine, backup_ts, start_ts) {
Ok(s) => s,
Err(e) => return Err(e),
};
// Save sst files to storage.
match writer.save(&storage.storage) {
Ok(files) => Ok((files, stat)),
Err(e) => {
error!(?e; "backup save file failed");
Err(e)
}
}
}

fn backup_raw_kv_to_file<E: Engine>(
&self,
engine: &E,
@@ -586,6 +624,7 @@ impl<E: Engine, R: RegionInfoProvider> Endpoint<E, R> {
let db = self.db.clone();
let store_id = self.store_id;
let batch_size = self.config_manager.0.read().unwrap().batch_size;
let sst_max_size = self.config_manager.0.read().unwrap().sst_max_size.0;

// TODO: make it async.
self.pool.borrow_mut().spawn(move || {
@@ -654,17 +693,17 @@ impl<E: Engine, R: RegionInfoProvider> Endpoint<E, R> {
brange.end_key.map_or_else(Vec::new, |k| k.into_encoded()),
)
} else {
let writer_builder = BackupWriterBuilder::new(
store_id,
storage.limiter.clone(),
brange.region.clone(),
db.clone(),
ct,
request.compression_level,
sst_max_size,
);
(
brange.backup_to_file(
&engine,
db.clone(),
&storage,
name,
backup_ts,
start_ts,
ct,
request.compression_level,
),
brange.backup(writer_builder, &engine, backup_ts, start_ts, &storage),
brange
.start_key
.map_or_else(Vec::new, |k| k.into_raw().unwrap()),
@@ -692,8 +731,10 @@ impl<E: Engine, R: RegionInfoProvider> Endpoint<E, R> {
"details" => ?stat);

for file in files.iter_mut() {
file.set_start_key(start_key.clone());
file.set_end_key(end_key.clone());
if is_raw_kv {
file.set_start_key(start_key.clone());
file.set_end_key(end_key.clone());
}
file.set_start_version(start_ts.into_inner());
file.set_end_version(end_ts.into_inner());
}
@@ -817,7 +858,7 @@ fn get_max_start_key(start_key: Option<&Key>, region: &Region) -> Option<Key> {
/// A name consists with five parts: store id, region_id, a epoch version, the hash of range start key and timestamp.
/// range start key is used to keep the unique file name for file, to handle different tables exists on the same region.
/// local unix timestamp is used to keep the unique file name for file, to handle receive the same request after connection reset.
fn backup_file_name(store_id: u64, region: &Region, key: Option<String>) -> String {
pub fn backup_file_name(store_id: u64, region: &Region, key: Option<String>) -> String {
let start = SystemTime::now();
let since_the_epoch = start
.duration_since(UNIX_EPOCH)
@@ -871,6 +912,7 @@ pub mod tests {
use txn_types::SHORT_VALUE_MAX_LEN;

use super::*;
use tikv_util::config::ReadableSize;

#[derive(Clone)]
pub struct MockRegionInfoProvider {
@@ -939,6 +981,7 @@ pub mod tests {
BackupConfig {
num_threads: 4,
batch_size: 8,
sst_max_size: ReadableSize::mb(144),
},
),
)
@@ -1191,8 +1234,10 @@ pub mod tests {
let resp = resp.unwrap();
assert!(!resp.has_error(), "{:?}", resp);
let file_len = if *len <= SHORT_VALUE_MAX_LEN { 1 } else { 2 };
let files = resp.get_files();
info!("{:?}", files);
assert_eq!(
resp.get_files().len(),
files.len(),
file_len, /* default and write */
"{:?}",
resp
2 changes: 1 addition & 1 deletion components/backup/src/lib.rs
@@ -16,7 +16,7 @@ mod metrics;
mod service;
mod writer;

pub use endpoint::{Endpoint, Task};
pub use endpoint::{backup_file_name, Endpoint, Task};
pub use errors::{Error, Result};
pub use service::Service;
pub use writer::{BackupRawKVWriter, BackupWriter};