
Too many sst files of rocksdb. #206

Closed
kaikai1024 opened this issue Jan 30, 2019 · 16 comments
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed) · more description (TBD) · research (Need to research) · urgent

Comments

@kaikai1024
Contributor

Description

TBD

Steps to Reproduce

TBD

Expected behavior: TBD

Actual behavior: TBD

Reproduce how often: TBD

Versions

0.20.2

Additional Information

After CITA has been running for a long time, RocksDB generates a large number of sst files. A test chain that has been running for 3 months holds about 30 GB of data and has close to 2,000,000 sst files under the nosql directory, which causes the CITA-chain CPU usage to keep growing (occasionally the chain stalls and stops producing blocks; I am not sure whether that is related). inode usage is already at 30%; at this rate, a node started on a server with 4 cores, 8 GB of RAM and a 100 GB disk will run out of inodes within a year. Is there a way to limit the number of sst files generated? Also, instead of putting all sst files directly in the top-level nosql directory, they could be indexed into subdirectories by hash, with the sst files stored inside those subdirectories.

@kaikai1024 kaikai1024 added enhancement New feature or request more description TBD help wanted Extra attention is needed labels Jan 30, 2019
@kaikai1024 kaikai1024 changed the title from "sst files of rocksdb are too many." to "Sst files of rocksdb are too many." Jan 30, 2019
@kaikai1024 kaikai1024 added the research Need to research label Jan 30, 2019
@kaikai1024 kaikai1024 changed the title from "Sst files of rocksdb are too many." to "Too many Sst files of rocksdb." Jan 30, 2019
@kaikai1024 kaikai1024 changed the title from "Too many Sst files of rocksdb." to "Too many sst files of rocksdb." Jan 30, 2019
@kaikai1024
Contributor Author

kaikai1024 commented Feb 15, 2019

@janx
Contributor

janx commented Feb 15, 2019

I think @zhangsoledad's suggestion is not to use an SSD, but to tune CITA's RocksDB configuration/usage to match an HDD.

@kaikai1024
Contributor Author

> I think @zhangsoledad's suggestion is not to use an SSD, but to tune CITA's RocksDB configuration/usage to match an HDD.

Thx.

@kaikai1024
Contributor Author

@rink1969 @jerry-yu @jiangxianliang007

Could you help paste some of the information summarized during the earlier investigation?

@kaikai1024
Contributor Author

@jerry-yu Please update the info.
You can create a new issue if you find other problems.

@jerry-yu
Contributor

jerry-yu commented Feb 21, 2019

About the bug on the test chain: by capturing RabbitMQ traffic, we found that the MQ had already forwarded the received executed_result message to chain, but chain's processing thread, which waits with recv_timeout, reported that it never received the message. After some debugging and googling, this turned out to be related to a bug in the standard library channel's recv_timeout function.
-> a pile of issues (rust-lang/rust#48460)

Fix: replace the standard library channel with crossbeam's channel; this requires changes to cita-common and other crates.

Temporary workaround: take a snapshot first. Tests show that nodes under low load (which have taken a snapshot) lose only a few messages and recover quickly, whereas nodes under heavy load keep losing messages for a much longer time.
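
For illustration, a minimal sketch of the proposed replacement, assuming the crossbeam-channel crate; the ExecutedResult type and the surrounding threading are placeholders, not CITA's actual code:

```rust
// Minimal sketch: replacing std::sync::mpsc with crossbeam-channel, whose
// recv_timeout does not suffer from rust-lang/rust#48460.
// `ExecutedResult` is a placeholder type, not CITA's real message.
use std::thread;
use std::time::Duration;

use crossbeam_channel::{unbounded, RecvTimeoutError};

#[derive(Debug)]
struct ExecutedResult {
    height: u64,
}

fn main() {
    let (tx, rx) = unbounded::<ExecutedResult>();

    // In CITA this send would happen in the thread that forwards MQ messages.
    thread::spawn(move || {
        tx.send(ExecutedResult { height: 6000 }).unwrap();
    });

    // The chain processing loop keeps waiting with a timeout, as before.
    match rx.recv_timeout(Duration::from_millis(500)) {
        Ok(msg) => println!("got executed_result at height {}", msg.height),
        Err(RecvTimeoutError::Timeout) => println!("no message within the timeout"),
        Err(RecvTimeoutError::Disconnected) => println!("sender disconnected"),
    }
}
```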

@jerry-yu
Contributor

About tuning RocksDB: our current kv-db library exposes very few tunable parameters.

```rust
#[derive(Clone)]
pub struct DatabaseConfig {
    /// Max number of open files.
    pub max_open_files: i32,
    /// Cache sizes (in MiB) for specific columns.
    pub cache_sizes: HashMap<Option<u32>, usize>,
    /// Compaction profile
    pub compaction: CompactionProfile,
    /// Set number of columns
    pub columns: Option<u32>,
    /// Should we keep WAL enabled?
    pub wal: bool,
}

pub struct CompactionProfile {
    /// L0-L1 target file size
    pub initial_file_size: u64,
    /// L2-LN target file size multiplier
    pub file_size_multiplier: i32,
    /// rate limiter for background flushes and compactions, bytes/sec, if any
    pub write_rate_limit: Option<u64>,
}
```
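
As an illustration of what tuning through this wrapper looks like, here is a sketch that fills in the DatabaseConfig and CompactionProfile shown above with HDD-oriented values; the numbers are made up for the example, not recommended settings:

```rust
use std::collections::HashMap;

// Sketch only: fills in the structs above with HDD-oriented values.
// All numbers here are illustrative, not recommended settings.
fn hdd_oriented_config(columns: Option<u32>) -> DatabaseConfig {
    let mut cache_sizes = HashMap::new();
    // Give the default column (None) a 128 MiB cache.
    cache_sizes.insert(None, 128);

    DatabaseConfig {
        max_open_files: 512,
        cache_sizes,
        compaction: CompactionProfile {
            // Larger L0-L1 target files mean fewer, bigger sst files.
            initial_file_size: 64 * 1024 * 1024,
            // Let each lower level use files twice as large as the level above.
            file_size_multiplier: 2,
            // Throttle background flushes/compactions to 16 MiB/s for an HDD.
            write_rate_limit: Some(16 * 1024 * 1024),
        },
        columns,
        wal: true,
    }
}
```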

@classicalliu
Contributor

> About the bug on the test chain: by capturing RabbitMQ traffic, we found that the MQ had already forwarded the received executed_result message to chain, but chain's processing thread, which waits with recv_timeout, reported that it never received the message. After some debugging and googling, this turned out to be related to a bug in the standard library channel's recv_timeout function.
> -> a pile of issues (rust-lang/rust#48460)
>
> Fix: replace the standard library channel with crossbeam's channel; this requires changes to cita-common and other crates.

It's a new problem; opening a new issue would be better :)

@kaikai1024
Contributor Author

kaikai1024 commented Feb 21, 2019

> About the bug on the test chain: by capturing RabbitMQ traffic, we found that the MQ had already forwarded the received executed_result message to chain, but chain's processing thread, which waits with recv_timeout, reported that it never received the message. After some debugging and googling, this turned out to be related to a bug in the standard library channel's recv_timeout function.
> -> a pile of issues (rust-lang/rust#48460)
>
> Fix: replace the standard library channel with crossbeam's channel; this requires changes to cita-common and other crates.

So my understanding is that this has turned into a separate issue, which we then fix following that solution? If so, could you please open a new issue and reference this one?

@jerry-yu
Contributor

Also: when doing the refactoring, take care to provide a kvdb interface; if it can also support TiKV's interface, that would help relieve the disk pressure.
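
A hypothetical sketch of what such a storage abstraction could look like, so that a RocksDB backend and a TiKV backend could be swapped behind the same trait; the names below are invented for illustration and are not CITA's actual kvdb interface:

```rust
// Hypothetical key-value abstraction, invented for illustration; it is not
// CITA's real kvdb trait. A RocksDB-backed implementation and a TiKV-backed
// implementation would both live behind it.
pub type Column = Option<u32>;

/// A write operation: `None` as the value means "delete this key".
pub type WriteOp = (Column, Vec<u8>, Option<Vec<u8>>);

pub trait KeyValueStore: Send + Sync {
    fn get(&self, col: Column, key: &[u8]) -> Result<Option<Vec<u8>>, String>;
    fn put(&self, col: Column, key: &[u8], value: &[u8]) -> Result<(), String>;
    fn delete(&self, col: Column, key: &[u8]) -> Result<(), String>;
    /// Apply a batch of writes atomically.
    fn write_batch(&self, ops: Vec<WriteOp>) -> Result<(), String>;
}
```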

@yangby-cryptape
Contributor

yangby-cryptape commented Mar 5, 2019

Source Code

CITA uses RocksDB to store CurrentProof, CurrentHash and CurrentHeight.

https://github.com/cryptape/cita/blob/dd3e0cea53b9f96c3bcd9d387d5eead7042646d1/cita-chain/types/src/extras.rs#L48-L76

Check Data in Test Environment

Today, I checked our test environment:

  • There are 2_712_488 files in the RocksDB directory for chain.
    The total size is 33 GiB.
  • 2_712_441 of them are sst files.
    1_314_306 of those sst files are exactly 972 bytes.
  • I randomly checked a few of the 972-byte sst files; all of them were used to record CurrentHeight.
    About half of the newly created files were 972 bytes.
  • Almost all files smaller than 1000 bytes are used to store CurrentHeight.
  • Almost all files between 1500 bytes and 2500 bytes are used to store CurrentProof.
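
For reference, a small standard-library-only sketch of how such a size-bucketed scan over a node's RocksDB directory can be reproduced; the directory path is a placeholder:

```rust
use std::collections::BTreeMap;
use std::fs;

// Count sst files and group them by exact file size, similar to the check
// above. "./data/node0/data/statedb" is a placeholder path; point it at the
// chain's RocksDB directory of your node.
fn main() -> std::io::Result<()> {
    let dir = "./data/node0/data/statedb";
    let mut total_files = 0u64;
    let mut sst_files = 0u64;
    let mut files_by_size: BTreeMap<u64, u64> = BTreeMap::new();

    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        total_files += 1;
        if entry.file_name().to_string_lossy().ends_with(".sst") {
            sst_files += 1;
            *files_by_size.entry(entry.metadata()?.len()).or_insert(0) += 1;
        }
    }

    println!("{} files in total, {} of them sst files", total_files, sst_files);

    // Print the most common sst file sizes first (e.g. the 972-byte ones).
    let mut buckets: Vec<(u64, u64)> = files_by_size.into_iter().collect();
    buckets.sort_by(|a, b| b.1.cmp(&a.1));
    for (size, count) in buckets.iter().take(10) {
        println!("{:>8} bytes: {} files", size, count);
    }
    Ok(())
}
```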

Comparison Test

Recently, I wrote a simple patch so that CITA does not store block data and current data in RocksDB.
Today, I ran a chain with three nodes: node-1 and node-2 use my patched version of CITA, and node-3 uses the original version.

At height 6000:

  • My patched version has only 1 sst file.
    The original version has more than 7000 sst files.
  • On the nodes running the patched version, the total data size is 59 MB.
    On the node running the original version, it is 124 MB.

Conclusion

DO NOT store current statuses into RocksDB.

I have not found the root cause of these results yet.
If we care about that, I will need more time to study RocksDB.

But if we just want a way to decrease the number of sst files in RocksDB, this conclusion is enough.
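
One possible direction implied by this conclusion (a sketch only, not the patch described above): keep the frequently overwritten current values out of RocksDB and persist them in a tiny sidecar file that is replaced atomically:

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Sketch of one possible alternative (not the actual patch): write the
// latest "current" value to a small sidecar file instead of RocksDB,
// replacing it atomically via rename so no extra sst files are produced.
fn persist_current_height(dir: &Path, height: u64) -> std::io::Result<()> {
    let tmp = dir.join("current_height.tmp");
    let dst = dir.join("current_height");

    let mut file = fs::File::create(&tmp)?;
    file.write_all(&height.to_le_bytes())?;
    file.sync_all()?; // make sure the bytes are on disk before the rename
    fs::rename(&tmp, &dst)?; // atomic replace on POSIX filesystems
    Ok(())
}

fn main() -> std::io::Result<()> {
    persist_current_height(Path::new("."), 6000)
}
```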

@rainchen
Member

rainchen commented Mar 5, 2019

RocksDB has an option called target_file_size_multiplier.

The official explanation:

> Q: What is options.target_file_size_multiplier useful for?
>
> A: It's a rarely used feature. For example, you can use it to reduce the number of the SST files.

The default configuration is:

target_file_size_multiplier=1

i.e. the configured value is 1.

There is also this discussion about the option:

facebook/rocksdb#3265 (comment)

> I think parameter target_file_size_multiplier was introduced in early experimentation with LevelDB and never actually used in production.

Since the default value is 1 and it is not a commonly used parameter, the large number of SST files may not be the real cause of the high CPU usage, because an ext4 partition supports an unlimited number of files.

If the high CPU usage is not caused by having too many files, then we need profiling tools to find out which part is actually consuming the CPU.

I suggest enabling statistics first; without statistics we cannot analyze the real cause.

statistics is RocksDB's facility for collecting performance and throughput metrics. Enabling it gives direct performance observation data and makes it easy to spot bottlenecks and see how the system is running. Because the statistics are updated through many instrumentation points placed on all kinds of operations inside the engine, enabling statistics adds roughly 5% to 10% of extra overhead.

Reference: https://xiking.win/2018/12/05/rocksdb-tuning/
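
For illustration, a sketch of enabling statistics and adjusting the target file size settings through the rust-rocksdb crate; CITA's current kvdb wrapper does not expose these options (see the DatabaseConfig above), so this assumes direct access to rocksdb::Options:

```rust
use rocksdb::{Options, DB};

fn main() {
    let mut opts = Options::default();
    opts.create_if_missing(true);

    // Collect RocksDB's internal performance/throughput statistics
    // (roughly 5%-10% extra overhead, as noted above).
    opts.enable_statistics();

    // Make lower-level files larger so that fewer sst files are created.
    // The values are illustrative only.
    opts.set_target_file_size_base(64 * 1024 * 1024);
    opts.set_target_file_size_multiplier(2);

    let db = DB::open(&opts, "./rocksdb-tuning-demo").unwrap();
    db.put(b"current_height", b"6000").unwrap();

    // Dump the accumulated statistics string for offline analysis.
    if let Some(stats) = opts.get_statistics() {
        println!("{}", stats);
    }
}
```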

@yangby-cryptape
Contributor

> Also, instead of putting all sst files directly in the top-level nosql directory, they could be indexed into subdirectories by hash, with the sst files stored inside those subdirectories.

Benchmark: Deep directory structure vs. flat directory structure to store millions of files on ext4:

Write is 44% faster using a flat directory structure instead of deep/tree directory structure. Read is even 7.8x faster.

In conclusion, just use a flat directory structure. It’s easier to use. Faster in write. Much faster in read. Save on ionodes. And doesn’t need to pre-create or dynamically generate the branch folders.

@kaikai1024
Contributor Author

@jerry-yu Could you paste the solution here?

@kaikai1024
Contributor Author

Will release a patch version to fix it.

@kaikai1024
Contributor Author
