From 9f03d75843acea0eb30d0b52e59e5ef13a857099 Mon Sep 17 00:00:00 2001
From: Yu Leng
Date: Mon, 16 Aug 2021 13:59:30 +0800
Subject: [PATCH] update en doc

---
 docs/.vuepress/config.js |   5 +-
 docs/operation/README.md | 309 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 306 insertions(+), 8 deletions(-)

diff --git a/docs/.vuepress/config.js b/docs/.vuepress/config.js
index cd1239cd0..eb2160e0a 100644
--- a/docs/.vuepress/config.js
+++ b/docs/.vuepress/config.js
@@ -134,9 +134,8 @@ module.exports = {
           title: 'Operation',
           collapsable: false,
           children: [
-            ['', 'OM Advanced'],
-            ['Efficiency_of_sealing.md', 'Improve the efficiency of the sealing sector'],
-            ['System_monitor_of_Zabbix.md', 'System monitoring installation and use of Zabbix'],
+            ['', 'Finding optimal configurations'],
+            // ['System_monitor_of_Zabbix.md', 'System monitoring installation and use of Zabbix'],
           ]
         }
       ],
diff --git a/docs/operation/README.md b/docs/operation/README.md
index dd67dd804..0d140f401 100644
--- a/docs/operation/README.md
+++ b/docs/operation/README.md
@@ -1,9 +1,308 @@
-# venus独立组件运维进阶篇
+## Find your optimal configurations
-   这里记录员如何运维venus独立组件,保障算力稳定增长,最大化利用系统资源,配置venus-worker的任务等系列文档。
+
+Getting your Filecoin mining operation up and running is hard. Growing it is even harder. It takes a lot of time to scale up and to make sure your setup keeps running without errors.
+
+## Overview
+
+General guidelines to follow when optimizing your sealing pipeline:
+
+- Pledge 2 to 4 sectors and record the exact time each task (AP, P1, P2, C2) takes to finish
+- Make sure all your boxes have tasks assigned to them all the time
+- Automate your `sector pledge` command with a [script](https://filecoinproject.slack.com/archives/CPFTWMY7N/p1628092388117700?thread_ts=1628092099.117600&cid=CPFTWMY7N)/cron
+- Use `MaxSealingSectors` to cap the number of sectors sealing in parallel
+- Assign each worker a subset of the tasks (AP, P1, P2, C2) so that it can specialize
+
+## Record time for each task
+
+The types of tasks that a worker can do:
+
+```go
+TTAddPiece   TaskType = "seal/v0/addpiece"
+TTPreCommit1 TaskType = "seal/v0/precommit/1"
+TTPreCommit2 TaskType = "seal/v0/precommit/2"
+TTCommit1    TaskType = "seal/v0/commit/1" // NOTE: We use this to transfer the sector into miner-local storage for now; Don't use on workers!
+TTCommit2    TaskType = "seal/v0/commit/2"
+
+TTFinalize TaskType = "seal/v0/finalize"
+TTFetch    TaskType = "seal/v0/fetch"
+TTUnseal   TaskType = "seal/v0/unseal"
+```
+
+Each task first shows up in the log with the keyword `prepare` (start and end), and then with the keyword `work` in another pair of log entries (also start and end).
+
+```bash
+# seal/v0/fetch
+2021-08-03T14:00:07.925+0800 INFO advmgr sector-storage/sched_worker.go:401 Sector 7 prepare for seal/v0/fetch ...
+2021-08-03T14:05:36.772+0800 INFO advmgr sector-storage/sched_worker.go:403 Sector 7 prepare for seal/v0/fetch end ...
+
+2021-08-03T14:05:36.772+0800 INFO advmgr sector-storage/sched_worker.go:442 Sector 7 work for seal/v0/fetch ...
+2021-08-03T14:05:36.774+0800 INFO advmgr sector-storage/sched_worker.go:444 Sector 7 work for seal/v0/fetch end ...
+
+# seal/v0/addpiece
+2021-08-03T13:38:37.977+0800 INFO advmgr sector-storage/sched_worker.go:401 Sector 8 prepare for seal/v0/addpiece ...
+2021-08-03T13:38:37.978+0800 INFO advmgr sector-storage/sched_worker.go:403 Sector 8 prepare for seal/v0/addpiece end ...
+
+2021-08-03T13:38:37.978+0800 INFO advmgr sector-storage/sched_worker.go:442 Sector 8 work for seal/v0/addpiece ...
+2021-08-03T13:44:26.295+0800 INFO advmgr sector-storage/sched_worker.go:444 Sector 8 work for seal/v0/addpiece end ...
+
+# seal/v0/commit/2
+2021-08-03T13:26:02.119+0800 INFO advmgr sector-storage/sched_worker.go:401 Sector 7 prepare for seal/v0/commit/2 ...
+2021-08-03T13:26:02.119+0800 INFO advmgr sector-storage/sched_worker.go:403 Sector 7 prepare for seal/v0/commit/2 end ...
+
+2021-08-03T13:26:02.119+0800 INFO advmgr sector-storage/sched_worker.go:442 Sector 7 work for seal/v0/commit/2 ...
+2021-08-03T13:49:46.180+0800 INFO advmgr sector-storage/sched_worker.go:444 Sector 7 work for seal/v0/commit/2 end ...
+
+# seal/v0/finalize
+2021-08-03T13:54:17.414+0800 INFO advmgr sector-storage/sched_worker.go:401 Sector 7 prepare for seal/v0/finalize ...
+2021-08-03T13:59:30.471+0800 INFO advmgr sector-storage/sched_worker.go:403 Sector 7 prepare for seal/v0/finalize end ...
+
+2021-08-03T13:59:30.471+0800 INFO advmgr sector-storage/sched_worker.go:442 Sector 7 work for seal/v0/finalize ...
+2021-08-03T14:00:07.915+0800 INFO advmgr sector-storage/sched_worker.go:444 Sector 7 work for seal/v0/finalize end ...
+```
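+
+To avoid reading these timestamps off by hand, a quick `grep` over the worker log can pull out the matching `work` entries for a given sector and task type. This is only a sketch: the log file name below is an assumption, so point it at wherever your `venus-worker` output actually goes.
+
+```bash
+SECTOR=8                      # sector number to inspect (example value)
+TASK="seal/v0/precommit/1"    # task type, see the list above
+# Prints the start and end "work" entries; the elapsed time is the
+# difference between the two timestamps.
+grep "Sector ${SECTOR} work for ${TASK}" venus-worker.log
+```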
+
+Some tasks spend more time in `prepare` than in `work`, and some are the other way around. Generally speaking, a task that requires network transfer/bandwidth consumes more time in `prepare`, while a task that requires more computation resources (e.g. AP, P1, P2, C2) consumes more time in `work`.
+
+To record the time of a core task like AP, P1, P2 or C2, aggregate the time of the `fetch` that precedes it and the time of the task itself. For example, total time of P1 = time of P1 + time of the fetch before P1.
+
+## Performance factors
+
+Many factors contribute to the performance of your sealing pipeline.
+
+### Sealing storage
+
+During the sealing of a sector, cache files are generated by the proof algorithm, which requires high disk IO speed. Low IO speed may leave your computation resources (CPUs/GPUs) idle.
+
+Choose appropriate hardware using the formula below.
+
+```bash
+file size * number of parallel threads / operation time = average file IO speed
+```
+
+To get a more precise estimate, sum up the per-task IO throughput.
+
+```bash
+AP IO throughput = AP read + AP write
+P1 IO throughput = P1 read + P1 write
+P2 IO throughput = P2 read + P2 write
+C2 IO throughput = C2 read + C2 write
+```
+
+SSD and NVMe drives are commonly used for sealing storage. To use these faster drives efficiently, it is recommended to combine them with software RAID.
+
+```bash
+mdadm -C /dev/md1 -l 0 -n 2 /dev/sdb1 /dev/sdc1
+mdadm -C /dev/md2 -l 5 -n 6 /dev/sd[b-g]1
+# Options
+-C, --create
+    Create a new array.
+-l, --level=
+    Set RAID level.
+-n, --raid-devices=
+    Specify the number of active devices in the array.
+-x, --spare-devices=
+    Specify the number of spare (eXtra) devices in the initial array.
+-A, --assemble
+    Assemble a pre-existing array.
+```
+
+For more on `mdadm`, please visit [here](http://raid.wiki.kernel.org/). Get the latest version from [here](http://www.kernel.org/pub/linux/utils/raid/mdadm/).
+
+### Permanent storage
+
+Issues to watch out for when setting up permanent storage:
+
+1. When a sector is sealed, it is transferred from the sealer to permanent storage, which takes up network bandwidth and disk IO.
+2. During a `windowPoSt`, a large number of randomly selected files are read. Slow reads may result in a failed `windowPoSt`.
+3. Choose a RAID level with redundancy when possible, e.g. RAID5, RAID6 or RAID10.
+4. Monitor the usage of your disk array (see the sketch below).
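+
+A minimal monitoring sketch for an `mdadm`-backed array follows. The device name and mount point are examples; substitute your own.
+
+```bash
+cat /proc/mdstat              # RAID state, degraded/rebuilding members
+mdadm --detail /dev/md2       # per-device status of the array
+df -h /mnt/permanent-storage  # remaining capacity of the mounted array
+iostat -xm 5                  # live read/write throughput (from the sysstat package)
+```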
+
+### Network transfer
+
+During sealing, if you specialize your workers in one type of task (to increase the efficiency of your resources), files will be transferred over the network. If files are copied over the network too slowly, it will drag down the speed of your sealing pipeline. Closely monitor your computation resources and see if there is any idling. For example, if P2 takes 25 minutes, reads ~440G and writes ~100G, then the IO throughput will be ~368 MB/s (`440 * 1024 / 25 / 60 + 100 * 1024 / 25 / 60`).
+
+After sealing, the sealed sector needs to be transferred to permanent storage, which can be bottlenecked by the network bandwidth between your `venus-sealer` and your HDD disk array.
+
+### Environment variables
+
+The SHA extension makes a huge difference when computing P1 tasks. P1 may take around 250 minutes with the SHA extension enabled and 420+ minutes without it.
+
+When compiling `venus-sealer`, make sure you have set the `RUSTFLAGS="-C target-cpu=native -g" FFI_BUILD_FROM_SOURCE="1"` flags, and you should see output like the following example.
+
+```bash
++ trap '{ rm -f $__build_output_log_tmp; }' EXIT
++ local '__rust_flags=--print native-static-libs -C target-feature=+sse2'
++ RUSTFLAGS='--print native-static-libs -C target-feature=+sse2'
++ cargo +nightly-2021-04-24 build --release --no-default-features --features multicore-sdr --features pairing,gpu
++ tee /tmp/tmp.IYtnd3xka9
+   Compiling autocfg v1.0.1
+   Compiling libc v0.2.97
+   Compiling cfg-if v1.0.0
+   Compiling proc-macro2 v1.0.27
+   Compiling unicode-xid v0.2.2
+   Compiling syn v1.0.73
+   Compiling lazy_static v1.4.0
+   Compiling cc v1.0.68
+   Compiling typenum v1.13.0
+   Compiling serde_derive v1.0.126
+   Compiling serde v1.0.126
+```
+
+### Core restriction
+
+When running two types of tasks on the same box, you may want to restrict which CPU cores each task can use, so that they do not compete for each other's resources.
+
+One option is `taskset`. Note that you cannot change the core restriction during execution of the program.
+
+```bash
+TRUST_PARAMS=1 nohup taskset -c 0-32 ./venus-worker run
+# Non-consecutive core selection
+taskset -c 0-9,19-29,39-49
+```
+
+Another option is `cgroup`, which does support changing the core restriction while the program is running.
+
+```bash
+sudo mkdir -p /sys/fs/cgroup/cpuset/Pre1-worker
+# Use tee so the redirection happens with root privileges
+echo 0-31 | sudo tee /sys/fs/cgroup/cpuset/Pre1-worker/cpuset.cpus
+# Move the running venus-worker into the cgroup (replace <pid> with its PID)
+echo <pid> | sudo tee /sys/fs/cgroup/cpuset/Pre1-worker/cgroup.procs
+```
+
+## Worker optimization
+
+All numbers below are for 32G sectors. For 64G sectors, double them.
+
+### P1 optimization
+
+Set the following environment variables to speed up P1.
+
+```bash
+# Store cache files in RAM; for 32G sectors, this costs 56G of RAM
+export FIL_PROOFS_MAXIMIZE_CACHING=1
+# Use multiple cores for P1
+export FIL_PROOFS_USE_MULTICORE_SDR=1
+```
+
+P1 RAM usage consists of the 56G cache file plus 2 layers (2 x 32G) for each sector sealing in parallel.
+
+```bash
+# Assume 10 sectors running in parallel
+56G + 32G * 2 * 10 = 696G
+```
+
+P1 SSD usage includes 11 layers of the sector, the 64G `tree-d` file and the 32G unsealed sector.
+
+```bash
+# For 1 sector
+11 * 32G + 64G + 32G = 448G
+```
+
+### P2 optimization
+
+Set the following environment variables to speed up P2.
+
+```bash
+# Use GPU to build column hashes (tree-c)
+export FIL_PROOFS_USE_GPU_COLUMN_BUILDER=1
+# Use GPU to build tree-r-last
+export FIL_PROOFS_USE_GPU_TREE_BUILDER=1
+```
+
+P2 RAM usage is 96G per sector.
+
+```bash
+# Assume 10 sectors running in parallel
+96G * 10 = 960G
+```
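+
+Putting the two RAM formulas together, a small shell helper makes it easy to check whether a box has enough memory for a planned mix of tasks. This is only an illustration; the parallelism values are example assumptions.
+
+```bash
+P1_PARALLEL=10   # number of 32G sectors in P1 at the same time (example)
+P2_PARALLEL=10   # number of 32G sectors in P2 at the same time (example)
+# 56G shared cache + 2 layers (2 x 32G) per P1 sector
+echo "P1 RAM: $((56 + 32 * 2 * P1_PARALLEL))G"
+# 96G per P2 sector
+echo "P2 RAM: $((96 * P2_PARALLEL))G"
+```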
+
+P2 SSD usage includes 8 tree-c files of 4.6G each, 8 tree-r-last files of 9.2M each, a 4K t_aux file, a 4K p_aux file and the 32G unsealed sector file.
+
+```bash
+4.6G * 8 + 9.2M * 8 + 4K * 2 + 32G = ~70G
+```
+
+### Commit
+
+C1 costs little CPU, but requires the sum of the P1 and P2 SSD usage.
+
+```bash
+P1 448G + P2 70G = ~518G
+```
+
+C2 environment variables:
+
+```bash
+# Set BELLMAN_NO_GPU=1 only if you want to run C2 on the CPU instead of the GPU
+BELLMAN_NO_GPU=1
+# Example, if you are using a 3090 GPU
+BELLMAN_CUSTOM_GPU="GeForce RTX 3090:10496"
+```
+
+C2 RAM usage:
+
+```bash
+128G + 64G = 192G
+```
+
+## Optimize sealing pipeline
+
+### Calculate your daily growth
+
+Calculate how many tasks your sealing pipeline can process.
+
+```bash
+# for each type of task
+tasks done / time = production rate
+daily production rate * (32G OR 64G) = daily growth in power
+```
+
+For example, if one box can finish P1 in 240 minutes, P2 in 30 minutes and Commit in 35 minutes, you can derive the daily growth from the following table.
+
+| Task   | Minutes | Parallel | Hourly production rate |
+| ------ | ------- | -------- | ---------------------- |
+| P1     | 240     | 1        | 0.25 = 1 / (240 / 60)  |
+| P2     | 30      | 1        | 2 = 1 / (30 / 60)      |
+| Commit | 35      | 1        | 1.71 = 1 / (35 / 60)   |
+
+### Finding optimal task configurations
+
+From the table above, we know that daily growth will be bottlenecked by P1. Adjust the number of parallel tasks of each type to achieve maximum efficiency.
+
+| Task   | Minutes | Parallel | Hourly production     | Output | Memory consumption |
+| ------ | ------- | -------- | --------------------- | ------ | ------------------ |
+| P1     | 240     | 7        | 1.75 = 7 / (240 / 60) | 1344 G | 504 G = 7*64+56    |
+| P2     | 30      | 1        | 2 = 1 / (30 / 60)     | 1536 G | 96 G = 1*96        |
+| Commit | 35      | 1        | 1.71 = 1 / (35 / 60)  | 1316 G | 192 G = 1*128+64   |
+
+The goal is to have the `output` of each task be as close as possible, so that the sealing pipeline runs at its maximum efficiency. Things to watch out for include:
+
+1. The `hourly production` of Commit is lower than that of P1, which may result in tasks backing up in the Commit phase.
+2. When one type of task is much more efficient than the others, resources may become idle.
+3. Micro-management is needed to reach the highest possible efficiency.
+
+### Finding optimal pledging
+
+For example, if you find 7 parallel P1 tasks to be optimal for your system, change the following venus-sealer configuration.
+
+```toml
+[Sealing]
+  MaxSealingSectors = 7
+```
+
+## Stop-loss
+
+If one of the tasks fails too many times, manual intervention is needed to get the sealing pipeline back to its normal output.
+
+Remove sectors when you run into the following issues:
+
+1. Expired ticket
+2. Expired Commit
+3. Corrupted proof params
+
+To remove incomplete sectors:
+
+```bash
+venus-sealer sectors remove --really-do-it
+```
-
-## 目录
-1. [提升密封扇区效率](Efficiency_of_sealing.md)
-2. [系统监控值zabbix](System_monitor_of_Zabbix.md)