Slow block import #13

purestakeoskar · 2023-04-18T15:30:38Z

Is there an existing issue?

I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

This is not a support question.

Description of bug

When syncing a node the block import is very slow, to the point where block production is faster than block import.
Instead of sync the logs shows Preparing.

2023-04-17 14:41:30.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373001 (31 peers), best: #3372537 (0x41ba…81b4), finalized #3372219 (0x602b…446f), ⬇ 35.2kiB/s ⬆ 4.8kiB/s
2023-04-17 14:41:34.478 DEBUG tokio-runtime-worker runtime::system: [🌗] [3372538] 0 extrinsics, length: 10962 (normal 0%, op: 0%, mandatory 0%) / normal weight:Weight(ref_time: 265621300000, proof_size: 0) (70%) op weight Weight(ref_time: 0, proof_size: 0) (0%) / mandatory weight Weight(ref_time: 7235415758, proof_size: 0) (0%)
2023-04-17 14:41:34.500 TRACE tokio-runtime-worker sync::import-queue: [🌗] Block imported successfully Some(3372538) (0x10fe…ec7d)
2023-04-17 14:41:34.500 TRACE tokio-runtime-worker sync::import-queue: [🌗] Header 0xb8de…8fa7 has 4 logs
2023-04-17 14:41:35.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.2 bps, target=#3373001 (31 peers), best: #3372538 (0x10fe…ec7d), finalized #3372219 (0x602b…446f), ⬇ 3.8kiB/s ⬆ 4.8kiB/s
2023-04-17 14:41:38.733 TRACE tokio-runtime-worker sync::import-queue: [🌗] Scheduling 1 blocks for import
2023-04-17 14:41:40.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373002 (31 peers), best: #3372538 (0x10fe…ec7d), finalized #3372219 (0x602b…446f), ⬇ 22.6kiB/s ⬆ 3.9kiB/s
2023-04-17 14:41:45.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373002 (31 peers), best: #3372538 (0x10fe…ec7d), finalized #3372219 (0x602b…446f), ⬇ 7.0kiB/s ⬆ 3.8kiB/s
2023-04-17 14:41:50.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373002 (31 peers), best: #3372538 (0x10fe…ec7d), finalized #3372219 (0x602b…446f), ⬇ 4.2kiB/s ⬆ 4.5kiB/s
2023-04-17 14:41:51.372 TRACE tokio-runtime-worker sync::import-queue: [🌗] Scheduling 1 blocks for import
2023-04-17 14:41:53.983 DEBUG tokio-runtime-worker runtime::system: [🌗] [3372539] 0 extrinsics, length: 45566 (normal 1%, op: 0%, mandatory 0%) / normal weight:Weight(ref_time: 357569150000, proof_size: 0) (95%) op weight Weight(ref_time: 0, proof_size: 0) (0%) / mandatory weight Weight(ref_time: 7235415758, proof_size: 0) (0%)
2023-04-17 14:41:54.008 TRACE tokio-runtime-worker sync::import-queue: [🌗] Block imported successfully Some(3372539) (0xb8de…8fa7)
2023-04-17 14:41:54.008 TRACE tokio-runtime-worker sync::import-queue: [🌗] Header 0x1835…434d has 4 logs
2023-04-17 14:41:55.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.2 bps, target=#3373003 (31 peers), best: #3372539 (0xb8de…8fa7), finalized #3372219 (0x602b…446f), ⬇ 11.9kiB/s ⬆ 3.2kiB/s
2023-04-17 14:42:00.390  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373003 (31 peers), best: #3372539 (0xb8de…8fa7), finalized #3372219 (0x602b…446f), ⬇ 10.7kiB/s ⬆ 4.1kiB/s
2023-04-17 14:42:00.390  WARN tokio-runtime-worker telemetry: [🌗] ❌ Error while dialing /dns/telemetry.polkadot.io/tcp/443/x-parity-wss/%2Fsubmit%2F: Custom { kind: Other, error: Timeout }
2023-04-17 14:42:05.390  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373003 (31 peers), best: #3372539 (0xb8de…8fa7), finalized #3372219 (0x602b…446f), ⬇ 11.4kiB/s ⬆ 3.0kiB/s
2023-04-17 14:42:06.384 TRACE tokio-runtime-worker sync::import-queue: [🌗] Scheduling 1 blocks for import
2023-04-17 14:42:09.607 DEBUG tokio-runtime-worker runtime::system: [🌗] [3372540] 0 extrinsics, length: 26287 (normal 0%, op: 0%, mandatory 0%) / normal weight:Weight(ref_time: 327287250000, proof_size: 0) (87%) op weight Weight(ref_time: 0, proof_size: 0) (0%) / mandatory weight Weight(ref_time: 7235415758, proof_size: 0) (0%)
2023-04-17 14:42:09.632 TRACE tokio-runtime-worker sync::import-queue: [🌗] Block imported successfully Some(3372540) (0x1835…434d)
2023-04-17 14:42:09.632 TRACE tokio-runtime-worker sync::import-queue: [🌗] Header 0x620c…caa3 has 4 logs
2023-04-17 14:42:10.390  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.2 bps, target=#3373004 (31 peers), best: #3372540 (0x1835…434d), finalized #3372219 (0x602b…446f), ⬇ 29.9kiB/s ⬆ 4.6kiB/s
2023-04-17 14:42:15.390  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373004 (31 peers), best: #3372540 (0x1835…434d), finalized #3372219 (0x602b…446f), ⬇ 20.8kiB/s ⬆ 4.2kiB/s

The node is connected to peers with blocks we need.

2023-04-17 14:45:38.109  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373020 (18 peers), best: #3372550 (0x18a5…4ca5), finalized #3372219 (0x602b…446f), ⬇ 5.2kiB/s ⬆ 2.3kiB/s
2023-04-17 14:45:39.845 TRACE tokio-runtime-worker sync: [🌗] New peer 12D3KooWH3nhNXgsiPhVREEcrbbWwVyNfuLnwfrgwdkTPKjtHS2N BlockAnnouncesHandshake { roles: FULL, best_number: 3373019, best_hash: 0x2bb9696090704a6ca516b482a1661177894ba7a63f1dffa3d958edb2d19a980f, genesis_hash: 0xfe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d }
2023-04-17 14:45:39.845 DEBUG tokio-runtime-worker sync: [🌗] New peer with known best hash 0x2bb9…980f (3373019).
2023-04-17 14:45:39.845 DEBUG tokio-runtime-worker sync: [🌗] Connected 12D3KooWH3nhNXgsiPhVREEcrbbWwVyNfuLnwfrgwdkTPKjtHS2N
2023-04-17 14:45:41.464 TRACE tokio-runtime-worker sync: [🌗] New peer 12D3KooWGNoqQTFj92X8S2x42uZFexUQRQFeLDiWzarJmXJBV42F BlockAnnouncesHandshake { roles: FULL, best_number: 3373019, best_hash: 0x2bb9696090704a6ca516b482a1661177894ba7a63f1dffa3d958edb2d19a980f, genesis_hash: 0xfe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d }
2023-04-17 14:45:41.464 DEBUG tokio-runtime-worker sync: [🌗] New peer with known best hash 0x2bb9…980f (3373019).
2023-04-17 14:45:41.464 DEBUG tokio-runtime-worker sync: [🌗] Connected 12D3KooWGNoqQTFj92X8S2x42uZFexUQRQFeLDiWzarJmXJBV42F
2023-04-17 14:45:41.474 TRACE tokio-runtime-worker sync: [🌗] 12D3KooWGNoqQTFj92X8S2x42uZFexUQRQFeLDiWzarJmXJBV42F Ignoring transactions while major syncing

There are also queued synced blocks (sync_queued_blocks metric)

Another interesting note is that the node does not know how far it needs to sync. Its sync_target is equal to best_block

Steps to reproduce

Start syncing a moonbem or moonriver node with archive parachain and pruned relay chain
We are running moonbeam version 0.30.3 ( using polkadot 0.9.37 )
Main flags used

--execution=wasm
--pruning=archive
--rpc-cors=all
--unsafe-rpc-external
--unsafe-ws-external
--rpc-methods=safe
--db-cache=1024
--trie-cache-size=1073741824
--runtime-cache-size=32
--eth-log-block-cache=1000000000
--eth-statuses-cache=1000000000
--detailed-log-output
--no-hardware-benchmarks
--ws-max-connections=800

The text was updated successfully, but these errors were encountered:

altonen · 2023-04-19T07:12:41Z

Can you run the node with -lsync=trace? Are you able to reproduce this issue reliable every time you run the node?

purestakeoskar · 2023-04-19T14:25:53Z

@altonen thanks for your response.
Yes this i reproducible, every time i restart it goes back to the same state and it happens on multiple nodes.
This is 5min of sync=trace logs: moonbeam-sync-trace.log

altonen · 2023-04-20T12:11:20Z

From the logs it looks like there are no block announcements coming for the relay chain so that could possibly explain why the target block is not progressing. And since for most peers you're connected the best block is not progressing because either a) they're major syncing as well and don't announce blocks or b) the inbound /block-announces/1 is closed and it's not detected by your node, your node is making very little progress.

Why it takes a minute between importing block 15159517 and 15159518 is weird and I don't know why that would be happening, any idea @bkchr?

purestakeoskar · 2023-04-20T13:00:37Z

Just for clarification, the relay chain in question is Polkadot.

bkchr · 2023-04-20T20:01:14Z

Why it takes a minute between importing block 15159517 and 15159518 is weird and I don't know why that would be happening, any idea @bkchr?

No. It started directly the checking of the header of 15159518, but as you said it took over a minute to import it. Not sure what happens here.

altonen · 2023-04-26T13:22:07Z

@purestakeoskar is this reproducible for you on v0.9.36?

bLd75 · 2023-04-27T07:57:43Z

Bringing in we have the same problem on Astar with an http endpoint which is heavily queried, tested on both 9933 and 9944.
The abnormal behavior strangely feels like more than 1 year ago when RPC nodes started to lose sync while heavily queried. CPU usage is > 60%.
Client is on v0.9.39 and runtime on v0.9.37

purestakeoskar · 2023-04-27T12:56:33Z

@purestakeoskar is this reproducible for you on v0.9.36?

I do not have a client with v0.9.36 to test with unfortunately. Is there any change in particular in 0.9.37 that stands out?

skunert · 2023-04-28T11:15:20Z

@purestakeoskar Does this node use paritydb or rocksdb? And does this problem persist forever or are block imports normal again after some time? Sync mode normal?
I want to reproduce this.

purestakeoskar · 2023-04-28T13:03:53Z

@skunert The node is using rocksdb, and the blocks does slowly import, some faster than others, the syncing does eventually finish and after that the node follows the chain.

Another interesting note (this might be a separate issue). when the node restarts the grandpa sessions takes a while to be established on the relay side, so we dont see blocks finalizing for sometimes up to 15 minutes. But then when the grandpa notification streams are open we start to see the finalization moving again. During this time block import still happens.

Also to note, we have the same behaviour when we use external RPC for the relay chain.

jasl · 2023-04-30T13:06:47Z

@purestakeoskar Does this node use paritydb or rocksdb? And does this problem persist forever or are block imports normal again after some time? Sync mode normal? I want to reproduce this.

Perhaps you can reproduce the issue use Khala network, we're on the latest Polkadot v0.9.41, can both reproduce no matter ParityDB or RocksDB

docker run -dti --name khala-node \
	-e NODE_NAME=khala-node \
	-e NODE_ROLE=FULL \
	-e PARACHAIN_DB="paritydb" \
	-e RELAYCHAIN_DB="paritydb" \
	-e PARACHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry' \
	-e RELAYCHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry' \
	--network host \
	--restart always \
	--log-driver local --log-opt max-size=100m --log-opt max-file=3 \
	-v /data/khala-node:/root/data \
	phalanetwork/khala-node-with-launcher:v0.1.24-1

You should see a significant sync speed regression when the progress reaches ~50%

Some users will experience the sync speed reduction to 0 bps (both Parachain and Relaychain), some lucky users will have ~ 50bps (in my experience)

jasl · 2023-04-30T13:09:45Z

We left some comments on paritytech/substrate#9360

arkpar · 2023-05-02T06:13:16Z

You can use https://github.com/koute/not-perf for profiling

# once the node is stuck and shows 0 bps with busy CPU, let nperf run for ~20 seconds and quit with Ctrl-C
nperf record -p $PID_OF_NODE_PROCESS -o datafile

nperf flamegraph datafile > flame.svg

jasl · 2023-05-03T00:11:14Z

@arkpar Here it is, the node uses the default arguments, recording when sync is slow

I see [2023-05-02T23:42:11Z ERROR nperf_core::profiler] Failed to load '/memfd:wasm-memory-image (deleted)' from "/memfd:wasm-memory-image (deleted)": No such file or directory (os error 2) not sure it impacts the result

flaminggraph.zip

arkpar · 2023-05-03T07:44:24Z

There seem to be a lot of time wasted in cumulus_client_consensus_common::parachain_consensus::follow_finalized_head
In particular, there's a lot of header queries that don't hit cache for some reason.
@bkchr

bkchr · 2023-05-03T07:45:49Z

Yeah, @skunert is already working on this: paritytech/cumulus#2495

skunert · 2023-05-03T07:59:53Z

Yes, the problem are finality notifications that reach the node while syncing when the finalized block is far away from the tip of the chain.

For each of these finalization attempts, we calculate multiple tree routes. One from last finalized -> new finalized and one from new_finalized -> best_block. This last route is very long in the collator case and takes up to 15 seconds on my machine. During this time, the import lock is taken, and no other import operations are possible. We do not hit the cache since the distance between finalized and best can be millions of blocks.

I see the following options:

Cache the tree route from finalized -> best block and verify that the new finalized block is on the correct route (looking into this currently)
Move tree route computation out of the import lock for finalization and verify only that this computed route is still valid
Disable collator finalization during major sync. This is the easiest fix but will lead to a multi-minute finalization operation from genesis to tip once fully synced.

@arkpar Any opinions on this?

arkpar · 2023-05-03T08:24:36Z

So this looks like paritytech/substrate#9360, p.2

For each of these finalization attempts, we calculate multiple tree routes. One from last finalized -> new finalized and one from new_finalized -> best_block.

What is that last route used for? Maybe we can make some assumptions if the difference in block numbers is over some threshold?

Disable collator finalization during major sync. This is the easiest fix but will lead to a multi-minute finalization operation from genesis to tip once fully synced.

In theory, we can finalize any chain in constant time, without having to check all the intermediate blocks. IIRC we only query tree routes to get a list of discarded blocks and actually discard them. This list should be empty after the initial sync anyway. Implementing this at the database level could be a bit involved. state-db in particular requires finalizing blocks one at a time, with the exception of warp sync.

bkchr · 2023-05-03T08:30:04Z

We could check the number of leaves. If the number of leaves is one, we don't need to calculate the tree route as there isn't any other chain. This should also hold for the initial sync and should give us the speed up we need.

skunert · 2023-05-03T08:34:24Z

We could check the number of leaves. If the number of leaves is one, we don't need to calculate the tree route as there isn't any other chain. This should also hold for the initial sync and should give us the speed up we need.

This sounds reasonable, will give it a try.

arkpar · 2023-05-03T09:01:39Z

We could check the number of leaves. If the number of leaves is one, we don't need to calculate the tree route as there isn't any other chain. This should also hold for the initial sync and should give us the speed up we need.

This sounds reasonable for this particular issue, but it won't fix the problem reported in paritytech/cumulus#2495 There will still be a million blocks finalized at once.

skunert · 2023-05-03T09:04:52Z

We could check the number of leaves. If the number of leaves is one, we don't need to calculate the tree route as there isn't any other chain. This should also hold for the initial sync and should give us the speed up we need.

This sounds reasonable for this particular issue, but it won't fix the problem reported in paritytech/cumulus#2495 There will still be a million blocks finalized at once.

True, but this issue here is particularly annoying since this blocks the entire syncing. paritytech/cumulus#2495 is for sure still problematic but will only stall the node for a few minutes before it continues operation normally. We should look into further improvements.

skunert · 2023-05-05T07:29:23Z

This change is now in cumulus, closing here.

jasl · 2023-05-14T13:31:02Z

I just made a new node based on polkadot-v0.9.42 and did a resync on my Azure VM (which stuck sync when reaching ~50% progress),
Unfortunately, I still see the node stuck

2023-05-14 13:27:50 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909361 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 367.0kiB/s ⬆ 260.9kiB/s
2023-05-14 13:27:53 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885948 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 5.5kiB/s ⬆ 6.0kiB/s
2023-05-14 13:27:55 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909361 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 303.7kiB/s ⬆ 229.6kiB/s
2023-05-14 13:27:58 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (39 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 30.4kiB/s ⬆ 6.0kiB/s
2023-05-14 13:28:00 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909362 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 598.7kiB/s ⬆ 222.9kiB/s
2023-05-14 13:28:03 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 11.1kiB/s ⬆ 7.3kiB/s
2023-05-14 13:28:05 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909363 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 208.4kiB/s ⬆ 194.0kiB/s
2023-05-14 13:28:08 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 43.7kiB/s ⬆ 9.3kiB/s
2023-05-14 13:28:10 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909364 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 284.6kiB/s ⬆ 183.3kiB/s
2023-05-14 13:28:13 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (39 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 27.1kiB/s ⬆ 5.6kiB/s
2023-05-14 13:28:15 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909365 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 230.8kiB/s ⬆ 202.6kiB/s
2023-05-14 13:28:18 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 34.1kiB/s ⬆ 8.1kiB/s
2023-05-14 13:28:20 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909366 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 285.0kiB/s ⬆ 232.5kiB/s
2023-05-14 13:28:23 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 8.2kiB/s ⬆ 6.7kiB/s
2023-05-14 13:28:25 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909367 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 269.0kiB/s ⬆ 225.0kiB/s
2023-05-14 13:28:28 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 42.2kiB/s ⬆ 10.2kiB/s
2023-05-14 13:28:30 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909368 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 252.5kiB/s ⬆ 211.8kiB/s

I'll try record nperf tomorrow

BTW, I see that node can't stop gracefully (so does previous)

docker stop khala-node
Error response from daemon: cannot stop container: khala-node: tried to kill container, but did not receive an exit event

jasl · 2023-06-01T12:06:44Z

@bkchr I think txpool has trouble, here is the log, I only run the node < 5 min, that generates 8GB log

https://storage.googleapis.com/phala-misc/trace_txpool.log.zip

NOTE: I must confess in the early day Khala online, our pallet has serious bugs we have to clear our collators' txpool for mitigate.

But we found sync Phala (on Polkadot) has the same trouble which old problems have already been fixed properly.

For now I don't have Phala node so I can't share info yet, but I'll do next week

arkpar · 2023-06-01T14:10:37Z

The log is huge because tree_route is called mutiple times for a long route and the whole thing is just dumped into the log.

2023-06-01 11:40:55.922 DEBUG tokio-runtime-worker txpool: [Parachain] resolve hash:0xfd9bc5828f075da985eefc978d6da257b4a435c5cc8b8495b67a7d2f853b50e5 finalized:true tree_route:TreeRoute { route: [HashAndNumber { number: 3242935, ....

There was a PR(paritytech/substrate#13004) to prevent this, but apparently it does not help much.

skunert · 2023-06-01T17:04:54Z

In the logs we see the transaction pool act on finality notifications. Usually we should only see one finality justification for each incoming block with an attached justification. The justification should be attached every justification_period block or at the end of the era.

@jasl However I found that in khala the justification_period is set to 1 here. This is really bad as the node will store every justification and also give it out during sync. This generated finality notifications which in turn trigger actions in the node like in the transaction pool during sync, so it will impact all nodes that request blocks to it from syncing.

jasl · 2023-06-01T17:11:27Z

In the logs we see the transaction pool act on finality notifications. Usually we should only see one finality justification for each incoming block with an attached justification. The justification should be attached every justification_period block or at the end of the era.

@jasl However I found that in khala the justification_period is set to 1 here. This is really bad as the node will store every justification and also give it out during sync. This generated finality notifications which in turn trigger actions in the node like in the transaction pool during sync, so it will impact all nodes that request blocks to it from syncing.

Yeah, we patched the justification_period for historical reasons, do you mean the txpool will be processed when generating finality justification? if we revert to default value 512 we would see each 512 blocks process txpool which reduce the pressure of node

skunert · 2023-06-01T17:17:41Z

I mean its not just the transaction pool, basically every location that listens for finality notifications in substrate will be notified. In addition we verify the justification, do extra operations in the database, state-db etc.. This brings extra overhead that we should not have for every single block.

jasl · 2023-06-01T17:47:50Z

I mean its not just the transaction pool, basically every location that listens for finality notifications in substrate will be notified. In addition we verify the justification, do extra operations in the database, state-db etc.. This brings extra overhead that we should not have for every single block.

I can understand we patched justification_period to 1 introducing additional performance impact, I'll ask our dev for whether I can remove the patch.

But I still don't quite understand:
Do you mean txpool awakens by finality notifications and this is required for justification verification?
or do you mean in major sync, txpool shouldn't awaken even if it got a finality notification?

PS: I'm too noob in this domain, is there any doc I can learn?

bkchr · 2023-06-01T23:20:29Z

I can understand we patched justification_period to 1 introducing additional performance impact, I'll ask our dev for whether I can remove the patch.

There is a RPC to proof finality that you can use.

Do you mean txpool awakens by finality notifications and this is required for justification verification?
or do you mean in major sync, txpool shouldn't awaken even if it got a finality notification?

paritytech/substrate#14285 this will improve the situation. However, I also think that there is maybe still more to it.

jasl · 2023-06-02T07:36:15Z

I can understand we patched justification_period to 1 introducing additional performance impact, I'll ask our dev for whether I can remove the patch.

There is a RPC to proof finality that you can use.

Do you mean txpool awakens by finality notifications and this is required for justification verification?
or do you mean in major sync, txpool shouldn't awaken even if it got a finality notification?

paritytech/substrate#14285 this will improve the situation. However, I also think that there is maybe still more to it.

Thank you, I can backport your PR to our node to test

skunert · 2023-06-02T10:55:50Z

Do you mean txpool awakens by finality notifications and this is required for justification verification?
or do you mean in major sync, txpool shouldn't awaken even if it got a finality notification?

Transaction pool listens for these notifications and does some maintenance work. We have logic in place to skip this based on block distances. However when the node is finalizing every block, this maintenance is triggered too often.
But as I said, there are other problems as well. Finalization of each block itself also takes more resources than finalizing in batches.

Even if you disable this now, your existing nodes will have the justifications in their db. Ideally, the node would not finalize for every incoming justification during sync, so that will probably change.

jasl · 2023-06-03T16:43:16Z

I've backported paritytech/substrate#14285 to our Khala node
Here are new flame graphs

Short (run about 20 s)

sync_short.svg.zip

Long (run about 1.5 min)

sync_long.svg.zip

bkchr · 2023-06-04T21:04:40Z

@jasl can you please share the exact CLI args you are using to spawn your node?

This could still be a result of your relay chain node importing a justification every block. We need to wait for @andresilva to provide a fix that ensures that we don't import justifications every block as you have configured it (which is wrong and should be reverted!).

jasl · 2023-06-04T21:36:15Z

@jasl can you please share the exact CLI args you are using to spawn your node?

This could still be a result of your relay chain node importing a justification every block. We need to wait for @andresilva to provide a fix that ensures that we don't import justifications every block as you have configured it (which is wrong and should be reverted!).

./khala-node \
  --chain khala \
  --base-path $DATA_PATH \
  --name $NODE_NAME \
  --port 30333 \
  --prometheus-port 9615 \
  --rpc-port 9933 \
  --ws-port 9944 \
  --database paritydb \
  --no-hardware-benchmarks \
  --no-telemetry \
  --rpc-max-response-size 64 \
  --max-runtime-instances 16 \
  --runtime-cache-size 8 \
  --state-pruning archive-canonical \
  --blocks-pruning archive-canonical \
  -- \
  --chain kusama \
  --port 30334 \
  --prometheus-port 9616 \
  --rpc-port 9934 \
  --ws-port 9945 \
  --database paritydb \
  --no-hardware-benchmarks \
  --no-telemetry \
  --rpc-max-response-size 64 \
  --max-runtime-instances 16 \
  --runtime-cache-size 8 \
  --state-pruning archive-canonical \
  --blocks-pruning archive-canonical

(which is wrong and should be reverted!).

Thank you, now I understand how bad it would be if we set a very short justification_period, I'll forward this warning to our team.

skunert · 2023-06-05T07:09:05Z

@jasl If you want you could try to run your node with --relay-chain-rpc-url <external-relay-chain-rpc> and point it to one of your polkadot nodes. This will not start an internal polkadot node and thus the justification period will not have an impact there. This way you could at least see if it is syncing faster with this issue eliminated even before we have the proper fix in place.

jasl · 2023-06-05T07:12:10Z

@jasl If you want you could try to run your node with --relay-chain-rpc-url <external-relay-chain-rpc> and point it to one of your polkadot nodes. This will not start an internal polkadot node and thus the justification period will not have an impact there. This way you could at least see if it is syncing faster with this issue eliminated even before we have the proper fix in place.

Our collators are using --relay-chain-rpc-url <external-relay-chain-rpc>, and seem not to see any sync issue, I'll try for our normal nodes later.

But running 2 apps in one Docker container is not recommended, so I would like to test, for end-users I shall waiting the proper fix

bLd75 · 2023-06-14T16:26:29Z

Having here the same behaviour on an Astar node.
It's been tested with relay RPC sync too but the issue is the same.

It's important to note that it happens only on RPC under high number of requests.
On the screenshot below we can see the same behavior, node loses sights of head of the chain pretty quickly after start, and this gap increases over time. While finalization is getting completely stuck.
Node gets restarted after an hour (18:15 on the graph), resyncing with the right chain height, then losing it again.

skunert · 2023-06-19T15:27:32Z

@bLd75 Can you provide logs with -lsync=trace for the time period where the issue appears? The issues we had previously discussed in the comments should not occur when you run with an external relay chain node.

crystalin · 2023-07-19T07:50:42Z

This is still an issue on Polkadot 0.9.43:
Looking breifly at the logs of sync=trace, it seems that blocks are imported fast initially but the more and more time passes, the less concurrent block are processed. It looks like some block import might get stuck forever.
sync.log

crystalin · 2023-07-19T08:05:55Z

Additional information. It seems that the import is limited by the Disk IOPS (this is specially the case in Cloud services providing low default IOPS).
Surprisingly, even when sync is at 0.0/s it still is using the max IOPS (3000)

I think something changed in recent version (probably around 0.9.37) which is significantly increasing the IOPS reaching a limit where it snowballs and block most of the processes.

Ex:

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
nvme6n1        3032.20     15671.60         0.00         0.00     156716          0          0

Increasing it (realtime) to 10k IOPS allows to restore some bps but it quickly goes down to 0.2 or 0.0 bps. I think something is inefficient and snowballing.

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
nvme6n1        10114.10     52228.00       356.00         0.00     522280       3560          0

Jul 19 08:13:35 ip-10-0-0-228 parity-moonbeam[375045]: 2023-07-19 08:13:35 [Relaychain] ⚙️  Syncing  0.2 bps, target=#18838858 (43 peers), best: #14447387 (0x7cb5…d01f), finalized #14447104 (0x8c38…91fb), ⬇ 517.3kiB/s  449.1kiB/s
Jul 19 08:13:35 ip-10-0-0-228 parity-moonbeam[375045]: 2023-07-19 08:13:35 [🌗] ⚙️  Syncing  0.2 bps, target=#4697322 (27 peers), best: #3451127 (0x41cd…1d03), finalized #2565171 (0xb87f…89bb), ⬇ 8.2kiB/s  1.4kiB/s

=== Experiment 1: rocksdb from scratch ===
I tried sync from scratch using rocks db:

Jul 19 08:30:58 ip-10-0-0-228 parity-moonbeam[1354684]: 2023-07-19 08:30:58 [Relaychain] ⚙️  Syncing 368.8 bps, target=#16465081 (1 peers), best: #19264 (0x4555…7066), finalized #18944 (0x48b9…c40b), ⬇ 210.2kiB/s ⬆ 28.6kiB/s
Jul 19 08:30:58 ip-10-0-0-228 parity-moonbeam[1354684]: 2023-07-19 08:30:58 [🌗] ⚙️  Syncing 131.8 bps, target=#4025691 (1 peers), best: paritytech/substrate#5315 (0x7cea…36da), finalized #0 (0xfe58…b76d), ⬇ 510.3kiB/s ⬆ 0.3kiB/s

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
nvme6n1          74.20        26.40     17532.80         0.00        264     175328          0

(for 370 bps, IOPS is only ~100, however the chain/state was empty as it was the beginning of the sync)

=== Experiment 2: paritydb from scratch ===

Jul 19 08:34:57 ip-10-0-0-228 parity-moonbeam[1358078]: 2023-07-19 08:34:57 [Relaychain] ⚙️  Syncing 639.2 bps, target=#16465119 (1 peers), best: #46670 (0xf6a7…84f9), finalized #46592 (0x1456…92dd), ⬇ 261.1kiB/s ⬆ 13.7kiB/s
Jul 19 08:34:57 ip-10-0-0-228 parity-moonbeam[1358078]: 2023-07-19 08:34:57 [🌗] ⚙️  Syncing 152.2 bps, target=#4025708 (1 peers), best: paritytech/substrate#11722 (0x8186…cf93), finalized #0 (0xfe58…b76d), ⬇ 465.1kiB/s ⬆ 0.3kiB/s

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
nvme6n1        3137.40        24.80     55084.00         0.00        248     550840          0

(for 370 bps, IOPS is ~3000 which is a lot compared to rocksdb)

jasl · 2023-07-19T08:49:26Z

This is still an issue on Polkadot 0.9.43: Looking breifly at the logs of sync=trace, it seems that blocks are imported fast initially but the more and more time passes, the less concurrent block are processed. It looks like some block import might get stuck forever. sync.log

paritytech/substrate#14285 seems not bundled in 0.9.43, but in 0.9.44.
paritytech/substrate#14423 have to wait .45

You have to backport it by yourself. we forked 0.9.43 and cherry-pick these changes.
here's our sample https://github.com/Phala-Network/khala-parachain/blob/main/Cargo.toml#L60-L65

crystalin · 2023-07-19T08:53:32Z

Thank you, I'll try those

lexnv · 2024-07-09T10:32:34Z

@jasl @crystalin Do you still get this issue with the latest release?

github-actions bot added the J2-unconfirmed label Apr 18, 2023

skunert mentioned this issue Apr 28, 2023

Parachain finalization keeps import lock occupied after sync paritytech/cumulus#2495

Open

altonen mentioned this issue Apr 30, 2023

Cumulus sync performance paritytech/substrate#9360

Open

jasl mentioned this issue May 2, 2023

extreamly slow booting paritytech/parity-db#207

Closed

skunert mentioned this issue May 3, 2023

Only calculate tree route during finalization when there are multiple leaves paritytech/substrate#14067

Merged

skunert closed this as completed May 5, 2023

andresilva mentioned this issue Jun 20, 2023

grandpa: avoid importing unnecessary justifications paritytech/substrate#14423

Merged

d-roak mentioned this issue Jun 23, 2023

bug: sequencers forking keep-starknet-strange/madara#737

Closed

crystalin mentioned this issue Jul 19, 2023

Fixes slow import (to remove at 0.9.45) moonbeam-foundation/substrate#63

Open

Agusrodri mentioned this issue Aug 15, 2023

Fixes slow import moonbeam-foundation/moonbeam#2438

Merged

altonen mentioned this issue Aug 23, 2023

Node being stucked paritytech/cumulus#3052

Open

the-right-joyce transferred this issue from paritytech/substrate Aug 24, 2023

the-right-joyce added I10-unconfirmed Issue might be valid, but it's not yet known. and removed J2-unconfirmed labels Aug 25, 2023

skunert mentioned this issue Aug 25, 2023

Node being stucked #1157

Closed

Slow block import #13

Slow block import #13

Comments

purestakeoskar commented Apr 18, 2023

Is there an existing issue?

Experiencing problems? Have you tried our Stack Exchange first?

Description of bug

Steps to reproduce

altonen commented Apr 19, 2023

purestakeoskar commented Apr 19, 2023

altonen commented Apr 20, 2023

purestakeoskar commented Apr 20, 2023

bkchr commented Apr 20, 2023

altonen commented Apr 26, 2023

bLd75 commented Apr 27, 2023 • edited Loading

purestakeoskar commented Apr 27, 2023

skunert commented Apr 28, 2023

purestakeoskar commented Apr 28, 2023 • edited Loading

jasl commented Apr 30, 2023 • edited Loading

jasl commented Apr 30, 2023

arkpar commented May 2, 2023

jasl commented May 3, 2023 • edited Loading

arkpar commented May 3, 2023

bkchr commented May 3, 2023

skunert commented May 3, 2023 • edited Loading

arkpar commented May 3, 2023

bkchr commented May 3, 2023

skunert commented May 3, 2023

arkpar commented May 3, 2023

skunert commented May 3, 2023

skunert commented May 5, 2023

jasl commented May 14, 2023 • edited Loading

jasl commented Jun 1, 2023 • edited Loading

arkpar commented Jun 1, 2023

skunert commented Jun 1, 2023

jasl commented Jun 1, 2023

skunert commented Jun 1, 2023

jasl commented Jun 1, 2023 • edited Loading

bkchr commented Jun 1, 2023

jasl commented Jun 2, 2023

skunert commented Jun 2, 2023

jasl commented Jun 3, 2023

bkchr commented Jun 4, 2023

jasl commented Jun 4, 2023

skunert commented Jun 5, 2023

jasl commented Jun 5, 2023 • edited Loading

bLd75 commented Jun 14, 2023 • edited Loading

skunert commented Jun 19, 2023

crystalin commented Jul 19, 2023

crystalin commented Jul 19, 2023 • edited Loading

jasl commented Jul 19, 2023

crystalin commented Jul 19, 2023

lexnv commented Jul 9, 2024

bLd75 commented Apr 27, 2023 •

edited

Loading

purestakeoskar commented Apr 28, 2023 •

edited

Loading

jasl commented Apr 30, 2023 •

edited

Loading

jasl commented May 3, 2023 •

edited

Loading

skunert commented May 3, 2023 •

edited

Loading

jasl commented May 14, 2023 •

edited

Loading

jasl commented Jun 1, 2023 •

edited

Loading

jasl commented Jun 1, 2023 •

edited

Loading

jasl commented Jun 5, 2023 •

edited

Loading

bLd75 commented Jun 14, 2023 •

edited

Loading

crystalin commented Jul 19, 2023 •

edited

Loading