Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chain Download Halt #6884

Closed
siladu opened this issue Apr 5, 2024 · 1 comment · Fixed by #7162
Closed

Chain Download Halt #6884

siladu opened this issue Apr 5, 2024 · 1 comment · Fixed by #7162
Assignees
Labels
bug Something isn't working P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc))

Comments

@siladu
Copy link
Contributor

siladu commented Apr 5, 2024

Seen on 24.3.0 and 24.4.0-RC2

On fresh SNAP/CHECKPOINT sync on a variety of networks, block import sometimes halts and requires a restart, despite having a full set of peers and world state downloading finishing.

Example:

{"@timestamp":"2024-04-04T05:48:35,877","level":"INFO","thread":"EthScheduler-Services-37 (importBlock)","class":"ImportBlocksStep","message":"Block import progress: 4030400 of 5625239 (71%)","throwable":""}
{"@timestamp":"2024-04-04T05:48:39,175","level":"INFO","thread":"EthScheduler-Services-26 (batchPersistTrieNodeData)","class":"SnapSyncMetricsManager","message":"Healed 221585 world state trie nodes, Peer count: 6","throwable":""}
{"@timestamp":"2024-04-04T05:49:42,821","level":"INFO","thread":"EthScheduler-Services-36 (requestCompleteTask)","class":"SnapWorldDownloadState","message":"Pausing world state download while waiting for sync to complete","throwable":""}

// and then nothing until restart
@siladu siladu changed the title Block Import Halt Chain Download Halt Apr 5, 2024
@siladu
Copy link
Contributor Author

siladu commented Apr 8, 2024

Tested this logging on a Sepolia node and it hung at 99% (note in this case, it looks like worldstate didn't finish)...

It gets stuck in a loop where it repeatedly prints this log for the same peers every few seconds.

...
// last world state log
2024-04-06 02:52:10.313+00:00 | EthScheduler-Services-3 (batchPersistAccountData) | INFO  | SnapSyncMetricsManager | Worldstate download progress: 49.64%, Peer count: 7
2024-04-06 02:52:41.596+00:00 | EthScheduler-Services-43 (importBlock) | INFO  | ImportBlocksStep | Block import progress: 3779220 of 5638048 (67%)
...
2024-04-06 07:18:36.991+00:00 | EthScheduler-Services-43 (importBlock) | INFO  | ImportBlocksStep | Block import progress: 5633620 of 5638048 (99%)
2024-04-06 07:18:38.691+00:00 | EthScheduler-Services-38 (fetchCheckpoints) | DEBUG | FastSyncDownloadPipelineFactory | Stopping chain download as lastRoundHeader=5638329 is not less than pivotBlockHeader=5638329 for peer PeerId: 0x8b61dc2d06c3f96f... PeerReputation score: 150, timeouts: {}, useless: 1, validated? true, disconnected? false, client: Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.7, [Connection with hashCode 81600091 inboundInitiated? false initAt 1712368196470], enode://8b61dc2d06c3f96fddcbebb0efb29d60d3598650275dc469c22229d3e5620369b0d3dedafd929835fe7f489618f19f456fe7c0df572bf2d914a9f4e006f783a9@170.64.250.88:30303
2024-04-06 07:19:10.446+00:00 | EthScheduler-Services-42 (fetchCheckpoints) | DEBUG | FastSyncDownloadPipelineFactory | Stopping chain download as lastRoundHeader=5638329 is not less than pivotBlockHeader=5638329 for peer PeerId: 0x525ca6453105fc8b... PeerReputation score: 150, timeouts: {}, useless: 0, validated? true, disconnected? false, client: Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.6, [Connection with hashCode -1783042358 inboundInitiated? false initAt 1712385475223], enode://525ca6453105fc8b7364de85e6494a5db288fade51aac5620fb94ec17c9e771f4daef3b1aadad8a954b620484d25d2b6e37fde2787a84034f361d69b491fe10f@61.173.39.51:17004
2024-04-06 07:19:15.623+00:00 | EthScheduler-Services-38 (fetchCheckpoints) | DEBUG | FastSyncDownloadPipelineFactory | Stopping chain download as lastRoundHeader=5638329 is not less than pivotBlockHeader=5638329 for peer PeerId: 0x525ca6453105fc8b... PeerReputation score: 150, timeouts: {}, useless: 0, validated? true, disconnected? false, client: Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.6, [Connection with hashCode -1783042358 inboundInitiated? false initAt 1712385475223], enode://525ca6453105fc8b7364de85e6494a5db288fade51aac5620fb94ec17c9e771f4daef3b1aadad8a954b620484d25d2b6e37fde2787a84034f361d69b491fe10f@61.173.39.51:17004
2024-04-06 07:19:21.356+00:00 | EthScheduler-Services-41 (fetchCheckpoints) | DEBUG | FastSyncDownloadPipelineFactory | Stopping chain download as lastRoundHeader=5638329 is not less than pivotBlockHeader=5638329 for peer PeerId: 0x525ca6453105fc8b... PeerReputation score: 150, timeouts: {}, useless: 0, validated? true, disconnected? false, client: Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.6, [Connection with hashCode -1783042358 inboundInitiated? false initAt 1712385475223], enode://525ca6453105fc8b7364de85e6494a5db288fade51aac5620fb94ec17c9e771f4daef3b1aadad8a954b620484d25d2b6e37fde2787a84034f361d69b491fe10f@61.173.39.51:17004
simon.dudley@dev-elc-besu-teku-sepolia-simon-6885-283fe42:~$ cat besu-99-halt.log  | grep "Stopping chain download" | sed 's?.*\(enode://.*@.*:\).*?\1?' | sort | uniq -c | sort
      1 enode://8b61dc2d06c3f96fddcbebb0efb29d60d3598650275dc469c22229d3e5620369b0d3dedafd929835fe7f489618f19f456fe7c0df572bf2d914a9f4e006f783a9@170.64.250.88:
      2 enode://b3dc4688b686d5cef81f1b3b16b098ffbd9ace6a5d5bd5f1fbd12b2635cae637ca220a2f18f00f156982b038a5324e45d3b627df7e6416f30e5f9689ebdc47e9@79.100.217.83:
     77 enode://525ca6453105fc8b7364de85e6494a5db288fade51aac5620fb94ec17c9e771f4daef3b1aadad8a954b620484d25d2b6e37fde2787a84034f361d69b491fe10f@61.173.39.51:
     88 enode://dd44d7ba187ac7f6fc322bc3c31ff0b071b878100f7c655cc34bb238a3514c347baed37df75f9603805613e87b9b8fd7dfc078dc0185666f8cde466e1f7675a9@125.118.60.99:
    227 enode://eed8e8a489504305d46fa6fa4614dc81447220f9f9e0b014f66693fff1af692b0f995b011fddfb97c3103965b57e69bb9fc0f90ddcb68ae2806f2b15005cc5a5@45.250.253.66:
    404 enode://cd4ba553c37c7437a0a10875a1ee8bc59ae09f087551a7bffa26e5478872dc59ae1553fe315d7aaa688ae63e9079052c9a779ff1de15866b5ee51c2f721308cb@147.28.163.15:
    433 enode://a5fcbb3b98c5f0a6b03774bcae9c935807fa010858f93049106e9fe4c54e41a04cdb7cadb643e18b90131bd381b996e9c70005f25ca03bbb7f440fed59b02457@87.246.108.93:
    635 enode://f4abc41b943949465b1dac6f58496a6b5b891b5232b0c8e88e761214fe3e14751b7fb6937d53c49aad59e4409e5c3276bd8927469a539184431c6c3d92b570a1@65.109.68.53:
   1403 enode://3eab5b562114111bfc9bdc3e58b057309dc8730e47eefa318be98d82b7080edc789ef21e83ad9d58aea250e915d8905cc2634d9d2b96cf4f08fc558bcbc0f3a9@63.251.87.45:
   1407 enode://4276635a44d9cf13b54ad85b83435941493629ce7c090ba775a16bfa6d4dbf699337c27fac10f3554c0c026bb88c511b45045a6ab4f10654abc8029c81184512@172.233.225.4:
   1581 enode://06c4ac72f097b91c8f3fcb038825cb6cbe1fc7c7dfe9584bf35e44521352c628a8cf0d702ee00acb1f599361fe0c0758aa253dfc5b449ff61864e145ecb4e8ab@34.82.82.140:
   4889 enode://39d1ec7b6aa5e3647a222b0cadb8857254d6f40d731aeb696a757f614e6a94ed7125ba6cdc40f2d2ea561d5b2aec9125457206286fe9d51424c714d1d6faeb6f@54.80.216.115:
   9215 enode://a302b62e0cb25d51292cd61ed7d8496435b9bc139a7223939096912564e7803adb2e778445b10a7f15f7cae57b619c3c10ca1306c342c710a75930372482ba39@5.59.232.59:

@siladu siladu added bug Something isn't working P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc)) labels Apr 8, 2024
@macfarla macfarla added P1 Very High (ex: Security issues, unplanned backward incompatibility preventing network participation) and removed P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc)) labels Apr 8, 2024
@macfarla macfarla assigned jframe and unassigned jframe Apr 8, 2024
@macfarla macfarla added P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc)) and removed P1 Very High (ex: Security issues, unplanned backward incompatibility preventing network participation) labels Apr 8, 2024
@macfarla macfarla assigned jframe and pinges and unassigned jframe Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc))
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants