bug(sync/chain_processor): Failed to handle justification: start node does not exist #3150

Closed
EclesioMeloJunior opened this issue Mar 9, 2023 · 3 comments · Fixed by #3167

@EclesioMeloJunior
Member

Describe the bug

  • While syncing Westend locally, at around 200k synced blocks, I saw the following error:
block data processing for block with hash 0x54b1de84953457f10d821773c88d93320f4288f51446898b5ca846e2024c73a6 failed: handling justification: verifying block number 198940 justification: setting finalised hash: failed to set finalised subchain in db on finalisation: start node does not exist: 0x54b1de84953457f10d821773c88d93320f4288f51446898b5ca846e2024c73a6	chain_processor.go:L96	pkg=sync

After that, finalisation logs such as the following no longer appear:

🔨 finalised block number 198785 with hash 0x62caf6a8c99d63744f7093bceead8fdf4c7d8ef74f16163ed58b1c1aec67bf18	chain_processor.go:L297	pkg=sync
  • After a quick investigation, I noticed that the error originates in the function VerifyBlockJustification, which, among other things, is responsible for calling blockState.SetFinalisedHash(...) once all the checks have passed.

    • The method SetFinalisedHash receives the hash to finalize, the round and the set ID. It then gets the current highest finalized hash and tries to retrieve the subchain between that hash and the hash passed as the argument. This is where the error happens: RangeInMemory(highestHash, hashToFinalize) returns the error "start node does not exist", which means highestHash was not found in the in-memory block tree.
  • Here are my thoughts on the root cause:

    • It is known that the root node of the in-memory block tree is a finalized block, so either this invariant is broken, or...
    • The information is being updated while we read the highest finalized block hash: GetHighestFinalisedHash goes to the database to retrieve the highest known round and set ID, and then uses them to retrieve the hash from the database.
@EclesioMeloJunior
Member Author

EclesioMeloJunior commented Mar 9, 2023

When this problem happens, I also notice an increase in the number of tries in memory, as well as an increase in heap in-use bytes:

(Screenshot: resource usage)

(Screenshot: tries in memory)

EclesioMeloJunior self-assigned this Mar 10, 2023
@EclesioMeloJunior
Member Author

What is actually happening is that, at block height ~198k, we do not save the round and set ID for the latest finalized block, yet we still mark it as finalized, as the log lines below show:

(Screenshot: log lines)

  1. At line 3395 you can see the problem: we have different information for the latest finalised block. The first hash, 0x62ca..., is the correct finalized hash (see line 3392), but GetHighestFinalisedHash returns the hash 0x41ee..., which was finalized at line 3389:
last finalised: 0x62caf6a8c99d63744f7093bceead8fdf4c7d8ef74f16163ed58b1c1aec67bf18 |
higest finalised: 0x41ee565b0924839964b0da1328440983872e8f8c22f798f4d49023cbaef725bb
  2. The problem is that we never see the node store the round and set ID for the 0x62ca... hash, so instead of the latest finalized hash the node retrieves the hash associated with the last stored round and set ID.
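A minimal sketch of how "last finalised" and "highest finalised" can diverge. It assumes that GetHighestFinalisedHash resolves the hash through a stored highest (round, set ID) pair, and that the verification in setHighestRoundAndSetID skips any pair whose round does not increase within the same set; the lowercase types and functions here are hypothetical stand-ins, not Gossamer's code. The round and set ID values are the ones reported later in this thread for blocks 198,656 and 198,785.

```go
package main

import "fmt"

// roundSetID pairs a GRANDPA round number with its authority set ID.
type roundSetID struct {
	round, setID uint64
}

// finalityDB is a hypothetical, in-memory stand-in for the node's database.
type finalityDB struct {
	lastFinalised string                // hash most recently marked as finalised
	hashByRound   map[roundSetID]string // (round, setID) -> finalised hash
	highest       roundSetID            // highest (round, setID) recorded
}

// setHighestRoundAndSetID models the ASSUMED verification: within the same
// set ID, a round that is not strictly greater than the stored one is skipped.
func (db *finalityDB) setHighestRoundAndSetID(r roundSetID) {
	if r.setID < db.highest.setID ||
		(r.setID == db.highest.setID && r.round <= db.highest.round) {
		return // update silently skipped
	}
	db.highest = r
}

// setFinalisedHash marks hash as finalised and records its (round, setID).
func (db *finalityDB) setFinalisedHash(hash string, r roundSetID) {
	db.lastFinalised = hash
	db.hashByRound[r] = hash
	db.setHighestRoundAndSetID(r)
}

// getHighestFinalisedHash resolves the hash via the highest (round, setID).
func (db *finalityDB) getHighestFinalisedHash() string {
	return db.hashByRound[db.highest]
}

func main() {
	db := &finalityDB{hashByRound: map[roundSetID]string{}}
	db.setFinalisedHash("block 198,656", roundSetID{round: 106, setID: 314})
	// The round counter restarted while the set ID stayed at 314.
	db.setFinalisedHash("block 198,785", roundSetID{round: 3, setID: 314})

	fmt.Println("last finalised:   ", db.lastFinalised)
	fmt.Println("highest finalised:", db.getHighestFinalisedHash())
}
```

Under these assumptions the newer block is marked as finalised, but the highest (round, set ID) record still points at the older entry, which mirrors the two diverging hashes in the log above.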

@EclesioMeloJunior
Member Author

After a quick chat, Andre from Parity said it is likely that all the validators had to restart at that time, resetting the round counter to 0 and resuming authority set 314 from there. This explains why we see block 198,656 with round 106 and set ID 314, followed by block 198,785 with round 3 and set ID 314.

So I removed the verification from setHighestRoundAndSetID, which allowed us to sync beyond 200k blocks locally. I've deployed this version to staging to see how the change behaves in the long run.
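The effect of the change can be sketched by comparing the highest (round, set ID) update with and without the verification. This assumes the removed verification was a monotonic round check like the one modelled in `nextHighest`, a hypothetical helper; the actual Gossamer check may differ. The inputs are the values reported above for blocks 198,656 and 198,785.

```go
package main

import "fmt"

// roundSetID pairs a GRANDPA round number with its authority set ID.
type roundSetID struct {
	round, setID uint64
}

// nextHighest returns the (round, setID) the record holds after finalising r.
// verify=true models the ASSUMED pre-fix check; verify=false models the
// change described above, where the check is removed.
func nextHighest(curr, r roundSetID, verify bool) roundSetID {
	if verify && (r.setID < curr.setID ||
		(r.setID == curr.setID && r.round <= curr.round)) {
		return curr // update skipped: the record goes stale
	}
	return r
}

func main() {
	before := roundSetID{round: 106, setID: 314} // recorded for block 198,656
	after := roundSetID{round: 3, setID: 314}    // block 198,785, after the round reset

	fmt.Printf("with check:    %+v\n", nextHighest(before, after, true))
	fmt.Printf("without check: %+v\n", nextHighest(before, after, false))
}
```

Without the check, the record follows the round reset, so the highest finalised lookup points at the newest block again. Under the same assumptions, the trade-off is that nothing now stops the record from moving backwards if finalisations were ever applied out of order, so a narrower guard (for example, on set ID only) may be worth considering.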

(Screenshot)
