Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

polygon/sync: block downloader to not re-insert blocks behind start #11929

Merged
merged 1 commit into from
Sep 9, 2024

Conversation

taratorio
Copy link
Member

Run into an issue since we started pruning total difficulty.

EROR[09-09|10:58:03.057] [2/6 PolygonSync] stopping node          err="parent's total difficulty not found with hash 9334099de5d77c0d56afefde9985d44f8b4416db99dfe926908d5501fa8dbd9e and height 11736178: <nil>

It happened for checkpoint 9703. Our start block was in the middle of the checkpoint range which meant we have to fetch all the 8k blocks in this checkpoint to verify the checkpoint root hash when receiving blocks from the peer.

The current logic will attempt to insert all these 8k blocks and it will fail with a missing parent td error because we only keep the last 1000 parent td records.

This PR fixes this by enhancing the block downloader to not re-insert blocks behind the start block. This solves the parent td error and also is saving some unnecessary inserts on the first waypoint processing on startup.

@taratorio taratorio enabled auto-merge (squash) September 9, 2024 13:49
@taratorio taratorio merged commit 6d15c01 into main Sep 9, 2024
10 checks passed
@taratorio taratorio deleted the fix-astrid-parent-td-err-on-sync-to-tip branch September 9, 2024 14:10
taratorio added a commit that referenced this pull request Oct 8, 2024
taratorio added a commit that referenced this pull request Oct 9, 2024
This PR captures several todos on my list of improving things related to
Milestone hash mismatches that I noticed while testing Astrid on the
tip:
- after restarts, when we initialise the canonical chain builder, we
need to set the root to the last finalised waypoint header and connect
the latest known tip to it. Previously, we set the root to the latest
known tip which may not be the finalised one. This caused unnecessary
unwinds on startup (with potentially incorrect unwind num for the
bridge.Unwind) and although we auto recover from it it is still better
to fix it properly as it may cause unexpected behaviour further down the
line
- harden the milestone verifier to check for correct header connectivity
via ParentHash and Number and also that the number of headers match with
the number of headers in the Milestone because the Milestone is simply
just the last header hash so a malicious peer may trick us easily
- revert #11929 which, after
taking a deeper look, seems to be an inaccurate interpretation of the
error (prune of TD happens only for the blocks T-150_000). I believe
this error may have been caused by something else, which may have been
fixed by now (after numerous fixes to Astrid related to the parent td
errors) - but if not it will be better to troubleshoot more thoroughly
if the error ever arises again. In addition, this change is dangerous as
it means we may falsely think we have all blocks for a given waypoint
inserted in the DB, however that may not be true on restarts in the
middle of waypoints, because the blocks inserted before the start may
not be the right ones that are part of the waypoint (due to the tip not
yet being finalised)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants