
Syncer stuck fix #255

Merged: 5 commits, May 30, 2024

goran-ethernal (Collaborator) commented May 29, 2024

Description

Problem

If, for some reason, a node does not receive status change events from its peers, consensus may get stuck and no new blocks will be created.

For example, suppose 5 validator nodes are building block 100. Four nodes (a quorum) send commit messages to each other, but for some reason only 3 of the nodes receive enough commit messages to insert the block into their state. Because of network issues, the other two, even though they sent their own commit messages, did not receive enough to insert the block through consensus, so they have to rely on the syncer to insert it. However, because the network was in an inconsistent state where some messages were lost, the syncer on those two nodes also did not receive status change events from its connected peers, so they missed the block through the syncer as well. With only 3 of the 5 validators at block 100, the network cannot reach the quorum of 4 needed to build block 101, so block production halts.
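
The arithmetic of this scenario can be checked with a small Go sketch. It assumes a quorum of ⌈2N/3⌉ commit messages, which gives the 4-of-5 quorum used in the example; the project's actual quorum rule is not shown in this PR, and `quorum` and `finalized` are hypothetical helpers, as are the per-node commit counts.

```go
package main

import "fmt"

// quorum returns how many commit messages a node needs to finalize a block.
// ASSUMPTION: the ceil(2N/3) rule, which yields 4 for 5 validators as in the
// example; the project's exact formula may differ.
func quorum(validators int) int {
	return (2*validators + 2) / 3 // integer form of ceil(2N/3)
}

// finalized reports, per node, whether it received enough commit messages to
// insert the block through consensus. received[i] is a hypothetical count of
// commit messages node i actually got.
func finalized(validators int, received []int) []bool {
	q := quorum(validators)
	out := make([]bool, len(received))
	for i, n := range received {
		out[i] = n >= q
	}
	return out
}

func main() {
	// Hypothetical commit-message counts for the five validators at block 100.
	received := []int{4, 4, 4, 3, 2}
	caughtUp := 0
	for i, ok := range finalized(5, received) {
		if ok {
			caughtUp++
			fmt.Printf("node %d: inserted block 100 via consensus\n", i)
		} else {
			fmt.Printf("node %d: missed block 100, relies on the syncer\n", i)
		}
	}
	// Only caught-up validators can take part in building block 101.
	fmt.Println("quorum for block 101 reachable:", caughtUp >= quorum(5))
}
```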

Solution

The syncer has a block timeout field (roughly block time * 3), which is used to stop syncing from a peer that does not respond in time. This PR reuses the same timeout: if the syncer has not received anything from its peers within that window, it manually pings the best peer instead of waiting for that peer's status change event.
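
A minimal Go sketch of this fallback, assuming status events arrive on a channel and that syncing from the best peer is a callback. The names `statusEvent`, `runSyncLoop` and `syncFromBestPeer` are hypothetical and illustrative, not the code from this PR: each round the loop waits for a status event, and if none arrives within the block timeout it stops waiting and syncs proactively.

```go
package main

import (
	"fmt"
	"time"
)

// statusEvent is a hypothetical peer announcement that its latest block changed.
type statusEvent struct {
	PeerID string
	Number uint64
}

// runSyncLoop is a hypothetical syncer loop. Each round it waits for a peer
// status event; if none arrives within blockTimeout, it stops waiting and
// calls syncFromBestPeer. maxRounds bounds the loop so the sketch terminates.
func runSyncLoop(
	events <-chan statusEvent,
	blockTimeout time.Duration,
	onEvent func(statusEvent),
	syncFromBestPeer func(),
	maxRounds int,
) {
	for i := 0; i < maxRounds; i++ {
		select {
		case ev := <-events:
			// Normal path: a peer reported a new block.
			onEvent(ev)
		case <-time.After(blockTimeout):
			// Fallback: nothing heard from peers within the block timeout,
			// so query the best known peer instead of waiting any longer.
			syncFromBestPeer()
		}
	}
}

func main() {
	blockTime := 50 * time.Millisecond
	blockTimeout := 3 * blockTime // "basically block time * 3"

	// No peer ever sends a status event: the stuck scenario from the description.
	events := make(chan statusEvent)
	pings := 0
	runSyncLoop(events, blockTimeout,
		func(statusEvent) {},
		func() { pings++ },
		2)
	fmt.Println("proactive syncs:", pings) // proactive syncs: 2
}
```

Because `time.After` is re-armed on every iteration, the timeout measures the gap since the last received event rather than since startup. A long-running loop might prefer a single `time.Timer` with `Reset` to avoid allocating a timer per round.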

All peers are kept in a peer map that records each peer's latest block number, so the algorithm always chooses a peer that is responsive and has the highest block number.
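
Choosing the best peer from such a map can be sketched in Go as follows. The `peerStatus` record and its `Responsive` flag are hypothetical, since the PR description does not say how responsiveness is tracked. Skipping peers that are not ahead of the local node is also an assumption (syncing from them would yield nothing). Keys are iterated in sorted order so ties break deterministically, because Go map iteration order is random.

```go
package main

import (
	"fmt"
	"sort"
)

// peerStatus is a hypothetical record kept per peer in the peer map.
type peerStatus struct {
	ID         string
	Number     uint64 // latest block number the peer reported
	Responsive bool   // false once the peer has timed out on us
}

// bestPeer returns the responsive peer with the highest block number that is
// ahead of localHeight. ok is false when no peer can help.
func bestPeer(peers map[string]*peerStatus, localHeight uint64) (best *peerStatus, ok bool) {
	// Sort keys so that ties resolve the same way on every call.
	ids := make([]string, 0, len(peers))
	for id := range peers {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	for _, id := range ids {
		p := peers[id]
		if !p.Responsive || p.Number <= localHeight {
			continue // unresponsive, or nothing new to offer
		}
		if best == nil || p.Number > best.Number {
			best = p
		}
	}
	return best, best != nil
}

func main() {
	peers := map[string]*peerStatus{
		"a": {ID: "a", Number: 101, Responsive: true},
		"b": {ID: "b", Number: 105, Responsive: false}, // highest, but timed out
		"c": {ID: "c", Number: 103, Responsive: true},
	}
	if p, ok := bestPeer(peers, 99); ok {
		fmt.Printf("sync from %s (block %d)\n", p.ID, p.Number) // sync from c (block 103)
	}
}
```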

Changes include

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)

Checklist

  • I have assigned this PR to myself
  • I have added at least 1 reviewer
  • I have added the relevant labels
  • I have updated the official documentation
  • I have added sufficient documentation in code

Testing

  • I have tested this code with the official test suite
  • I have tested this code manually

goran-ethernal marked this pull request as ready for review May 29, 2024 14:24
goran-ethernal merged commit f1f36f9 into develop May 30, 2024
10 checks passed
goran-ethernal deleted the syncer-stuck-fix branch May 30, 2024 10:14
Labels
bug-fix Fix for a bug
3 participants