ERRO Missing chunk in forwards iterator (block roots) #4697

Closed
michaelsproul opened this issue Sep 4, 2023 · 2 comments
Labels
bug Something isn't working database v4.6.0 ETA Q1 2024

Comments

@michaelsproul

michaelsproul commented Sep 4, 2023

Summary

  • Causes HTTP requests for states & blocks to fail.
  • Causes responses to peers for blocks to fail.
  • Doesn't affect the node's ability to follow the chain or validate (low impact).
  • Doesn't affect newly synced nodes.
  • Only occurs for a limited time before it self-heals, which is how it was missed during our release testing.

Description

My PR #4663, which fixed bug #4610, inadvertently introduced a new database bug, which thankfully is capable of self-healing.

That PR changed the invariant for block roots in the freezer database from:

slot < last_restore_point_slot -> block root is in the freezer

to:

slot < split.slot -> block root is in the freezer

Although #4663 took care to maintain the new invariant once it was established, it failed to establish it in the period immediately after upgrading. This means that until a new restore-point is stored in the database (on a slot with slot % slots-per-restore-point == 0), the linear block roots array will have a gap in it between last_restore_point_slot and split.slot.
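To make the gap concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical and are not part of Lighthouse; the values 8192 slots per restore point and 12-second slots are assumptions chosen for illustration.

```rust
use std::ops::Range;

/// Hypothetical helper: the slots whose block roots are missing from the
/// freezer after upgrading. The old invariant covered
/// `slot < last_restore_point_slot`; the new one requires `slot < split_slot`.
fn missing_block_root_slots(last_restore_point_slot: u64, split_slot: u64) -> Range<u64> {
    // Clamp the end so the range is empty when there is no gap.
    last_restore_point_slot..split_slot.max(last_restore_point_slot)
}

/// Hypothetical helper: the first slot at or after `split_slot` that satisfies
/// `slot % slots_per_restore_point == 0`.
fn next_restore_point_slot(split_slot: u64, slots_per_restore_point: u64) -> u64 {
    ((split_slot + slots_per_restore_point - 1) / slots_per_restore_point)
        * slots_per_restore_point
}

/// Upper bound on the wait until the next restore point, in seconds.
fn worst_case_wait_secs(slots_per_restore_point: u64, seconds_per_slot: u64) -> u64 {
    slots_per_restore_point * seconds_per_slot
}
```

Under those assumed values, `worst_case_wait_secs(8192, 12)` is 98304 seconds, a little over 27 hours.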

Steps to resolve

We could do nothing, and wait for the issue to resolve itself once everyone has upgraded and waited for 27h. However, this is not really ideal, and I suspect many users will be scared off upgrading by the error log.

To fix it properly, we could make a patch release to immediately establish the new invariant on updating. This could be implemented as follows:

  • Add a DB schema migration from v17 to itself.
  • Load the split state, and use a ChunkWriter to fill in the slots between last_restore_point_slot (if any) and split.slot.

The only downside to this is that it is potentially a bit wasteful to re-do this every time the node starts up. Pragmatically, it probably wouldn't take very long (a few seconds max) and would stop recurring once we update the DB schema to v18 for Deneb (see #4693).
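The fill-in step can be sketched with a toy model. This is not Lighthouse's `ChunkWriter` API: `Freezer`, `fill_block_roots` and the `root_at` closure are hypothetical stand-ins, and starting from slot 0 when there is no restore point is an assumption. The sketch only shows that the backfill is idempotent, so repeating it on every startup rewrites nothing once the gap is closed.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

type Root = [u8; 32];

/// Toy stand-in for the freezer's linear block roots array (hypothetical).
#[derive(Default)]
struct Freezer {
    block_roots: BTreeMap<u64, Root>,
}

/// Fill every slot in `last_restore_point_slot..split_slot` that has no root.
/// `root_at` stands in for reading block roots out of the split state.
/// Returns the number of slots written.
fn fill_block_roots<F: Fn(u64) -> Root>(
    freezer: &mut Freezer,
    last_restore_point_slot: Option<u64>,
    split_slot: u64,
    root_at: F,
) -> usize {
    // Assumption: with no restore point stored yet, start from slot 0.
    let start = last_restore_point_slot.unwrap_or(0);
    let mut written = 0;
    for slot in start..split_slot {
        // Only write slots that are actually missing.
        if let Entry::Vacant(entry) = freezer.block_roots.entry(slot) {
            entry.insert(root_at(slot));
            written += 1;
        }
    }
    written
}
```

Calling `fill_block_roots` a second time with the same arguments returns 0, which is why re-running it at each startup costs only the scan.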

@jimmygchen

  • Run "fix beacon roots" during v18 db migration
  • Add "fix beacon roots" command lighthouse db fix-block-roots

bors bot pushed a commit that referenced this issue Oct 25, 2023
## Issue Addressed

Fixes #4697. 

This also unblocks the state pruning PR (#4835), because self-healing breaks if state pruning is applied to a database with missing block roots.

## Proposed Changes

- Fill in the missing block roots between last restore point slot and split slot when upgrading to latest database version.
@jimmygchen

Fixed in #4875
