Doesn't affect the node's ability to follow the chain or validate (low impact).
Doesn't affect newly synced nodes.
Only occurs for a limited time before it self-heals, which is how it was missed during our release testing.
Description
My PR #4663, which fixed bug #4610, inadvertently introduced a new database bug, which thankfully is capable of self-healing.
That PR changed the invariant for block roots in the freezer database from:
slot < last_restore_point_slot -> block root is in the freezer
to:
slot < split.slot -> block root is in the freezer
Although #4663 took care to maintain the new invariant once it was established, it failed to establish it in the period immediately after upgrading. This means that until a new restore point is stored in the database (on a slot with slot % slots-per-restore-point == 0), the linear block roots array will have a gap in it between last_restore_point_slot and split.slot.
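To make the gap concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical and are not part of Lighthouse. The half-open range is an assumption derived from the two invariants above, and the 27h figure is reproduced assuming 8192 slots per restore point and 12-second slots.

```rust
use std::ops::Range;

/// Hypothetical helper: slots whose block roots are missing from the freezer
/// right after upgrading. The old invariant covers `slot < last_restore_point_slot`,
/// the new one requires `slot < split_slot`, so the gap is the half-open range
/// between the two.
fn missing_block_root_slots(last_restore_point_slot: u64, split_slot: u64) -> Range<u64> {
    last_restore_point_slot..split_slot.max(last_restore_point_slot)
}

/// First slot at or after `split_slot` with `slot % slots_per_restore_point == 0`,
/// i.e. the point at which the gap would heal itself.
fn next_restore_point_slot(split_slot: u64, slots_per_restore_point: u64) -> u64 {
    (split_slot + slots_per_restore_point - 1) / slots_per_restore_point
        * slots_per_restore_point
}

/// Converts a number of slots to hours for a given slot duration in seconds.
fn slots_to_hours(slots: u64, seconds_per_slot: u64) -> f64 {
    (slots * seconds_per_slot) as f64 / 3600.0
}
```

With those assumed parameters, the worst-case wait of one full restore-point interval is `slots_to_hours(8192, 12)`, which is roughly 27.3 hours.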
Steps to resolve
We could do nothing, and wait for the issue to resolve itself once everyone has upgraded and waited for 27h. However, this is not really ideal, and I suspect many users will be scared off upgrading by the error log.
To fix it properly, we could make a patch release that immediately establishes the new invariant on upgrading. This could be implemented as follows:
Add a DB schema migration from v17 to itself.
Load the split state, and use a ChunkWriter to fill in the slots between last_restore_point_slot (if any) and split.slot.
The only downside to this is that it is potentially a bit wasteful to re-do this every time the node starts up. Pragmatically, it probably wouldn't take very long (a few seconds max) and would stop recurring once we update the DB schema to v18 for Deneb (see #4693).
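The fill-in step could look roughly like the sketch below. It is an in-memory model and not the real migration: the `BTreeMap` stands in for the freezer's block roots array, `root_at_slot` stands in for reading roots out of the split state, and the real `ChunkWriter` API is not used. All names are hypothetical. Treating a missing last_restore_point_slot as slot 0 is also an assumption.

```rust
use std::collections::BTreeMap;

type Root = [u8; 32];

/// Hypothetical model of the migration: write every missing block root in
/// `[last_restore_point_slot, split_slot)` into `freezer`.
/// Returns the number of roots written.
fn fill_missing_block_roots(
    freezer: &mut BTreeMap<u64, Root>,
    last_restore_point_slot: Option<u64>,
    split_slot: u64,
    root_at_slot: impl Fn(u64) -> Option<Root>,
) -> Result<u64, String> {
    // Assumption: with no restore point stored yet, start from slot 0.
    let start = last_restore_point_slot.unwrap_or(0);
    let mut written = 0;
    for slot in start..split_slot {
        // Skip slots that are already present so repeated runs are cheap.
        if freezer.contains_key(&slot) {
            continue;
        }
        let root = root_at_slot(slot)
            .ok_or_else(|| format!("no block root available for slot {slot}"))?;
        freezer.insert(slot, root);
        written += 1;
    }
    Ok(written)
}
```

Because already-present slots are skipped, a second run over the same range writes nothing, which is what keeps re-running the step on every startup inexpensive in this model.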
## Issue Addressed
Fixes #4697.
This also unblocks the state pruning PR (#4835), because self-healing breaks if state pruning is applied to a database with missing block roots.
## Proposed Changes
- Fill in the missing block roots between the last restore point slot and the split slot when upgrading to the latest database version.