Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: fix save blob sidecar using wrong prefix #12849

Merged
merged 4 commits into from
Sep 6, 2023

Conversation

terencechain
Copy link
Member

@terencechain terencechain commented Sep 5, 2023

Background:

Bolt DB struggles with pruning, which has been evident during the development of the slasher. With the introduction of EIP-4844 in Prysm, there will be a significant amount of blob data. In the last design, we implemented a rotating buffer database schema for blobs. This schema will overwrite old data once the expiration period for blobs is reached, eliminating the need for pruning or deletion routines.

How It Works:

  • Store a maximum of N epochs worth of blobs.
  • Once epoch N is reached, overwrite the contents of epoch 0 using a modulo operation.
  • Given the possibility of multiple blob sidecars per slot, a straightforward overwriting algorithm won't suffice.
    Saving blobs involves:
  • Convert the slot using a modulo operator to a range of [0, maxSlots], where maxSlots = MAX_BLOB_EPOCHS * SLOTS_PER_EPOCH
  • Compute the key for the blob using the formula: bytes(slot_to_rotating_buffer(blob.slot)) + bytes(blob.slot) + blob.block_root.
  • Overwrite all elements if the incoming blob has a slot more significant than the last saved slot in the rotating keys buffer.
    This PR ensures efficient data storage and retrieval without frequent pruning, thus keeping the database size bounded.

Bug fix:

When checking for a pre-existing blob sidecar to override (ex: if blob sidecar slot is > 4096 epoch),
it is supposed to use bytes(blob.slot), which is [8:16].
The bug is it was using: [:8], which is bytes(slot_to_rotating_buffer(blob.slot))

@terencechain terencechain added Bug Something isn't working Deneb PRs or issues for the Deneb upgrade labels Sep 5, 2023
@terencechain terencechain requested a review from a team as a code owner September 5, 2023 02:48
@@ -54,7 +54,7 @@ func (s *Store) SaveBlobSidecar(ctx context.Context, scs []*ethpb.BlobSidecar) e
// If there is no element stored at blob.slot % MAX_SLOTS_TO_PERSIST_BLOBS, then we simply
// store the blob by key and exit early.
if len(replacingKey) != 0 {
slotBytes := replacingKey[:8]
slotBytes := replacingKey[8:16]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a comment that can be added to direct readers on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@james-prysm it's in the godoc for this method.

//  2. Compute key for blob as bytes(slot_to_rotating_buffer(blob.slot)) ++ bytes(blob.slot) ++ blob.block_root

Terence wants bytes(blob.slot) here, which is the second quadword

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@james-prysm what do you think about this? #12855

Would that the PR make it easier to understand?

@rauljordan
Copy link
Contributor

hi @terencechain, @saolyn and I were looking at this PR and felt the description and fix did not have enough context. A few questions we had coming it blind:

  • Why is the structure of a DB key like this? Why is the first part a rotating buffer?
  • What is newRetentionSlot in the unit test?
  • What about other edge cases, such as saving a blob exactly at the modulo 0?

In terms of testing, we think the unit tests can be made a lot simpler. They should be more human-readable. For example, we should override the config to store 10 slots of blobs, and then do something like

maxBlobTimeToLive := types.Slot(10)
for i := 0; i < maxBlobTimeToLive; i++ {
  db.SaveBlob(generateBlobAtSlot(i))
}

// Then check a few things about the stored blobs..

// Then, override the first 3 blobs, check they were overwritten

// Override the whole thing, save at the boundaries

This way, we can better audit for edge-cases

@terencechain
Copy link
Member Author

terencechain commented Sep 5, 2023

@rauljordan @saolyn I'll answer them here

Why is the structure of a DB key like this? Why is the first part a rotating buffer?

The rotating buffer is to handle blobs outside of the retention window ~4096 epochs. It enables a fast lookup without having to unmarshal the blob sidecar and check. It was motivated by your PR here :) terencechain#3
Another option is to eliminate the rotating buffer and use the slot. Open to this, and it can be done in another PR

What is newRetentionSlot in the unit test?

It's a slot that's higher than 4096 * 32. The idea is we want to trigger the bug which only happens higher than the first retention window

What about other edge cases, such as saving a blob exactly at the modulo 0?

Can add this

For example, we should override the config to store 10 slots of blobs, and then do something like

I believe there is already a unit test for this

@prylabs-bulldozer prylabs-bulldozer bot merged commit e3527e1 into develop Sep 6, 2023
@prylabs-bulldozer prylabs-bulldozer bot deleted the fix-blob-db-using-wrong-prefix branch September 6, 2023 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Deneb PRs or issues for the Deneb upgrade
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants