
eds: Blockstore provided by the Store should support Put #2424

Closed
Tracked by #1099
Wondertan opened this issue Jul 3, 2023 · 2 comments · Fixed by #2532
Wondertan commented Jul 3, 2023

Context

During the design and implementation of #1099, we missed that inner and leaf nodes must be cached during reconstruction via the Blockservice, so Bitswap has no access to them: every Put is currently discarded by the Blockstore.

Problems

  • FNs do not share proofs and data with each other during reconstruction.
  • FNs cannot generate BEFPs for withheld blocks.

Solution

The proposed solution is to change the Blockstore to support Put operations:

  • Blockstore over an on-disk Datastore

    • Dagstore's Blockstore would write to disk and, on reads, check the index first and then the Datastore.
    • The default badger Datastore scales well for write-heavy loads and, given the sparse read patterns of reconstruction and BEFP generation, should also meet the read requirements.
    • The only complication is clean-up, which can be solved naively by iterating over each stored piece with the DeleteBlock method on the Blockstore at the end of the reconstruction session (or even only after we put an indexed EDS, to avoid interrupting the serving of data to other nodes over bitswap).
  • Blockstore over an in-mem Datastore

    • Similar cleanup complication.
    • Similar implementation time estimate as for on-disk.
    • Less robust: does not tolerate restarts.

Testing

We have a (flaky) Swamp reconstruction test asserting that an FN can reconstruct a block from light nodes alone, but we also need a test, similar to the one below, validating that FNs can collectively rebuild the block.

func TestShareAvailable_DisconnectedFullNodes(t *testing.T) {

Once we know that this works fine in Swamp, we should also replicate the same scenario on TG.


musalbas commented Jul 3, 2023

I assume that Put would only be called if shrexeds fails to get the full ODS, thus falling back to sampling anyway, right?

@Wondertan
Member Author

@musalbas, right. Sampling/reconstruction starts only once we have failed to get the ODS for a while.

@musalbas musalbas mentioned this issue Jul 25, 2023
distractedm1nd added a commit that referenced this issue Aug 28, 2023
Closes #2424 

Alternative Design Discussions:
- Using in-memory instead of on-disk: during an unexpected shutdown,
or in the case of AsyncGetter, cleanup would not occur and blocks would
be stuck in the store indefinitely. An in-memory blockstore would fix
this issue, since it is cleared upon restart.
- Using a local Blockgetter to pass to NewErrByzantine: Instead of
putting the blocks into the EDS blockstore, the retrieval session could
make an in-memory blockstore on the fly to hand to NewErrByzantine when
needed. This would look cleaner in the code/make sense architecturally,
but it would mean full nodes cannot share these shares with each other
during reconstruction.
walldiss pushed a commit to walldiss/celestia-node that referenced this issue Sep 22, 2023 (cherry picked from commit 196e849)
walldiss pushed a commit that referenced this issue Sep 22, 2023 (cherry picked from commit 196e849)
walldiss pushed a commit that referenced this issue Sep 25, 2023 (cherry picked from commit 196e849)