Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplified block storage #53

Merged
merged 40 commits into from
Jan 9, 2024
Merged

Simplified block storage #53

merged 40 commits into from
Jan 9, 2024

Conversation

pompon0
Copy link
Collaborator

@pompon0 pompon0 commented Dec 19, 2023

  • simplified the storage traits, by moving the inmem caching implementation to era-consensus
  • extracted PayloadManager trait responsible for proposing/verifying payloads.
  • clarified the expected persistence guarantees of the PersistentBlockStore trait
  • simplified the block caching to maintain a continuous range of blocks
  • simplified the sync_blocks implementation as a result of previous simplifications
  • moved rocksdb storage implementation to tools crate, given that the executor binary is there anyway.
  • Removed genesis block from config to simplify configuration
  • Removed redundant header from FinalBlock.
  • Made bft actor wait for past blocks to be stored before calling propose/verify to more accurately reflect the expected implementation of a stateful blockchain (and to simplify implementation of the PayloadManager trait)

@pompon0 pompon0 marked this pull request as ready for review January 3, 2024 09:27
@pompon0 pompon0 changed the title Simplified configs and FinalBlock Simplified block storage Jan 3, 2024
@pompon0 pompon0 requested a review from moshababo January 3, 2024 12:10
node/actors/bft/src/lib.rs Outdated Show resolved Hide resolved
node/actors/executor/src/testonly.rs Show resolved Hide resolved
node/actors/sync_blocks/src/peers/mod.rs Outdated Show resolved Hide resolved
node/actors/sync_blocks/src/peers/mod.rs Outdated Show resolved Hide resolved
node/actors/sync_blocks/src/peers/mod.rs Outdated Show resolved Hide resolved
Comment on lines 165 to 167
/// Since cache is a continuous range of blocks, `queue_block()`
/// synchronously waits until all intermediate blocks are added
/// to cache AND cache is not full.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, these constraints make it difficult to reason about store usage correctness; e.g., it's moderately easy to deadlock it by queuing blocks out of order. A somewhat artificial example: in SyncBlocks actor, if the number of concurrently downloaded blocks is more than cache capacity and the corresponding permits were (incorrectly) dropped too late, after queuing the received block.

Copy link
Collaborator Author

@pompon0 pompon0 Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify your example? As I understand it:

  • N = download permits in SyncBlocks actor
  • M = max cache size
  • N > M
  • first N blocks are being downloaded in parallel
  • (what happens here?)

I can remove the block cache size by replacing queue_block with store_block (i.e. make fetching task wait until the block is stored persistently), which actually simplify the api.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

node/libs/storage/src/block_store.rs Outdated Show resolved Hide resolved

/// In-memory block store.
#[derive(Debug, Default)]
pub struct BlockStore(Mutex<BTreeMap<validator::BlockNumber, validator::FinalBlock>>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Ditto as for crate::BlockStore; using a VecDeque could help clarifying an invariant that only continuous ranges of blocks can be stored.
  • Bikeshedding: Subjectively, having multiple structs named BlockStore in the same crate is slightly confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • done
  • ack, that's why we don't use fully qualified imports for these. We would need to prefix the struct names with the module name, which would cause unnecessary stuttering

node/libs/storage/src/block_store.rs Outdated Show resolved Hide resolved
node/libs/storage/src/block_store.rs Outdated Show resolved Hide resolved
@pompon0 pompon0 requested a review from slowli January 4, 2024 11:29
Copy link
Member

@brunoffranca brunoffranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

node/actors/bft/src/replica/state_machine.rs Outdated Show resolved Hide resolved
Co-authored-by: Bruno França <bruno@franca.xyz>
node/actors/sync_blocks/src/peers/mod.rs Show resolved Hide resolved
node/actors/sync_blocks/src/peers/tests/multiple_peers.rs Outdated Show resolved Hide resolved
node/actors/sync_blocks/src/peers/tests/multiple_peers.rs Outdated Show resolved Hide resolved
Comment on lines 10 to 13
/// Stored block with the lowest number.
pub first: validator::CommitQC,
/// Stored block with the highest number.
pub last: validator::CommitQC,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean using 2 terms to describe the same entity. It's a good rule to have a ubiquitous dictionary of terms and apply them uniformly (so that each concept has a single term); makes code easier to maintain.

node/libs/storage/src/block_store.rs Outdated Show resolved Hide resolved
node/libs/storage/src/block_store.rs Outdated Show resolved Hide resolved
node/libs/storage/src/block_store.rs Outdated Show resolved Hide resolved
struct Inner {
queued: sync::watch::Sender<BlockStoreState>,
persisted: BlockStoreState,
queue: VecDeque<validator::FinalBlock>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Is queue capacity unconstrained now? Would we want to constraint it in the future, or was constrained capacity in the previous iteration a temporary decision anyway?
  • Could you add a metric for the queue length (to mimic removed GossipFetcherMetrics in the core repo)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per #53 (comment) I've moved the responsibility of keeping cache bounded to the user, so that both cache size limit and inflight request limit are enforced in the same place.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added metrics

@pompon0 pompon0 requested a review from slowli January 8, 2024 12:30
node/tools/src/store.rs Outdated Show resolved Hide resolved
node/libs/storage/src/traits.rs Outdated Show resolved Hide resolved
node/actors/sync_blocks/src/tests/end_to_end.rs Outdated Show resolved Hide resolved
node/libs/storage/src/testonly/in_memory.rs Outdated Show resolved Hide resolved
async fn state(&self, ctx: &ctx::Ctx) -> ctx::Result<BlockStoreState>;

/// Gets a block by its number.
/// Returns error if block is missing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify: Is this expected that the caller will know the state() of the store beforehand, so that it won't call block() for (probably) missing block numbers? In the general case, it could make sense to distinguish between a missing block and other errors (i.e., return ctx::Result<Option<FinalBlock>>).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, currently PersistentBlockStore is expected to never lose blocks and consensus code only requests blocks contained in the state().

node/libs/storage/src/block_store/mod.rs Outdated Show resolved Hide resolved
node/libs/storage/src/testonly/in_memory.rs Outdated Show resolved Hide resolved
@pompon0 pompon0 merged commit 5727a3e into main Jan 9, 2024
3 of 4 checks passed
@pompon0 pompon0 deleted the gprusak-tools-config branch January 9, 2024 10:06
github-merge-queue bot pushed a commit to matter-labs/zksync-era that referenced this pull request Jan 13, 2024
* Simplified configs to not require genesis block.
* Removed consensus fields from jsonrpc sync api, to prevent having gaps
in miniblock quorum certificates.
* Moved consensus fields to a separate table to make first/last entry
lookup fast and reliable.
* Migrated consensus integration to new era-consensus API:
matter-labs/era-consensus#53
* Deduplicated code between EN consensus integration and main node
consensus integration
* Implemented support for cloning a db in tests
github-merge-queue bot pushed a commit to matter-labs/zksync-era that referenced this pull request Jan 13, 2024
* Simplified configs to not require genesis block.
* Removed consensus fields from jsonrpc sync api, to prevent having gaps
in miniblock quorum certificates.
* Moved consensus fields to a separate table to make first/last entry
lookup fast and reliable.
* Migrated consensus integration to new era-consensus API:
matter-labs/era-consensus#53
* Deduplicated code between EN consensus integration and main node
consensus integration
* Implemented support for cloning a db in tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants