Commit: resolve comments
shreyan-gupta committed Dec 16, 2024
1 parent 097cca5 commit c41d5aa
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions neps/nep-0568.md
@@ -278,7 +278,7 @@ Given that the strategy for buffered receipts is to assign all buffered receipts
This field is more complex as it includes information from both delayed receipts and buffered receipts. To calculate this field accurately, we need to know the distribution of `receipt_bytes` across both delayed receipts and buffered receipts. The current solution is to store metadata about the total `receipt_bytes` for buffered receipts in the trie. This way, we have the following:

* For the child with the lower index, `receipt_bytes` is the sum of both delayed receipts bytes and buffered receipts bytes, hence `receipt_bytes = parent.receipt_bytes`.
- * For the child with the higher index, `receipt_bytes` is just the bytes from delayed receipts, hence `receipt_bytes = parent.receipt_bytes - buffered_receipt_bytes`.
+ * For the child with the higher index, `receipt_bytes` is just the bytes from delayed receipts, hence `receipt_bytes = parent.receipt_bytes - parent.buffered_receipt_bytes`.
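
To make the split concrete, here is a minimal sketch of the two formulas above; the struct and field names follow the NEP's prose and are assumptions, not the actual nearcore types.

```rust
/// Illustrative stand-in for the parent shard's congestion info; the field
/// names mirror the prose above, not the nearcore definitions.
struct ParentCongestionInfo {
    receipt_bytes: u64,          // delayed + buffered receipt bytes
    buffered_receipt_bytes: u64, // metadata about buffered bytes, stored in the trie
}

/// Returns (lower_child_receipt_bytes, higher_child_receipt_bytes).
fn split_receipt_bytes(parent: &ParentCongestionInfo) -> (u64, u64) {
    // Lower-index child: all buffered receipts are assigned to it, so it
    // keeps the full sum of delayed and buffered receipt bytes.
    let lower_child = parent.receipt_bytes;
    // Higher-index child: only the delayed receipt bytes remain.
    let higher_child = parent.receipt_bytes - parent.buffered_receipt_bytes;
    (lower_child, higher_child)
}
```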

#### `allowed_shard`

@@ -351,7 +351,7 @@ The solution to this problem is to introduce the concept of a Frozen MemTrie (wi

Along with `FrozenArena`, we also introduce a `HybridArena`, which effectively combines a base `FrozenArena` with a top layer of `STArena` that supports allocating and deallocating new nodes into the MemTrie. Newly allocated nodes can reference nodes in the `FrozenArena`. This Hybrid MemTrie serves as a temporary MemTrie while the flat storage is being constructed in the background.
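
As a rough illustration of this layering (the arena types exist in nearcore, but the fields and constructor below are simplified assumptions, not the actual API):

```rust
use std::sync::Arc;

/// Read-only snapshot of the parent trie's memory. Cloning is cheap: it only
/// increments the reference count and copies no node data.
#[derive(Clone)]
struct FrozenArena {
    memory: Arc<[u8]>, // illustrative stand-in for the frozen node storage
}

/// Mutable arena that supports allocating and deallocating new trie nodes.
#[derive(Default)]
struct STArena {
    memory: Vec<u8>, // illustrative stand-in for the growable node storage
}

/// A frozen base with a mutable top layer. New nodes are allocated in `top`
/// and may reference nodes living in the shared, read-only `base`.
struct HybridArena {
    base: FrozenArena,
    top: STArena,
}

impl HybridArena {
    /// Build a child's arena on top of a cheaply cloned frozen parent.
    fn from_frozen(base: FrozenArena) -> Self {
        HybridArena { base, top: STArena::default() }
    }
}
```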

- While Frozen MemTries facilitate instant resharding, they come at the cost of memory consumption. Once a MemTrie is frozen, it continues to consume the same amount of memory as it did at the time of freezing, as it does not support memory deallocation. If a node tracks only one of the child shards, a Frozen MemTrie would continue to use the same amount of memory as the parent trie. Therefore, Hybrid MemTries are only a temporary solution, and we rebuild the MemTrie for the children once the post-processing step for Flat Storage is completed.
+ While Frozen MemTries facilitate instant resharding, they come at the cost of memory consumption. Once a MemTrie is frozen, it continues to consume the same amount of memory as it did at the time of freezing, as it does not support memory deallocation. If a node tracks only one of the child shards, a Frozen MemTrie would continue to use the same amount of memory as the parent trie. Therefore, Hybrid MemTries are only a temporary solution, and we rebuild the MemTrie for the children after resharding is completed.

Additionally, a node would need to support twice the memory footprint of a single trie. After resharding, there would be two copies of the trie in memory: one from the temporary Hybrid MemTrie used for block production and another from the background MemTrie under construction. Once the background MemTrie is fully constructed and caught up with the latest block, we perform an in-place swap of the Hybrid MemTrie with the new child MemTrie and deallocate the memory from the Hybrid MemTrie.

@@ -360,7 +360,7 @@ During a resharding event at the epoch boundary, when we need to split the paren
1. **Freeze the Parent MemTrie**: Create a read-only frozen arena representing a snapshot of the state at the time of freezing (after post-processing the last block of the epoch). The parent MemTrie is no longer required in runtime going forward.
2. **Clone the Frozen MemTrie**: Clone the Frozen MemTrie cheaply for both child MemTries to use. This does not clone the parent arena's memory but merely increases the reference count.
3. **Create Hybrid MemTries for Each Child**: Create a new MemTrie with `HybridArena` for each child. The base of the MemTrie is the read-only `FrozenArena`, while all new node allocations occur in a dedicated `STArena` memory pool for each child MemTrie. This temporary MemTrie is used while Flat Storage is being built in the background.
- 4. **Rebuild MemTrie from Flat Storage**: Once the Flat Storage is constructed in the post-processing step of resharding, we use it to load a new MemTrie and catch up to the latest block.
+ 4. **Rebuild MemTrie**: Once resharding is completed, we load a new MemTrie and catch up to the latest block.
5. **Swap and Clean Up**: After the new child MemTrie has caught up to the latest block, we perform an in-place swap in the client and discard the Hybrid MemTrie.

![Hybrid MemTrie diagram](assets/nep-0568/NEP-HybridMemTrie.png)
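
The five steps can be summarized in a sketch like the following; every name here is hypothetical, and the background rebuild and final swap (steps 4 and 5) are only noted in comments:

```rust
// Hypothetical sketch of the resharding flow above; all names are
// illustrative, not the nearcore API.
#[derive(Clone)]
struct FrozenMemTrie; // cheap to clone: reference-count bump only

struct MemTrie; // stand-in for an in-memory trie usable by the runtime

impl MemTrie {
    /// Step 1: snapshot the parent after post-processing the last block
    /// of the epoch.
    fn freeze(self) -> FrozenMemTrie {
        FrozenMemTrie
    }
}

impl FrozenMemTrie {
    /// Step 3: wrap a frozen base in a Hybrid MemTrie with a fresh STArena
    /// for new allocations.
    fn into_hybrid(self) -> MemTrie {
        MemTrie
    }
}

fn split_parent_memtrie(parent: MemTrie) -> (MemTrie, MemTrie) {
    let frozen = parent.freeze(); // step 1
    let left = frozen.clone().into_hybrid(); // steps 2-3 for the first child
    let right = frozen.into_hybrid(); // steps 2-3 for the second child
    // Steps 4-5 happen later: rebuild each child MemTrie in the background,
    // catch it up to the latest block, then swap it in and drop the hybrid.
    (left, right)
}
```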
@@ -523,7 +523,7 @@ This implementation ensures efficient and scalable shard state transitions, allo

### State Sync

- The state sync algorithm defines a `sync_hash` used in many parts of the implementation. This is always the first block of the current epoch, which the node should be aware of once it has synced headers to the current point in the chain. A node performing state sync first makes a request (currently to centralized storage on GCS, but in the future to other nodes in the network) for a `ShardStateSyncResponseHeader` corresponding to that `sync_hash` and the Shard ID of the shard it's interested in. Among other things, this header includes the last new chunk before `sync_hash` in the shard and a `StateRootNode` with a hash equal to that chunk's `prev_state_root` field. Then the node downloads (again from GCS, but in the future from other nodes) the nodes of the trie with that `StateRootNode` as its root. Afterwards, it applies new chunks in the shard until it's caught up.
+ The state sync algorithm defines a `sync_hash` used in many parts of the implementation. This is always the first block of the current epoch, which the node should be aware of once it has synced headers to the current point in the chain. A node performing state sync first makes a request for a `ShardStateSyncResponseHeader` corresponding to that `sync_hash` and the Shard ID of the shard it's interested in. Among other things, this header includes the last new chunk before `sync_hash` in the shard and a `StateRootNode` with a hash equal to that chunk's `prev_state_root` field. Then the node downloads the nodes of the trie with that `StateRootNode` as its root. Afterwards, it applies new chunks in the shard until it's caught up.

As described above, the state we download is the state in the shard after applying the second-to-last new chunk before `sync_hash`, which belongs to the previous epoch (since `sync_hash` is the first block of the new epoch). To move the point in the chain of the initial state download to the current epoch, we could either move the `sync_hash` forward or change the state sync protocol (perhaps changing the meaning of the `sync_hash` and the fields of the `ShardStateSyncResponseHeader`, or somehow changing these structures more significantly). The former is an easier first implementation, as it would not require any changes to the state sync protocol other than to the expected `sync_hash`. We would just need to move the `sync_hash` to a point far enough along in the chain so that the `StateRootNode` in the `ShardStateSyncResponseHeader` refers to the state in the current epoch. Currently, we plan on implementing it that way, but we may revisit making more extensive changes to the state sync protocol later.
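
A sketch of what "far enough along" could mean for a single shard follows; the block metadata and the two-new-chunk condition are simplifying assumptions (the real selection must hold for every shard and handle missed chunks):

```rust
// Hypothetical sketch: choose a `sync_hash` such that the second-to-last new
// chunk before it already lies in the current epoch, so the downloaded
// `StateRootNode` refers to current-epoch state. All types are illustrative.
struct BlockInfo {
    hash: [u8; 32],
    has_new_chunk: bool, // whether the synced shard has a new chunk here
}

/// `epoch_blocks` are the blocks of the current epoch in order, starting at
/// the first block of the epoch.
fn pick_sync_hash(epoch_blocks: &[BlockInfo]) -> Option<[u8; 32]> {
    let mut new_chunks_seen = 0;
    for block in epoch_blocks {
        if new_chunks_seen >= 2 {
            // Two new chunks already appeared in this epoch before `block`,
            // so the second-to-last new chunk before it is in-epoch.
            return Some(block.hash);
        }
        if block.has_new_chunk {
            new_chunks_seen += 1;
        }
    }
    None // not enough new chunks yet; keep waiting for more blocks
}
```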
