Prevent Overflow LRU Cache from Exploding #4801

ethDreamer · 2023-10-03T15:09:45Z

Issue Addressed

Prevent Overflow LRU Cache from blowing up due to large state sizes #4757

In a nutshell, the problem with the OverflowLRUCache before this PR is that it caches an AvailabilityPendingExecutedBlock for each entry, which contains the entire BeaconState. If we end up with multiple forks and many unavailable blocks, this cache can have many entries, causing it to consume large amounts of memory.

I've addressed this by creating a small StateLRUCache which accepts the AvailabilityPendingExecutedBlock, moves the BeaconState into a very limited LRU cache, and returns a DietAvailabilityPendingExecutedBlock, which contains all the same data except that the BeaconState has been replaced with its state root.

If many unavailable blocks are stored at the same time in the DataAvailabilityCache, the excess states will be dropped. If those states are needed later, they are recovered by loading the parent BeaconState from disk and replaying the block.
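The mechanism described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `State`, `Block`, `PendingBlock`, `DietPendingBlock`, `StateCache`, `replay`, and the `load_parent` callback are all hypothetical stand-ins, the "root" is a toy function rather than a tree hash, and the LRU is hand-rolled so the example needs no external crate.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical stand-ins; the real Lighthouse types are far richer.
type Root = u64;

#[derive(Clone, Debug, PartialEq)]
struct State {
    slot: u64,
    balance_sum: u64,
}

impl State {
    // Toy "state root": any deterministic function of the contents will do here.
    fn root(&self) -> Root {
        self.slot.wrapping_mul(1_000_003) ^ self.balance_sum
    }
}

#[derive(Clone, Debug)]
struct Block {
    slot: u64,
    parent_state_root: Root,
    delta: u64,
}

// Full entry: carries the (large) state.
struct PendingBlock {
    block: Block,
    state: State,
}

// "Diet" entry: the state is replaced by its root.
struct DietPendingBlock {
    block: Block,
    state_root: Root,
}

struct StateCache {
    capacity: usize,
    states: HashMap<Root, State>,
    order: VecDeque<Root>, // front = least recently used
}

impl StateCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, states: HashMap::new(), order: VecDeque::new() }
    }

    // Move the state into the bounded cache and hand back the diet block.
    fn register(&mut self, pending: PendingBlock) -> DietPendingBlock {
        let state_root = pending.state.root();
        self.order.retain(|r| *r != state_root);
        self.order.push_back(state_root);
        self.states.insert(state_root, pending.state);
        // Drop the least recently used states once over capacity.
        while self.order.len() > self.capacity {
            if let Some(old) = self.order.pop_front() {
                self.states.remove(&old);
            }
        }
        DietPendingBlock { block: pending.block, state_root }
    }

    // Rebuild the full entry: use the cached state if it survived,
    // otherwise load the parent state and replay the block on top of it.
    fn recover(
        &mut self,
        diet: DietPendingBlock,
        load_parent: impl Fn(Root) -> Option<State>,
    ) -> Option<PendingBlock> {
        let state = match self.states.remove(&diet.state_root) {
            Some(state) => {
                self.order.retain(|r| *r != diet.state_root);
                state
            }
            None => replay(load_parent(diet.block.parent_state_root)?, &diet.block),
        };
        Some(PendingBlock { block: diet.block, state })
    }
}

// Toy state transition standing in for block replay.
fn replay(parent: State, block: &Block) -> State {
    State { slot: block.slot, balance_sum: parent.balance_sum + block.delta }
}
```

With a capacity of 1, registering two blocks evicts the first state; recovering the first block then goes through `load_parent` and `replay`, while the second is served straight from the cache.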

michaelsproul

Nice! This is looking great on the whole. I just had a few minor perf tips and style tweaks to get in before we merge.

beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs

beacon_node/beacon_chain/src/data_availability_checker/state_lru_cache.rs

## Issue Addressed

While reviewing #4801 I noticed that our use of `take_while` in the block replayer means that if a state root iterator _with gaps_ is provided, some additional state roots will be dropped unnecessarily. In practice the impact is small, because once there's _one_ state root miss, the whole tree hash cache needs to be built anyway, and subsequent misses are less costly. However this was still a little inefficient, so I figured it's better to fix it.

## Proposed Changes

Use [`peeking_take_while`](https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.peeking_take_while) to avoid consuming the next element when checking whether it satisfies the slot predicate.

## Additional Info

There's a gist here that shows the basic dynamics in isolation: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=40b623cc0febf9ed51705d476ab140c5. Changing the `peeking_take_while` to a `take_while` causes the assert to fail. Similarly I've added a new test `block_replayer_peeking_state_roots` which fails if the same change is applied inside `get_state_root`.
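The difference can be reproduced with the standard library alone. The sketch below is not the linked gist and not the Lighthouse code: it swaps itertools' `peeking_take_while` for `std::iter::Peekable::next_if`, which has the same "look before consuming" behaviour, and the helper names (`lookup_take_while`, `lookup_peeking`, `roots_seen`) and the toy `(slot, root)` data are made up for illustration.

```rust
use std::iter::Peekable;

type Slot = u64;
type Root = &'static str;

// `take_while` has to pull an element out of the iterator to test it, so the
// first element that fails the predicate is consumed and silently dropped.
fn lookup_take_while<I>(iter: &mut I, slot: Slot) -> Option<Root>
where
    I: Iterator<Item = (Slot, Root)>,
{
    iter.by_ref()
        .take_while(|(s, _)| *s <= slot)
        .find(|(s, _)| *s == slot)
        .map(|(_, root)| root)
}

// `next_if` only advances when the peeked element satisfies the predicate,
// so an element belonging to a later slot stays in the iterator.
fn lookup_peeking<I>(iter: &mut Peekable<I>, slot: Slot) -> Option<Root>
where
    I: Iterator<Item = (Slot, Root)>,
{
    while let Some((s, root)) = iter.next_if(|(s, _)| *s <= slot) {
        if s == slot {
            return Some(root);
        }
    }
    None
}

// Look up slots 1..=4 against a root iterator that has a gap at slot 2.
fn roots_seen(use_peeking: bool) -> Vec<Option<Root>> {
    let roots = vec![(1, "a"), (3, "c"), (4, "d")];
    if use_peeking {
        let mut iter = roots.into_iter().peekable();
        (1..=4).map(|slot| lookup_peeking(&mut iter, slot)).collect()
    } else {
        let mut iter = roots.into_iter();
        (1..=4).map(|slot| lookup_take_while(&mut iter, slot)).collect()
    }
}
```

In this toy run the `take_while` variant finds only the root for slot 1: the miss at slot 2 discards the entry for slot 3, and the resulting miss at slot 3 discards the entry for slot 4. The peeking variant still finds the roots for slots 3 and 4.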

beacon_node/beacon_chain/src/data_availability_checker/error.rs

realbigsean · 2023-10-06T16:36:14Z

beacon_node/beacon_chain/src/data_availability_checker/state_lru_cache.rs

+            .apply_blocks(vec![diet_executed_block.block.clone_as_blinded()], None)
+            .map(|block_replayer| block_replayer.into_state())
+            .and_then(|mut state| {
+                state


if we're explicit about building all caches here, is it possible we're doing too much work? for example if we're not verifying signatures, do we need the pubkey cache? and if we're not crossing an epoch boundary do we need the next epoch's committee cache? I think all the caches should be built on demand, so maybe just removing the explicit building would be better. What do you think @michaelsproul

Yeah, I guess in most cases we will need these caches when we process the next block, but in the case of a reorg or a block on a side chain, we may not. So it might be a bit more resilient if we don't build any caches here (less DoS risk).

In the past we've had some issues with caches not getting auto-built when they're required, but I think we're past that now, and hopefully Hydra helps flush out any cases we've missed.

So I did some messing around and it looks like we only need to build the exit cache & the tree hash cache in order to have equality with the original state. Should I build those or build nothing?

> I guess in most cases we will need these caches when we process the next block

Oh true. Well I guess the scenario this mechanism is designed for is when we have a bunch of heads. So if we don't get a next block on a head it'd still be wasted work, right? Also building caches on-demand as we get each head's "next block" might be better to spread out this work over a longer period of time.

> So I did some messing around and it looks like we only need to build the exit cache & the tree hash cache in order to have equality with the original state. Should I build those or build nothing?

Yea sure, building the ones we know we need makes sense to me
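The trade-off discussed in this thread (eagerly build only the caches known to be needed after reconstruction, leave the rest to be built on first use) can be illustrated with a toy model. Everything here is hypothetical: `ToyState`, its fields, and `reconstruct` are not Lighthouse APIs, and the "caches" are trivial stand-ins.

```rust
/// Toy state with two derived caches; all names are hypothetical.
#[derive(Debug, Default)]
struct ToyState {
    exit_epochs: Vec<u64>,
    pubkeys: Vec<u64>,
    exit_cache: Option<u64>,        // stand-in: highest exit epoch
    pubkey_cache: Option<Vec<u64>>, // stand-in: sorted copy of the pubkeys
    cache_builds: usize,            // counts how much cache work was done
}

impl ToyState {
    // Built eagerly after reconstruction because it is known to be needed.
    fn build_exit_cache(&mut self) {
        if self.exit_cache.is_none() {
            self.exit_cache = Some(self.exit_epochs.iter().copied().max().unwrap_or(0));
            self.cache_builds += 1;
        }
    }

    // Built on demand: the cost is only paid if a caller actually asks for it.
    fn pubkeys_sorted(&mut self) -> &[u64] {
        if self.pubkey_cache.is_none() {
            let mut sorted = self.pubkeys.clone();
            sorted.sort_unstable();
            self.pubkey_cache = Some(sorted);
            self.cache_builds += 1;
        }
        self.pubkey_cache.as_deref().unwrap_or(&[])
    }
}

// Reconstruct a state and build only the cache we know we need.
fn reconstruct(exit_epochs: Vec<u64>, pubkeys: Vec<u64>) -> ToyState {
    let mut state = ToyState { exit_epochs, pubkeys, ..Default::default() };
    state.build_exit_cache();
    state
}
```

If no next block ever arrives on a given head, the on-demand cache is never built, which is the wasted work the thread is trying to avoid.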

ethDreamer added 2 commits October 3, 2023 09:20

Initial Commit of State LRU Cache

e8ef884

Build State Caches After Reconstruction

433d59f

ethDreamer changed the title ~~Initial Commit of State LRU Cache~~ Prevent Overflow LRU Cache from Exploding Oct 3, 2023

ethDreamer added the deneb label Oct 3, 2023

ethDreamer added 3 commits October 3, 2023 12:38

Cleanup Duplicated Code in OverflowLRUCache Tests

d3709c5

Added Test for State LRU Cache

31e6176

Prune Cache of Old States During Maintenance

18577fe

ethDreamer added the ready-for-review label Oct 3, 2023

ethDreamer requested a review from michaelsproul October 3, 2023 20:27

michaelsproul requested changes Oct 4, 2023

michaelsproul added the waiting-on-author and optimization labels and removed the ready-for-review label Oct 4, 2023

michaelsproul mentioned this pull request Oct 4, 2023

[Merged by Bors] - Use peeking_take_while in BlockReplayer #4803

Closed

Address Michael's Comments

840907f

michaelsproul reviewed Oct 6, 2023

beacon_node/beacon_chain/src/data_availability_checker/state_lru_cache.rs (outdated, resolved)

Few More Comments

a6acb23

ethDreamer added the ready-for-review label and removed the waiting-on-author label Oct 6, 2023

Removed Unused impl

62f0b3f

realbigsean reviewed Oct 6, 2023

ethDreamer added 2 commits October 10, 2023 17:26

Last touch up

2e67dbd

Fix Clippy

2aa30fb

michaelsproul approved these changes Oct 11, 2023

ethDreamer merged commit 8660043 into sigp:deneb-free-blobs Oct 11, 2023
21 of 24 checks passed

realbigsean mentioned this pull request Oct 18, 2023

Prevent Overflow LRU Cache from blowing up due to large state sizes #4757

Closed

ethDreamer commented Oct 3, 2023

michaelsproul left a comment

realbigsean Oct 6, 2023

michaelsproul Oct 9, 2023

ethDreamer Oct 10, 2023

realbigsean Oct 10, 2023

ethDreamer Oct 10, 2023
