FN/BN CPU BBQ #3630

Wondertan · 2024-08-06T17:29:37Z

Overview

Recent spikes in network activity increased resource consumption for FN/BN node operators to inadequate levels.

Problem Description

The flamegraph shown below is from the profile captured at the exact moment where such a spike happened.

The profile suggest that the most of the time spent happens during GetSize operation on the Blockstore. FN/BN uses GetSize to determine whether it has data or not. This check is triggered by remote Bitswap requests from LNs. Before requesting data itself, LNs check which peers have the data by broadcasting these requests to all connected peers(in worst case). Once they know who has the data, they request a particular peer for the data.

As it turns out, checking existence of data is an expensive operation which is as expensive as getting the data itself. This comes from the inverted_index the main bottleneck of the celestia-node(at the moment of writing). This index is maintained within KVStore that is enormous and simple lookup operations are the source of the CPU overhead we seeing.

Overall, we have LNs broadcasting data checks for every sample operations, which, in fact, consists of multiple requests where each of those multiple requests get duplicated to every connected peer in the worst case, creating massive load on the network.

Potential Solutions

Wait for Shwap which particularly targets this issue and many others.
- This was the initial strategy, but BBQ is getting hotter so we have to come with an iterim solution.
Prune inverted index. We don't need to index outside availability window, while the index is not pruned and evergrowing.
- Reducing index size will speed up queries to KVStore.
- Complex and pruning itself will induce more pressure on KVstore in use.
Remove inverted index by doing quick change to sampling protocol.
- Every single sampling requests would then contain DataRoot/Hash routing FN to respective EDS CAR file.
- Breaking protocol change that feels simple, but is still protocol-breaking and potentially full of unknown uknowns.
Adjust topology. Reducing the number of peers LNs connects will reduce the amount of broadcasts the node does.
- Only reduces the load, but does not solve the issue.
Dummy solution. For the described data existence pre-checks, FN/BNs always report like they have the data, without doing lookups into inverted index.
- Extremely simple and non-breaking. The prevailing precheck msgs won't cause any load, so it solves the issue.
- The correctly running FN/BN should have that data anyway.
- If a FN/BN falls behind or is in syncing stage, it will technically lie to LN requesting data. However, the LN will use other FN after a timeout.
- Similarly affects historical Blob requests over IPLDGetter as pruned node will report that they have data, while in fact they node.

Decision

After internal discussion we decided to go with "Dummy Solution". It solves the problem in a simple and fast way and tradeoffs it brings are negligible.

We may still do additionally the solution that removes inverted index entirely to win more time if Shwap gets delayed or if there is more pressure on achieving bigger blocks that inverted index is blocking as well

The text was updated successfully, but these errors were encountered:

Wondertan · 2024-08-07T15:53:39Z

The implementation for the dummy solution is quite straighforward.

Here instead of doing the expensive checks, we simply return the data size, which in our case is constant. Intuitively, returning the completely arbritrary or approximate size there should work, but to minimize risks we could still return the factual sizes.

There are two types of data we serve there. Leaf and inner nodes with their respective sizes defined here. Detecting which type we are requested with is tricky and might not be possible. Quic look into the old protocol made me realize that we don't have different CID codecs for leaf or inner node requests and we unified them to be under the same code.

If we won't find a way to determine which type of request that is by simply looking at the CID, we should try returning the max size, which is the size of leaf or the share.

Wondertan · 2024-08-07T16:52:47Z

Related issue that would simplify our life: ipfs/boxo#657

But the proposed implementation above is not blocked on that and we can adopt the change in boxo once it gets shipped

Wondertan added bug Something isn't working priority:high High priority issue to be prioritized for the current/following sprint labels Aug 6, 2024

renaynay assigned cristaloleg Aug 6, 2024

renaynay added the v0.17.0 Intended for v0.17.0 release label Aug 6, 2024

cristaloleg mentioned this issue Aug 8, 2024

fix(share): return fixed data size in blockstore #3634

Merged

This was referenced Aug 12, 2024

share/getters: Consider removing / not using IPLDGetter for requests of historic data #3653

Open

Audit the need for WANT_HAVEs and cut them off in case its not #3632

Closed

walldiss closed this as completed in #3634 Aug 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FN/BN CPU BBQ #3630

FN/BN CPU BBQ #3630

Wondertan commented Aug 6, 2024 •

edited

Loading

Wondertan commented Aug 7, 2024

Wondertan commented Aug 7, 2024

FN/BN CPU BBQ #3630

FN/BN CPU BBQ #3630

Comments

Wondertan commented Aug 6, 2024 • edited Loading

Overview

Problem Description

Potential Solutions

Decision

Wondertan commented Aug 7, 2024

Wondertan commented Aug 7, 2024

Wondertan commented Aug 6, 2024 •

edited

Loading