[draft] decentralized state sync: p2p state part transfer #12095

saketh-are · 2024-09-16T13:07:23Z

This is mainly a draft PR so I can get CI signals. I will break it into smaller PRs for review.

This module is dead code in master, but I've started using it in #12095. While testing end-to-end I noticed some unexpected behavior and started digging into the implementation. Along the way I refactored it extensively and simplified it somewhat. Ultimately I don't think there was actually a bug. One meaningful change is that the priority score now includes the shard id. This ensures that when different shards have the same or similar host sets and the number of parts is smaller than the number of hosts, the load is still well distributed. This is admittedly not a concern on mainnet (where the number of parts should be large), but it can easily occur in localnet.
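To illustrate why the shard id matters in the priority score, here is a minimal sketch, not the code from this PR. It assumes a hash-based score; `PeerId`, `ShardId`, `PartId`, `priority` and `select_hosts` are hypothetical names, and `DefaultHasher` is only a stand-in for whatever hash the real implementation uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-ins for the real identifier types.
type PeerId = String;
type ShardId = u64;
type PartId = u64;

/// Priority of `peer` as a host for one part of one shard.
/// Hashing the shard id together with the part id means that part 0 of
/// shard 0 and part 0 of shard 1 rank the same host set differently.
fn priority(peer: &PeerId, shard_id: ShardId, part_id: PartId) -> u64 {
    let mut hasher = DefaultHasher::new();
    (peer, shard_id, part_id).hash(&mut hasher);
    hasher.finish()
}

/// Returns up to `count` hosts, highest priority first.
fn select_hosts(
    hosts: &[PeerId],
    shard_id: ShardId,
    part_id: PartId,
    count: usize,
) -> Vec<PeerId> {
    let mut scored: Vec<(u64, &PeerId)> =
        hosts.iter().map(|h| (priority(h, shard_id, part_id), h)).collect();
    // Sort descending by score; ties fall back to the peer id.
    scored.sort_by(|a, b| b.cmp(a));
    scored.into_iter().take(count).map(|(_, h)| h.clone()).collect()
}
```

If the score depended only on the peer and the part id, every shard with the same host set would send its requests for part 0 to the same top-ranked hosts. With few parts and many hosts, that would leave most hosts idle.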

In this PR we implement the ability to request state parts from arbitrary peers in the network via routed messages. Previously, state parts were requested via a PeerMessage which can only be sent to directly connected peers of the node. Because the responses to these requests are large and non-time-sensitive, it is undesirable to send them over the tier1/tier2 connections used for other operations of the protocol. Hence we also introduce a new connection pool tier3 used for the sole purpose of transmitting large one-time payloads. A separate PR will follow which overhauls the state sync actor in accordance with these changes. The end-to-end behavior has been built and tested in #12095.
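The request/response flow described above might look roughly like the following sketch. It is a simplified model, not the actual network code: the message shapes, `PendingRequest`, `Node`, `tier_for`, and the closure used to load a part are all hypothetical, and routing the request over tier2 is an assumption. Only the `tier3_requests` queue name and the idea of answering over tier3 come from the commit list.

```rust
use std::collections::VecDeque;

/// Connection tiers. T3 is the pool reserved for large one-time payloads.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    T1,
    T2,
    T3,
}

/// Hypothetical, simplified message shapes (not the real wire types).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Message {
    StatePartRequest { shard_id: u64, part_id: u64 },
    StatePartResponse { shard_id: u64, part_id: u64, data: Vec<u8> },
}

/// Assumption: the small request is routed over T2, while the large
/// response travels over a dedicated T3 connection.
fn tier_for(msg: &Message) -> Tier {
    match msg {
        Message::StatePartRequest { .. } => Tier::T2,
        Message::StatePartResponse { .. } => Tier::T3,
    }
}

/// A request waiting to be answered over a tier3 connection.
struct PendingRequest {
    requester: String,
    shard_id: u64,
    part_id: u64,
}

#[derive(Default)]
struct Node {
    tier3_requests: VecDeque<PendingRequest>,
}

impl Node {
    /// Called when a routed message arrives; requests are only queued here.
    fn on_routed_message(&mut self, requester: &str, msg: Message) {
        if let Message::StatePartRequest { shard_id, part_id } = msg {
            self.tier3_requests.push_back(PendingRequest {
                requester: requester.to_string(),
                shard_id,
                part_id,
            });
        }
    }

    /// Pops the oldest queued request and builds the response that would be
    /// sent to the requester over tier3. `load_part` stands in for reading
    /// the state part from a local snapshot.
    fn next_response(
        &mut self,
        load_part: impl Fn(u64, u64) -> Vec<u8>,
    ) -> Option<(String, Message)> {
        let req = self.tier3_requests.pop_front()?;
        let data = load_part(req.shard_id, req.part_id);
        let response =
            Message::StatePartResponse { shard_id: req.shard_id, part_id: req.part_id, data };
        Some((req.requester, response))
    }
}
```

Queueing the request instead of answering inline keeps the routed-message handler cheap. The large payload is only produced once the node is ready to send it on a tier3 connection.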

saketh-are added 19 commits September 9, 2024 17:25

learn own public addr from peer infos and keep it in network_state

f114ea8

add Tier3Handshake PeerMessage variant

61e101f

add Tier3 connection pool

0530d27

fix test connection_pool::invalid_edge

d7535b9

add new routed message StatePartRequest

a7f8b83

enable snapshot generation by default

aad7348

send state part request over new routed msg

3c7eb09

add tier3_requests queue

3bc6c0a

implement response via tier3

bfd99f6

fix allowed messages

436cd6a

bugfix tier3 init

8d11063

successfully handle multiple requests from same peer

02a8bb2

use SnapshotHostsCache for peer selection

016650d

include shard_id in prio and fix tests

c0fd295

simplify peer selection and fix some bugs

b3499e6

implement an idle timeout for tier3 connections

fa6c069

Merge remote-tracking branch 'origin/master' into dss-wip

17f93e8

clear peer selector when state part is received

9a267cf

fix test

4134f2c

saketh-are mentioned this pull request Sep 17, 2024

refactor(network): SnapshotHostsCache #12103

Merged

try peers first before external storage

a826bdb

saketh-are added 3 commits September 18, 2024 16:33

Merge remote-tracking branch 'origin/master' into dss-wip

30742f7

Merge remote-tracking branch 'origin/master' into dss-wip

2e384ca

get catching_up test compiling

0ecf34a

saketh-are mentioned this pull request Sep 18, 2024

feat(network): overhaul state part request #12110

Merged
