Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peer management #1711

Closed
6 tasks done
sistemd opened this issue Jan 30, 2024 · 0 comments · Fixed by #1760
Closed
6 tasks done

Peer management #1711

sistemd opened this issue Jan 30, 2024 · 0 comments · Fixed by #1760
Assignees

Comments

@sistemd
Copy link
Contributor

sistemd commented Jan 30, 2024

To avoid inbound connection exhaustion (e.g. the situation where 50 malicious inbound nodes are connected to us via Kademlia, effectively DoSing our node from allowing any inbound nodes that want to gossip), we can implement peer eviction algorithms. When a new node connects, we will select one of our existing peers to disconnect (evict).

  • Limit the number of inbound connections. Treat relayed and non-relayed inbound connections as separate sets with separate limits. When there is a new incoming connection and our node is maxed out, run the inbound peer eviction algorithm:
    • Calculate the keyed network group values of the peers as SHA3(our private key || peer IP address prefix), where the prefix is /16 for IPv4 and /32 for IPv6 addresses. Group the peers by the keyed network group, and pick 4 groups with the smallest value (which is deterministic, but unpredictable by the attacker). Pick one peer from each group, and protect that peer from eviction. To pick the peer, use a similar keyed hashing approach and select the peer with the smallest SHA3(our private key || peer ID) value, so that this process is also deterministic but unpredictable by the attacker. For the attacker to circumvent this step, he would need to be able to allocate very specific IPs, and he would need to be able to predict which prefixes we are going to protect, which is impossible. The goal is to ensure we are connected to a diverse set of IP addresses.
    • Protect 8 peers with the lowest minimum ping time. To circumvent this step, the attacker would have to be able to run nodes that are geographically closer to us than these peers, which is difficult to do.
    • Not sure if we can implement this immediately, but it would also be nice to protect some nodes which have recently sent us valid useful messages (in terms of advancing the chain state or our sync). For example, maybe save 4 nodes that have most recently gossiped valid transactions, and 8 nodes that have most recently gossiped a valid new head (or any other block if we are still syncing).
    • Of the remaining nodes, protect half of them which have been connected for the longest time.
    • The node that ends up evicted is the youngest one in the most populous keyed network group. Concretely, we would sort all the nodes by a) total number of connected nodes which share their keyed network, breaking ties with b) keyed network group (maybe in reverse order, since we use regular order in the first step of this algorithm), breaking ties with c) connection time. Evict the first node after sorting.
    • It is worth noting that the above algorithm is the same for relayed and non-relayed peers (but relayed and non-relayed are treated as completely separate peer sets!). For relayed peers we may or may not want to tweak the numbers. Alternatively perhaps the numbers should be defined as percentages, and automatically scaled based on inbound relayed/non-relayed connection limits. Alternatively, we could make it so that peer limit configs can only be zero or more than 25 (but outbound limit cannot be zero).
  • Implement a rate limiting mechanism for inbound connections. E.g. cap inbound connections to 10 per second, or something reasonable like that, and drop all incoming connections when the rate limit is exceeded.
  • Provide a mechanism for the sync process to notify the P2P module that it needs more peers. For example, the sync process might have gone through all peers and found out that none of them have the blocks needed for sync. In this case the P2P system should go and look for more peers in the network via the Kademlia bootstrap() procedure. The mechanism by which the sync process notifies the P2P module to fetch new peers works as follows:
    • Peers have a "useful" flag which is by default set to true. This flag is used for outbound peer eviction (which is different from inbound peer eviction), explained later.
    • Sync goes though all the peers trying to fetch whatever data it needs. If a peer doesn't have any of the data needed, sync will set its useful flag to false.
      • We might not know yet under which exact conditions sync will set this flag to false, but from the point of view of P2P this doesn't matter.
    • Once sync runs out of peers, it should call bootstrap() on P2P.
    • Bootstrap will look through the network and try to open some new connections.
    • Outbound peer eviction algorithm: when a new connection is opened, if the outbound connection limit is reached, a connection will be evicted. Any of the outbound connections for which useful = false are candidates for eviction, and we could pick one randomly or deterministically. Perhaps we should pick the node with the highest SHA3(our private key || peer ID) value, making this process deterministic but unpredictable by the attacker.
    • This approach ensures that there's a way to rotate outbound peers and that outbound peers only get disconnected if they are not useful to us.
  • Additionally, P2P should keep an expiring list of peers that have been evicted to avoid re-opening the same connections shortly after they have been closed. This applies to both inbound and outbound connections.
  • Remove periodic Kademlia bootstrap. Doing Kademlia bootstrap lookups periodically no matter what can easily lead to opening too many outbound connections. Instead have a low peer watermark. Periodically check if the outbound peer count is lower than the low peer watermark, and only run the Kademlia bootstrap procedure if this is the case.
  • Remember to do the following config checks:
    • max_inbound_direct_connections >= 25 or equal to zero.
    • max_inbound_relayed_connections >= 25 or equal to zero.
    • low_watermark <= max_outbound_connections.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

1 participant