Eclipse robust DHT using expected ID distribution #18
Sounds plausible. And there is literature covering probabilistic methods to estimate network size precisely, based on reasoning similar to yours; your proposal layers anomaly detection on top of that same measurement. @yiannisbot?
Could you clarify a couple of points in the proposal:
Do you mean that the bucket size would be based on the size of the network, rather than being a constant?
I'm not clear on which nodes share the bits here. Are you saying that given a random key, we can expect that probabilistically approximately one node's peer ID will share 13 bits with that key?
I think a node can roughly calculate this based on the distribution of peer IDs of its closest neighbours (ie the closest K peer IDs in its routing table) without needing to ask neighbours.
Are you saying here that:
Not sure if I'm understanding the above correctly?
I think there may be something interesting here with estimating network size. IIUC, two issues with this proposal are:
Another approach, taken by I2P for this problem, is to combine DHT keys with dates to make it more difficult to eclipse a particular target (source). IIUC, in I2P a node's position in the keyspace is re-derived each day from its ID and the current date. If creating new Kad peers is very cheap (e.g., no problem recreating the same peers every day) then I2P's strategy doesn't help, but it may be useful to us if we start making KadID creation and maintenance more expensive.
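A minimal sketch of that date-mixing idea (not I2P's exact derivation, which differs in detail; the function name is hypothetical):

```python
import hashlib
from datetime import date

def daily_routing_key(raw_id: bytes, day: date) -> bytes:
    """Mix the current date into a peer's routing position.

    Sketch of the I2P-style defense: pre-positioned sybil IDs land
    somewhere new in the keyspace every day, so an attacker must
    regenerate IDs close to the target daily.
    """
    return hashlib.sha256(raw_id + day.isoformat().encode()).digest()
```

The defense only raises costs if regenerating and re-introducing those IDs every day is itself expensive, which is exactly the caveat above.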
Not exactly. My description isn't quite accurate as I'm not suggesting we actually change K. I'm suggesting we use K and the network size to figure out the expected key distribution. Basically: what's the expected XOR distance X from any given key such that we'd expect K nodes to fall within that range? Once we calculate X, we'd:
If the network is operating correctly, we'd expect to have K peers within XOR distance X and we'd expect to put/get to/from K peers.
Yes.
Unless that node is being eclipsed. There are two eclipse attacks:
Yes. Of course, this does mean that an attacker could force other nodes to do a bunch of work. However, they should have to do a proportional amount of work and they shouldn't be able to actually hide anything.
We'd dial all the final peers in parallel. But yes, this could lead to a lot of work. We'd likely need to audit/score peers to detect misbehavior. I wonder if we could abuse relays and ephemeral libp2p nodes to anonymously verify that our peers are behaving correctly.
You're right, the closest peers to a key would need to return all peers within X (xor distance) of the key. And we would need to set an upper limit.
I'm not sure how well that works unless I don't know the key; I believe this defense is trying to defend against that case. That is, when the day changes, I can simply create 20 nodes close to the key. Peers close to that key will accept me because I'm the closest.
@Stebalien thanks for clarifying. Some details to be worked out, but this seems like a good lead for how to mitigate eclipse attacks while using linear defense resources. Do I understand correctly that the query also needs to get values from all the peers with the common prefix (e.g., if there are 100 peers in the 8-bit common prefix range, it needs to get values from all 100 peers)? In the case of an IPNS-like system which fetches a certain number of values (e.g., 16), an attacker could still flood the target key's neighbourhood with enough sybils to have a high chance of shutting out legitimate nodes, is that right?
I think this sort of defense may help if creating and/or maintaining peers in the network is expensive enough that creating enough peers to eclipse the network within a one-day period is too expensive. We're certainly not there yet, but we could potentially utilize PoW KadID generation and/or some reputation system based on continuous availability to help here.
Good news, we shouldn't need to deal with this system for much longer. Generally there are two types of DHT Get queries: those that abort early and those that actually locate the closest peers.
Yes.
Yes, but we're dropping that feature. Instead, we're continuing until we "terminate" the query. In this case, termination would mean talking to all nodes with the common prefix.
@Stebalien: the following paper proposes something very similar to what you're suggesting. They assume a fixed network size (of 4M in their case), so size estimation is not an issue for them, although they acknowledge that if network size fluctuates a lot, a size-estimation technique needs to be in place. They detect DHT attacks by comparing the abnormal peer ID distribution introduced around the targeted entry to the theoretical ID distribution. After defining the theoretical distribution, they use the Kullback-Leibler divergence to detect anomalies. Certainly an interesting read.
@raulk: Regarding measurement of network size, there are generally two approaches as far as I have seen.
[1] Controlling the Cost of Reliability in Peer-to-Peer Overlays, pages 31-42
It may be possible to make a DHT robust against eclipse attacks by using expected distribution of node IDs.
In a DHT, nodes are expected to be evenly distributed around the node ID space. In a DHT with an active eclipse attack, one would expect a large cluster of node IDs around the target key.
Instead of using K as the bucket size, one could use the expected ID distribution. For example, in a network with 10K nodes, one would expect:

log2(10,000) ≈ 13

That is, probabilistically, roughly one node's peer ID will share 13 bits with any given key, and the K closest nodes will share at least log2(network size) - log2(K) bits with it.

So, instead of putting to the closest 20 peers, you'd calculate the expected network size (e.g., by asking other peers how close their neighbors are and assuming a uniform network), then put to all peers sharing the expected number of bits (in this case, 8).
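The bit-count arithmetic here can be checked with a tiny helper (hypothetical name; assumes uniformly distributed IDs):

```python
import math

def expected_shared_bits(network_size: int, k: int = 20) -> int:
    """Prefix bits the ~k closest peers to a key are expected to share
    with it, assuming uniformly distributed peer IDs."""
    return max(0, math.floor(math.log2(network_size) - math.log2(k)))
```

For a 10K-node network this gives 13 shared bits for the single closest node (k = 1) and about 8 bits for the closest 20, matching the figures above.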