src/query/peers/closest: Consider alpha peers at initialization #1536
Given an iterator with the configured goal of 1 result and an initial set of peers of 2, one would expect the iterator to try the second peer in case the first fails.
Instead of initializing the iterator with `num_results` nodes and discarding the rest, initialize the iterator with all provided peers.

This patch allows the following scenario:

> Given an iterator with the configured goal of 1 result and an initial set of peers of 2, one would expect the iterator to try the second peer in case the first fails.

There is a downside to this patch. Say the iterator is initialized with 100 peers, each of which is doomed to fail. In this case the iterator will try every peer, resulting in many connection attempts.

While the previous behaviour is a safeguard against the scenario above, the same could happen when the iterator is configured with a `num_results` of 10: the first 9 peers return 100 peers, each of them doomed to fail, so the iterator would again attempt 100 connections only to fail overall.
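A minimal sketch of the scenario described above, under loud assumptions: `SimpleIter`, `truncated`, `with_all` and `run` are hypothetical names for a toy model and are not the actual `ClosestPeersIter` API. Peers are plain integers and a closure stands in for whether a connection attempt succeeds.

```rust
/// Hypothetical, simplified stand-in for the closest-peers iterator.
struct SimpleIter {
    /// Peers still waiting to be contacted, closest first.
    pending: Vec<u32>,
    /// Number of successful results the iterator aims for.
    num_results: usize,
    /// Peers that answered successfully.
    succeeded: Vec<u32>,
}

impl SimpleIter {
    /// Previous behaviour: keep only `num_results` peers, discard the rest.
    fn truncated(peers: &[u32], num_results: usize) -> Self {
        let pending = peers.iter().copied().take(num_results).collect();
        SimpleIter { pending, num_results, succeeded: Vec::new() }
    }

    /// Behaviour of this patch: keep all provided peers.
    fn with_all(peers: &[u32], num_results: usize) -> Self {
        SimpleIter { pending: peers.to_vec(), num_results, succeeded: Vec::new() }
    }

    /// Contact peers in order until the goal is met or no peer is left.
    /// `reachable` decides whether a contact attempt succeeds.
    /// Returns the number of connection attempts made.
    fn run(&mut self, reachable: impl Fn(u32) -> bool) -> usize {
        let pending = std::mem::take(&mut self.pending);
        let mut attempts = 0;
        for peer in pending {
            if self.succeeded.len() >= self.num_results {
                break;
            }
            attempts += 1;
            if reachable(peer) {
                self.succeeded.push(peer);
            }
        }
        attempts
    }
}
```

With peers `[1, 2]`, a goal of 1 result and peer 1 failing, the truncated variant ends without a result while the variant keeping all peers falls back to peer 2. The downside is visible too: with 100 unreachable peers, `run` makes 100 attempts.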
To be honest, this is a point that my understanding has never been entirely clear on. From the wording of the Kademlia paper as well as other sources, the lookup is supposed to start with the
Agreed. This should either be limited by only
This would not solve my initial issue. With the suggestion in #1473 (comment) in mind consider the following scenario: Say we want to use 4 disjoint paths. Thus we would have one
At this point the fourth
Moving forward I see the following options:
What do you think @romanb?
I admit that I already assumed that we would always take
Thinking about this I find the following path forward most suitable:
@romanb and I discussed the following via chat:
Say the number of disjoint paths and alpha is 3. Thus all of the This would be solved by the suggestion above using the same set of
True, your last suggestion sounds good to me then, though it may then actually still be preferable to partition the initial list of peers to make sure each iterator gets at least
This reverts commit 4fb9366.
To align with the Kademlia paper, this commit makes the `ClosestPeersIter` consider the closest alpha peers at initialization, instead of `num_results` (which is equal to the k-value). Aligning with the paper also implies aligning with the libp2p Golang implementation as well as the xlattice specification.
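As an illustration of "consider the closest alpha peers at initialization", here is a rough sketch under stated assumptions: keys are toy `u8` values rather than real Kademlia keys, `ALPHA_VALUE` is assumed to be 3, and `distance` and `initial_peers` are hypothetical helpers, not functions from the crate.

```rust
/// Assumed concurrency parameter for this sketch.
const ALPHA_VALUE: usize = 3;

/// Toy XOR distance over `u8` keys (real keys are much larger).
fn distance(a: u8, b: u8) -> u8 {
    a ^ b
}

/// Return at most `ALPHA_VALUE` peers closest to `target`, closest first.
fn initial_peers(target: u8, known: &[u8]) -> Vec<u8> {
    let mut sorted = known.to_vec();
    // Order by XOR distance to the target key.
    sorted.sort_by_key(|&p| distance(target, p));
    // Keep only the closest alpha peers; the rest are not considered
    // at initialization.
    sorted.truncate(ALPHA_VALUE);
    sorted
}
```

The point of the sketch is only that the cut-off depends on alpha and not on the configured `num_results`; peers beyond the cut-off can still be learned later from responses.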
I have reverted my initial commit 4fb9366 which considered all peers passed at initialization. I have added dd87819 which makes This pull request still includes a regression test. I would suggest removing it as it adds more noise than being actually helpful.
This I would address in #1473. @romanb what do you think?
Sounds good, the tests seem to be stalling though (judging from CI)?
I went down quite a rabbit hole to understand the failures. As far as I can tell my changes to to I think it is easiest to review the new commits one by one with each commit message in mind:
@romanb would you mind taking another look?
Introduces the `build_node` function to build a single, unconnected node. Along the way, replace the notion of a `port_base` with returning the actual `Multiaddr` of the node.
When looking for the closest node to a key, Kademlia considers ALPHA_VALUE nodes to query at initialization. If `num_groups` is larger than ALPHA_VALUE, the remaining locally known nodes will not be considered. Given that no node other than node 1 is aware of them, they would be lost entirely. To prevent this, restrict `num_groups` to be equal to or smaller than ALPHA_VALUE.
In the past, when trying to find the closest nodes to a key, Kademlia would consider `num_results` nodes to query out of all the nodes it is aware of.

Both the `put_record` and the `get_provider` tests initialized their swarms in the same way. The tests took the replication factor to use as an input. The number of results to get was equal to the replication factor. The number of swarms to start was twice the replication factor. Nodes would be grouped in two groups of replication factor nodes each. The first node would be aware of all of the nodes in the first group. The last node of the first group would be aware of all the nodes in the second group.

By coincidence (I assume) these numbers played together very well. At initialization the first node would consider `num_results` peers (see first paragraph). It would then contact each of them. As the first node is aware of the last node of the first group, which in turn is aware of all nodes in the second group, the first node would eventually discover all nodes.

Recently the number of nodes Kademlia considers at initialization when looking for the nodes closest to a key was changed to only consider ALPHA nodes. With this in mind, playing through the test setup above again would result in (1) `replication_factor - ALPHA` nodes being entirely lost, as the first node would never consider them, and (2) the first node probably never contacting the last node of the first group and thus not discovering any nodes of the second group.

To keep the multi-hop discovery in place while not basing one's test setup on the lucky assumption of Kademlia considering replication factor nodes at initialization, this patch alters the two tests: Build a fully connected set of nodes and one additional node (the first node). Connect the first node to a single node of the fully connected set (simulating a boot node). Continue as done previously.
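The new test topology can be sketched roughly as follows. This is a toy model under stated assumptions, not the test code from the commit: nodes are plain indices, `build_topology` and `discoverable` are hypothetical helpers, and reachability is checked with a simple breadth-first walk rather than a real Kademlia lookup.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Build who-knows-whom: index 0 is the first node, indices 1..=n form
/// the fully connected set, and node 0 only knows node 1 (the boot node).
fn build_topology(n: usize) -> Vec<BTreeSet<usize>> {
    let mut known = vec![BTreeSet::new(); n + 1];
    for a in 1..=n {
        for b in 1..=n {
            if a != b {
                known[a].insert(b);
            }
        }
    }
    if n >= 1 {
        known[0].insert(1);
    }
    known
}

/// All nodes that `start` can learn about by repeatedly asking the
/// nodes it already knows (breadth-first walk over the knowledge graph).
fn discoverable(known: &[BTreeSet<usize>], start: usize) -> BTreeSet<usize> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::new();
    queue.push_back(start);
    while let Some(node) = queue.pop_front() {
        for &next in &known[node] {
            if next != start && seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    seen
}
```

Because the first node starts with a single known peer, the outcome no longer depends on how many peers the lookup considers at initialization: every other node is at most two hops away via the boot node.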
c7300d6 to ab134d7
@romanb friendly ping. Would you mind taking another look?
Thanks for the quick feedback! Would you mind taking another look @romanb?
Regarding ab134d7, is it really necessary to create a fully connected graph? Wouldn't it suffice to ensure
Problem would be that Given
If we want to store the key A @romanb does the above make sense?
I see, thanks for the example. Fully connected or not, I forgot that these tests rely on always finding the closest nodes to the key out of all nodes, and that there are intentionally more nodes than the replication factor (but not more than ps. I think ab134d7 should talk about
This looks good to me, a nice improvement to the tests in particular. However, I'm still a bit on the fence w.r.t. the use of `\alpha`. It seems that `go-libp2p-kad-dht` now even changed their behaviour to what we are currently doing (almost). In light of that, for reasons of trying to provide analogous behaviour in `rust-libp2p` and to keep backward-compatibility, maybe we should just use `.take(K_VALUE)`? It is still a good idea not to use the configured `num_results`, as is done right now. The test improvements would remain valid, of course. What do you think?
I am fine with both While preparing a commit I noticed Very happy we have all these Quickcheck tests.
@@ -695,6 +695,32 @@ mod tests {
    }

    #[test]
    fn try_all_provided_peers_on_failure() {
Will make this a prop test as well.
I came across this issue looking into the following suggestion in #1473 (comment):

> It is probably not necessary to split the initial closest peers among the d iterators, instead letting all of them start out with the same list of initial peers. Then the "fairness" w.r.t. which iterator gets to query which peer will be determined by the composite disjoint iterator, without needing to also think about a "fair" initial split. That no peer is queried by multiple iterators is also still guaranteed by the parent disjoint iterator, of course.

Would it make sense to introduce a `MAX_CONSECUTIVE_FAILURE` constant to safeguard against many unsuccessful connection attempts?
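To make the `MAX_CONSECUTIVE_FAILURE` idea concrete, here is a rough sketch of what such a safeguard could look like. Everything in it is an assumption for illustration: the constant does not exist in the codebase, the value 3 is arbitrary, and `query_with_limit` is a hypothetical helper operating on plain integers instead of real peers.

```rust
/// Hypothetical limit; the value 3 is chosen purely for illustration.
const MAX_CONSECUTIVE_FAILURE: usize = 3;

/// Try peers in order until `num_results` successes are collected, the
/// peers run out, or too many attempts fail in a row.
/// Returns the successful peers and the number of attempts made.
fn query_with_limit(
    peers: &[u32],
    num_results: usize,
    reachable: impl Fn(u32) -> bool,
) -> (Vec<u32>, usize) {
    let mut successes = Vec::new();
    let mut attempts = 0;
    let mut consecutive_failures = 0;
    for &peer in peers {
        if successes.len() >= num_results
            || consecutive_failures >= MAX_CONSECUTIVE_FAILURE
        {
            break;
        }
        attempts += 1;
        if reachable(peer) {
            successes.push(peer);
            // A success resets the failure streak.
            consecutive_failures = 0;
        } else {
            consecutive_failures += 1;
        }
    }
    (successes, attempts)
}
```

With 100 peers that are all doomed to fail, this sketch gives up after 3 attempts instead of 100, while interleaved failures and successes do not trigger the limit.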