DHT Optimistic Provide #25

Open

dennis-tra opened this issue Nov 15, 2021 · 8 comments

dennis-tra commented Nov 15, 2021

Copying the forum post to this repository.


Hi everyone,

Over the last few weeks, some ResNetLab collaborators (including me) conducted IPFS uptime and content routing measurements with great support from @yiannisbot. Some results can be found here, here, and here. In parallel, I went ahead and built another DHT performance measurement tool for the particular use case of investigating provide performance, as it lags far behind discovery performance.

While investigating how the whole machinery works, I had an idea for how to speed things up that I would like to share with you and would love to get feedback on.

I would call it an "Optimistic Provide" and the TL;DR would be:

For each peer we come across during content publication, calculate the probability that an even closer one exists. If that likelihood is low, just store the provider record at that peer optimistically.

For the long version, I'm copying relevant parts from this repository

https://github.com/dennis-tra/optimistic-provide

which includes more plots than I'm sharing here in this post (some of them only tangentially relevant).


Motivation

When IPFS attempts to store a provider record in the DHT, it tries to find the 20 closest peers to the corresponding CID using the XOR distance.
To find these peers, IPFS sends FIND_NODE RPCs to the closest peers in its routing table and then repeats the process for the set of returned peers.
There are two termination conditions for this process:

  1. Termination: The 20 closest peers to the CID were queried for even closer peers but didn't yield closer ones.
  2. Starvation: All peers in the network were queried (if I interpret this condition correctly)

This can lead to huge delays if some of the 20 closest peers don't respond in a timely manner or are outright unreachable.
The following graph shows the latency distribution of the whole provide-process for 1,269 distinct provide operations.

Provide Latencies

In other words, it shows the distribution of how long it takes for the kaddht.Provide(ctx, CID, true) call to return.
At the top of the graph, you can find the percentiles and total sample size. There is a huge spike at around 10 s, which is probably related to an exceeded context deadline, though I'm not sure.

If, on the other hand, we look at how long it took to find the peers that we eventually attempted to store the provider records at, we see that it takes less than 1.6 s in the vast majority of cases.

Discover Latencies

Again, the sample size and percentiles are given in the figure title. The sample size corresponds to 1,269 × 20, as in every provide run we attempt to store the provider record at 20 peers.

The same point can be made if we take a look at how many hops it took to find a peer that we eventually attempted to store the provider records at:

Hop Distribution

Note the log scale of the y-axis. Over 98% of the time, an appropriate peer to store the provider record at was found in three hops or fewer.

Optimistic Provide

The discrepancy between the time the provide operations take and the time they could have taken led to the idea of simply storing provider records at peers optimistically.
This trades storing the records at potentially more than 20 peers for a shorter time until content becomes available in the network.
However, it requires a priori information about the current network size.

Procedure

Let's imagine we want to provide content with the CID C and start querying our closest peers.
When finding a new peer with peer ID P, we calculate its distance to the CID C and derive the expected number of peers μ that are even closer to the CID (than the peer with peer ID P).

If we normalize P and C to the range from 0 to 1, this can be calculated as:

μ = || P - C || * N

where N is the current network size and || . || denotes the normed XOR distance metric.

The logic would be: if the expected value μ is less than 20 peers, we store the provider record at peer P.

This threshold could also consider standard deviation etc. and could generally be tuned to minimize falsely selected peers (peers that are not in the set of the 20 closest peers).
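
For concreteness, here is a minimal sketch in Go of this selection rule. It assumes 256-bit keys (in the DHT, peer IDs and CIDs are mapped into the keyspace via SHA-256) and takes the network size estimate N as given; the function and inputs are hypothetical, not part of go-libp2p-kad-dht.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// normedXORDistance returns || a - b || in [0, 1]: the XOR of two
// 256-bit Kademlia keys divided by 2^256.
func normedXORDistance(a, b []byte) float64 {
	xor := new(big.Int).Xor(new(big.Int).SetBytes(a), new(big.Int).SetBytes(b))
	keyspace := new(big.Int).Lsh(big.NewInt(1), 256) // 2^256
	d, _ := new(big.Rat).SetFrac(xor, keyspace).Float64()
	return d
}

func main() {
	// In the DHT, both peers and CIDs live in the keyspace as SHA-256 hashes.
	p := sha256.Sum256([]byte("some-peer-id")) // hypothetical peer P
	c := sha256.Sum256([]byte("some-cid"))     // hypothetical CID C

	const n = 7000 // network size estimate N, e.g. from a crawl
	mu := normedXORDistance(p[:], c[:]) * n

	// Store the provider record optimistically if we expect fewer than
	// 20 peers closer to C than P.
	if mu < 20 {
		fmt.Printf("store record at P optimistically (μ ≈ %.1f)\n", mu)
	}
}
```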

Example

The following graph shows the distribution of normed XOR distances of the peers that were selected to store the provider record to the CID that was provided.

Selected Peer Distances

The center of mass of this distribution is roughly at 0.1%. So if we find a peer at distance || P - C || = 0.1% while the network has a size of N = 7000 peers, we would expect μ = 0.001 × 7000 = 7 peers that are closer than the one we just found.
Therefore, we would store a provider record at that peer right away.

N = 7000 is a realistic assumption based on our crawls.

Network Size

Since the calculation above needs information about the current network size, the question is how to obtain that information locally on every node. I can think of three strategies:

  1. Hard Coding
  2. Periodic full DHT crawls (as implemented in the new experimental DHT client)
  3. Estimation based on peer-CID proximity of previous provide operations (a rough sketch of this follows below)
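
For illustration, strategy 3 could look roughly like the following sketch. It leans on the uniform-keyspace assumption that the i-th closest peer to a random key sits at expected normed distance i/(N+1), an assumption the discussion below calls into question; all names and the toy numbers are hypothetical.

```go
package main

import "fmt"

// estimateNetworkSize derives a rough network-size estimate from the
// normed XOR distances (sorted ascending) of the closest peers observed
// in previous full provide operations. Under a uniform-keyspace
// assumption, the i-th closest peer to a random key sits at expected
// normed distance i/(N+1), so each observation yields N ≈ i/d_i - 1.
func estimateNetworkSize(distancesPerProvide [][]float64) float64 {
	var sum float64
	var count int
	for _, ds := range distancesPerProvide {
		for i, d := range ds {
			if d > 0 {
				sum += float64(i+1)/d - 1
				count++
			}
		}
	}
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	// Toy input: distances of the 3 closest peers from two past provides,
	// roughly what a network of ~7000 peers would produce.
	samples := [][]float64{
		{0.00012, 0.00031, 0.00044},
		{0.00017, 0.00025, 0.00046},
	}
	fmt.Printf("estimated network size: %.0f\n", estimateNetworkSize(samples))
}
```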

As mentioned above, the measurement methodology and more information are given in this repository.

Generally, I would love to hear what you think about it, if I made wrong assumptions, if I got my maths wrong, if anything else seems fishy, etc 👍

Best,
Dennis

PS: @yiannisbot pointed me to this IPFS Content Providing Proposal. If the proposal outlined in this post proves to be viable, I would consider it a fifth option among the techniques to speed things up.

petar commented Nov 15, 2021

Copy-pasting my first comment from the Slack thread for continuity:

“Optimistic providing” is a very promising direction and the repo presentation is nice and clear. The provide process visualizations would be great to have going forward. And even the distribution graphs for provide and find would be awesome to have as occasional automated reports in our infra.

The project proposes a procedure for estimating the number of peers closer to a key than a given peer, assuming knowledge of the total count of peers. This procedure should be investigated more carefully:

  • As peer IDs are random numbers, their distribution in the key space is far from uniform. Very roughly speaking, it is a Poisson distribution. There are theoretical results (e.g. survey in http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/twosurvey.pdf) that indicate that the proposed estimator (normed XOR distance times count of peers, N) can be off from the correct answer by as much as a factor of log N.
  • One approach is to understand this caveat better with a data study (rather than digging in theory papers). For instance, plot the distribution of “normed XOR distance to closest peer” across many (or most) peers. Large variation should be observed. Try using network size 20K, which is more reflective of the true network size; it will also highlight the variance better (which, being a multiple of log N, grows with the network size).

The “optimistic provide” ideation also suggests a slightly different approach to "optimistic providing", one which avoids the imprecision of the estimation procedure above. I would investigate this direction; I think it has good potential. Provide optimistically using the following algorithm:

  • Perform two parallel “independent” DHT queries for the target key. “Independent” means that the queries should start from a different set of seed peers.
  • When the sets of peers discovered by both queries overlap by at least 10 peers, stop both queries and store the values on these 10 peers.

The key point here is that when the condition "the two queries intersect" is met, we have probabilistically verified that if a client were to search for the key, they would also discover the intersection peers. The second query in the provide algorithm essentially simulates a future client.

Doing it this way avoids making estimates, which can be imprecise.
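
A minimal sketch of this termination rule in Go, assuming a hypothetical lookup helper that streams discovered peer IDs (not an actual go-libp2p-kad-dht API):

```go
package main

import (
	"context"
	"fmt"
)

// LookupFn is a hypothetical helper that runs a DHT lookup for target,
// seeded from the given peers, streaming the peer IDs it discovers until
// the context is cancelled.
type LookupFn func(ctx context.Context, target string, seeds []string) <-chan string

// provideByIntersection runs two independent lookups for target and returns
// the first `threshold` peers discovered by both queries; the provider
// records would be stored on exactly these peers, and both lookups stopped.
func provideByIntersection(ctx context.Context, lookupA, lookupB LookupFn,
	target string, seedsA, seedsB []string, threshold int) []string {

	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stop both lookups as soon as enough overlap is found

	seenA, seenB := map[string]bool{}, map[string]bool{}
	var overlap []string

	chA := lookupA(ctx, target, seedsA)
	chB := lookupB(ctx, target, seedsB)
	for len(overlap) < threshold && (chA != nil || chB != nil) {
		select {
		case p, ok := <-chA:
			if !ok {
				chA = nil // query A exhausted its peers
			} else if !seenA[p] {
				seenA[p] = true
				if seenB[p] {
					overlap = append(overlap, p)
				}
			}
		case p, ok := <-chB:
			if !ok {
				chB = nil
			} else if !seenB[p] {
				seenB[p] = true
				if seenA[p] {
					overlap = append(overlap, p)
				}
			}
		case <-ctx.Done():
			return overlap
		}
	}
	return overlap
}

func main() {
	// Demo with fake lookups that just stream fixed peer IDs.
	stream := func(peers ...string) LookupFn {
		return func(ctx context.Context, _ string, _ []string) <-chan string {
			ch := make(chan string)
			go func() {
				defer close(ch)
				for _, p := range peers {
					select {
					case ch <- p:
					case <-ctx.Done():
						return
					}
				}
			}()
			return ch
		}
	}
	a := stream("p1", "p2", "p3", "p4")
	b := stream("p9", "p3", "p1", "p7")
	fmt.Println(provideByIntersection(context.Background(), a, b, "cid", nil, nil, 2))
}
```

With threshold = 10 this matches the rule above; intersecting three lookups instead of two, as suggested further down in the thread, would strengthen the guarantee.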

petar commented Nov 15, 2021

@yiannisbot mentioned that additional graphs (not yet shown here) suggest that under some parameterization of the "optimistic provide" algorithm (that uses an estimator), clients find the peers where content was provided optimistically in 1 to 2 hops. This is good news, however we need a bit more to trigger potential implementation:

  • the rule cannot be based on constants that work for the current network size (it must scale automatically as the network grows)
  • (for any variant of optimistic routing) success metrics must be measured with respect to a large set of keys that cover the key space densely (e.g. N/2 keys, if N is the size of the network in # of peers). This is necessary due to the non-uniform distribution of peer IDs mentioned above.
  • if the estimation-based variant of optimistic routing is to be used, we would also need a design for how peers can continuously measure the size of the network in a decentralized manner. This seems complex, which is another reason why the non-estimator-based algorithm is worth considering.

dennis-tra (Author) commented Nov 17, 2021

Hi @petar, thanks for the quick and thoughtful reply!

My overall takeaways are:

  • Optimistic Providing as a general concept is a promising direction.

  • Optimistic Providing can have different manifestations; you're suggesting a different algorithm to provide optimistically.
    Your algorithm uses two concurrent, independent queries that start at different points in the keyspace. As soon as both queries find the same ten peers, the provider record is stored.

  • Assuming a uniform distribution for the peer IDs in the keyspace is not correct.

    As peer IDs are random numbers, their distribution in the keyspace is far from uniform.

    This came as a surprise to me and probably reveals quite a bit about my statistical intuition. However, it resembles the observations I made when doing some numerical simulations: assuming a uniform distribution didn't match reality at all. Your comment led me to read up on it, and I'm leaving some links here for future reference [1, 2]. The main takeaway for me is that the estimator needs more thought, but the general idea should still work.

  • Getting an accurate estimation of the current network size is a challenge as it needs to dynamically adapt to changing network conditions.
    By chance, I came across another issue on this repository and specifically this post from @yiannisbot:

    Regarding measurement of network size, there are generally two approaches as far as I have seen.

    1. The distance to the nearest neighbours in the routing table can be used as a network size estimate as it correlates with the network size
    2. Measurement or signalling (e.g., for DHT self-stabilisation). In this second category probabilistic approaches can be used to model inaccuracies. Gossiping has also been proposed as an optimisation to self-stabilisation to avoid extensive signalling.

    The first option sounds suspiciously compelling: every time you have an up-to-date routing table, you can estimate the network size. However, I can't assess the precision of this estimate, and I can't tell yet what precision the optimistic provide algorithm actually needs.
    The other idea I had was to use estimates based on the peer-ID-to-CID proximity of previous "full" provide operations. I can imagine that if there were a statistical model for the distribution in the fourth plot above, one could derive the network size from a reasonable sample of "full" provide operations.

  • It must be measured how well the algorithms perform.
    If I understand your suggestion correctly, you would add provider records for N/2 CIDs and then check whether these records can be found afterward.
    I would add to the list: the delay until content becomes available in the network (provider records written)

Another thing that I'm still thinking about: would the provide process stop when 20 records (or ten in your case) were put optimistically, or would the process continue? I can imagine there are cases where peers are not able to find content because of the optimistic nature. I guess the success-metric measurements could help answer that question.

[1] https://eighty-twenty.org/2013/03/26/span-lengths-in-chord-style-dhts
[2] https://subs.emis.de/LNI/Proceedings/Proceedings61/GI-Proceedings.61-28.pdf

petar commented Nov 17, 2021

1. The distance to the nearest neighbours in the routing table can be used as a network size estimate as it correlates with the network size

This is precisely the estimator which suffers from large inaccuracies due to the uneven distribution of random IDs (some IDs are log N-times closer together than others). You are welcome to investigate different estimators and I would be happy to follow your findings. However, my theoretical experience suggests that this is likely not a fruitful direction: I don't think it is possible to estimate network size sufficiently accurately based on local observations (i.e. things like "my distance to nearest peer").

In order to understand this phenomenon of "varying density" of peers in the key space, you could do the following experiment:

  • generate 10K random IDs
  • place them in a binary trie (we have implemented one already: https://github.com/libp2p/go-libp2p-xor)
  • observe the variance in the depth of all the nodes in the trie
  • note that the depth in the trie is a measure of the density around a node

This is an alternative way of seeing that the "distance to my closest peer, using the normed XOR distance" varies wildly from peer to peer. (The normed XOR distance between two peers equals 2^{- the depth of the peers in the trie}.)
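
This experiment can also be run without any DHT machinery. The self-contained sketch below (the trie from go-libp2p-xor would work equally well) exploits the fact that, in lexicographically sorted order, the ID sharing the longest common prefix with a given ID is one of its two direct neighbors, and that a node's trie depth is one more than that longest common prefix:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/bits"
	"sort"
)

// lcp returns the length in bits of the longest common prefix of a and b.
func lcp(a, b []byte) int {
	for i := range a {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return len(a) * 8
}

func main() {
	const n = 10000
	ids := make([][]byte, n)
	for i := range ids {
		ids[i] = make([]byte, 32)
		rand.Read(ids[i])
	}
	// In lexicographic order, the ID sharing the longest common prefix
	// with ids[i] is one of its direct neighbors, so the trie depth of
	// each node is 1 + max(LCP with left neighbor, LCP with right neighbor).
	sort.Slice(ids, func(i, j int) bool { return string(ids[i]) < string(ids[j]) })

	depths := map[int]int{}
	for i := range ids {
		d := 0
		if i > 0 && lcp(ids[i-1], ids[i]) > d {
			d = lcp(ids[i-1], ids[i])
		}
		if i < n-1 && lcp(ids[i], ids[i+1]) > d {
			d = lcp(ids[i], ids[i+1])
		}
		depths[d+1]++
	}
	// With n = 10K, depths cluster around log2(n) ≈ 13.3 but spread over
	// several bits, i.e. the normed distance to the closest peer (2^-depth)
	// varies by an order of magnitude or more across peers.
	for d := 0; d <= 256; d++ {
		if c, ok := depths[d]; ok {
			fmt.Printf("depth %3d: %5d nodes\n", d, c)
		}
	}
}
```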

petar commented Nov 17, 2021

Another thing that I'm still thinking about: Would the provide process stop when 20 records (or ten in your case) were put optimistically or would the process still continue.

The process would stop. It's key to understand the idea driving this algorithm.

At a high level, both algorithms try to ensure that the predicate "if someone looks for the CID I am providing, their search for the CID should lead them to the nodes where I decided to provide the records" holds probabilistically with respect to the key space.

The estimator-free algorithm is ensuring this directly by "probabilistically proving" that the provided records will be found by other users:

  • The first lookup simulates someone who is trying to provide some records (for a cid)
  • The second lookup simulates someone who is looking for the cid
    We put the records where they intersect because we probabilistically expect that subsequent queries will behave similarly to the "second lookup" during provide, which reached the peers holding the records.

In fact, this guarantee can be strengthened if necessary by running 3 (instead of 2) queries in parallel and placing the records where they all intersect.

petar commented Nov 18, 2021

The big picture here is this:

Both estimator-based and estimator-free provide algorithms would work (in theory), when the correct parameters are used. The question is which one is less costly for the benefit of achieving better latency.

  • The estimator-based algorithm needs an estimation procedure which needs to be designed and its networking cost assessed.
  • The estimator-free algorithm incurs a networking cost in that it does multiple parallel queries.

The two costs need to be compared to pick a winner.

Both algorithms must be proven to work correctly, which must be demonstrated by showing that records provided optimistically at most places in the keyspace can be found by a subsequent lookup.

dennis-tra (Author) commented:
Thanks again, @petar!

I did some simulations based on your input and the links I posted above, and I definitely got a feel for the inaccuracies of the proposed estimator. I basically recreated plots similar to those in [1]. I'm also leaving here some research on how to locally estimate the size of a peer-to-peer network [2] [3].

Regarding performance indicators/success metrics I would consider:

  • provide delay (as in the first graph)
  • content discovery success ratio
  • networking overhead compared to the current implementation
    • estimator-based: it would be great if a sufficient estimate (whatever that means) could be derived from the interactions that happen anyway (this would mean no overhead)
    • estimator-free: it seems that the current provide algorithm starts by contacting the ten closest known peers. Could this be halved so that it starts with, say, the five closest and the five furthest known peers, and then applies the logic you've outlined? (Just thinking out loud here.) This would mean very little added networking cost.

I'll have a chat with @yiannisbot about pushing this topic forward next week. I'm definitely motivated to keep working on this and to come up with statistical models that could assess the costs incurred by either approach.

[1] https://eli.sohl.com/2020/06/05/dht-size-estimation.html
[2] https://www.researchgate.net/publication/225863407_Estimating_The_Size_Of_Peer-To-Peer_Networks_Using_Lambert's_W_Function
[3] https://www.researchgate.net/publication/220717323_Peer_to_peer_size_estimation_in_large_and_dynamic_networks_A_comparative_study

BigLep commented Apr 6, 2023

For anyone watching this discussion, this functionality is rolling out into the Kubo IPFS implementation (see https://protocollabs.notion.site/Optimistic-Provide-2c79745820fa45649d48de038516b814 and ipfs/kubo#9753)
