
Feat: Track Session Peer Latency More Accurately #149

Merged
merged 4 commits on Jul 15, 2019

Conversation

hannahhoward
Contributor

Goals

Track the speeds between session peers more accurately

Implementation

The current SessionPeerManager uses a very simple algorithm for sorting peers by optimization -- it simply orders them by last block received.

This change tracks how long each peer actually takes to respond to requests, and sorts peers not only by which is fastest but also provides useful information (an optimization rating from 0 to 1, with 1 being the fastest peer) about how they compare to each other.

The steps are as follows:

  1. For a broadcast request, track all responses (not just the first one) until a preset timeout period (5 seconds for now), and use that to establish optimization ratings between peers.
  2. For targeted requests, individually measure the time for each peer requested, up to a timeout period. If a response is received, record the total time. If no response is received but a cancel was sent (usually because another peer responded first), ignore it. If no response is received before the timeout and no cancel was sent, record the full timeout period as that peer's latency.
  3. Weight the latency of the most recent response most heavily (0.5 * last response + 0.5 * previous latency rating); a sketch of this update follows the list.
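A minimal sketch of the weighting in step 3, with illustrative names (this is not the PR's exact code):

import "time"

// newWeight is the weight given to the most recent sample -- the 0.5 "fall off"
// raised for discussion below.
const newWeight = 0.5

// updateLatency blends a freshly measured response time into a peer's running
// latency estimate: 0.5 * latest sample + 0.5 * previous estimate.
func updateLatency(previous, measured time.Duration) time.Duration {
	if previous == 0 {
		return measured // first sample, nothing to blend with
	}
	return time.Duration(newWeight*float64(measured) + (1-newWeight)*float64(previous))
}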

For Discussion

  • Is this too complicated and specific? (It doesn't feel that way to me -- it feels like we should produce the best information possible.)
  • Is the fall-off right (0.5)?
  • Is the timeout right (5 seconds)?
  • Does the logic around which timeouts matter make sense?

Commits

  • Return optimized peers in real latency order, weighted toward recent requests
  • When fetching optimized peers from the peer manager, return an optimization rating, and pass it on to the request splitter
    BREAKING CHANGE: interface change to the GetOptimizedPeers and SplitRequests public package methods
  • Better estimate latency per peer by tracking cancellations
  • Send duplicate responses to the session peer manager to track latencies
@hannahhoward
Contributor Author

hannahhoward commented Jul 4, 2019

Note the marked improvement in time for some benchmarks:

benchmark                                                                                           old ns/op       new ns/op      delta
BenchmarkDups2Nodes/AllToAll-OneAtATime-2                                                           2071401035      2072688572     +0.06%
BenchmarkDups2Nodes/AllToAll-BigBatch-2                                                             88909019        90046606       +1.28%
BenchmarkDups2Nodes/Overlap1-OneAtATime-2                                                           2632222013      2632567139     +0.01%
BenchmarkDups2Nodes/Overlap2-BatchBy10-2                                                            820683679       820798869      +0.01%
BenchmarkDups2Nodes/Overlap3-OneAtATime-2                                                           2627422739      2067928550     -21.29%
BenchmarkDups2Nodes/Overlap3-BatchBy10-2                                                            822213067       818108666      -0.50%
BenchmarkDups2Nodes/Overlap3-AllConcurrent-2                                                        707189322       701354193      -0.83%
BenchmarkDups2Nodes/Overlap3-BigBatch-2                                                             701004548       693238080      -1.11%
BenchmarkDups2Nodes/Overlap3-UnixfsFetch-2                                                          692404913       215097237      -68.93%
BenchmarkDups2Nodes/10Nodes-AllToAll-OneAtATime-2                                                   2069193746      2075425311     +0.30%
BenchmarkDups2Nodes/10Nodes-AllToAll-BatchFetchBy10-2                                               241809647       243263661      +0.60%
BenchmarkDups2Nodes/10Nodes-AllToAll-BigBatch-2                                                     98872270        96828694       -2.07%
BenchmarkDups2Nodes/10Nodes-AllToAll-AllConcurrent-2                                                95828461        95103353       -0.76%
BenchmarkDups2Nodes/10Nodes-AllToAll-UnixfsFetch-2                                                  115383212       114733473      -0.56%
BenchmarkDups2Nodes/10Nodes-OnePeerPerBlock-OneAtATime-2                                            6552511357      6558910244     +0.10%
BenchmarkDups2Nodes/10Nodes-OnePeerPerBlock-BigBatch-2                                              1281881927      1309517705     +2.16%
BenchmarkDups2Nodes/10Nodes-OnePeerPerBlock-UnixfsFetch-2                                           1110855308      1108554936     -0.21%
BenchmarkDups2Nodes/200Nodes-AllToAll-BigBatch-2                                                    907350546       957346823      +5.51%
BenchmarkDupsManyNodesRealWorldNetwork/200Nodes-AllToAll-BigBatch-FastNetwork-2                     2642276485      2375770917     -10.09%
BenchmarkDupsManyNodesRealWorldNetwork/200Nodes-AllToAll-BigBatch-AverageVariableSpeedNetwork-2     4176594592      3007236195     -28.00%
BenchmarkDupsManyNodesRealWorldNetwork/200Nodes-AllToAll-BigBatch-SlowVariableSpeedNetwork-2        13381514550     7773090900     -41.91%

Member

@Stebalien left a comment


Overall, this looks awesome! My main comments are:

  1. Let's document the interfaces (when certain functions should be called).
  2. Can we write a benchmark that issues many parallel requests for unavailable content? This adds some complicated per-CID logic so I'm a bit worried about issues like #154.

// OptimizedPeer describes a peer and its level of optimization from 0 to 1.
type OptimizedPeer struct {
	Peer               peer.ID
	OptimizationRating float64
}
Member


Can we comment on what this rating means?
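Going by the PR description, the comment could read something like this (wording is only a suggestion):

	// OptimizationRating is a relative score from 0 to 1 describing how quickly
	// this peer has responded compared to the session's other peers, with 1
	// being the fastest peer.
	OptimizationRating float64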

request, ok := lt.requests[key]
var latency time.Duration
if ok {
	latency = time.Now().Sub(request.startedAt)
Member


nit: time.Since(request.startedAt)
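Applied, the suggestion is equivalent to the current code but reads a little cleaner:

if ok {
	latency = time.Since(request.startedAt)
}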


func (ptm *peerTimeoutMessage) handle(spm *SessionPeerManager) {
	data, ok := spm.activePeers[ptm.p]
	if !ok || !data.lt.WasCancelled(ptm.k) {
Member


Should this be ok && !data.lt.....? That is, do we want to record timeouts for inactive peers?

Member


(that is, won't this add these peers to the active set?)
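If timeouts should only be recorded for peers already being tracked, the flipped guard would look roughly like this (recordTimeout here is a stand-in for whatever bookkeeping the handler actually performs):

func (ptm *peerTimeoutMessage) handle(spm *SessionPeerManager) {
	data, ok := spm.activePeers[ptm.p]
	// Only act on peers already in the active set whose request was not cancelled.
	if ok && !data.lt.WasCancelled(ptm.k) {
		data.lt.recordTimeout(ptm.k) // stand-in for the actual timeout bookkeeping
	}
}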

@Stebalien
Member

IMO, the constants are fine. Ideally, we'd somehow "learn" them, but I can't think of a simple way to do so.

@Stebalien
Member

BenchmarkDupsManyNodesRealWorldNetwork/200Nodes-AllToAll-BigBatch-FastNetwork-2                     2642276485      2375770917     -10.09%
BenchmarkDupsManyNodesRealWorldNetwork/200Nodes-AllToAll-BigBatch-AverageVariableSpeedNetwork-2     4176594592      3007236195     -28.00%
BenchmarkDupsManyNodesRealWorldNetwork/200Nodes-AllToAll-BigBatch-SlowVariableSpeedNetwork-2        13381514550     7773090900     -41.91%

This ^^ really shows that it's working. That's exactly what I'd expect from latency tracking. This is going to be a really nice boost. ❤️

@Kubuxu
Member

Kubuxu commented Jul 6, 2019

Prioritize latency of last response (0.5 * last response + 0.5 * previous latency rating)

From my experience with digital signal processing and networking systems, the alpha parameter of 0.5 in the exponential moving average seems high.

Could you set up logging (probably a separate logger) with the raw data so we can try tweaking these parameters? As an example, one lost packet over TCP will incur 2-3 RTTs of jitter on top of the normal latency.


From my simulations in MATLAB, with the network connection modelled by a Rayleigh distribution, a packet drop probability of 0.5%, and an EMA for latency tracking, 0.5 seems too high. Take a look:
[Figure: EMA simulation plot]

Especially since we only track the latency of blocks that are actually transferred, not latency in general.

Matlab script for those interested: https://gist.github.com/Kubuxu/5c58022d7af6b1f3dfb66f0eae5a730c
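To make the sensitivity concrete, here is the same EMA update run on a single latency spike with two different alphas (a toy example, not the PR's code):

package main

import (
	"fmt"
	"time"
)

// ewma blends a new sample into the running estimate; alpha is the weight of
// the newest sample (the PR effectively uses 0.5).
func ewma(prev, sample time.Duration, alpha float64) time.Duration {
	return time.Duration(alpha*float64(sample) + (1-alpha)*float64(prev))
}

func main() {
	prev := 100 * time.Millisecond  // steady-state estimate
	spike := 400 * time.Millisecond // one retransmit-sized outlier

	// With alpha = 0.5 a single outlier drags the estimate to 250ms;
	// with a smaller alpha such as 0.1 it only moves to 130ms.
	fmt.Println(ewma(prev, spike, 0.5)) // 250ms
	fmt.Println(ewma(prev, spike, 0.1)) // 130ms
}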

@Stebalien
Member

I'm going to merge this as strictly better than what we have. Given our new release process, I'm confident that we'll catch any regressions (if any) before they hit users.

@Stebalien merged commit 8f0e4c6 into master Jul 15, 2019
Jorropo pushed a commit to Jorropo/go-libipfs that referenced this pull request Jan 26, 2023

Feat: Track Session Peer Latency More Accurately

This commit was moved from ipfs/go-bitswap@8f0e4c6