[ipfs/go-bitswap] Reduce Broadcasting #81

Stebalien · 2021-10-05T20:29:20Z

Currently, when we can't find some content, we add it to the "broadcast" list and ask every connected peer. Unfortunately, this means that bitswap nodes under heavy load will:

Take a while to find blocks (heavy load).
Believe these blocks "can't be found" and broadcast requests.
Slow down even more.

Furthermore, these massive broadcast lists end creating a lot of unnecessary traffic.

Insight: Sessions tend to be highly correlated (by design) so asking for more than one CID in a session is unlikely to give us any extra information. At the very least, broadcasting a request for a new CID B won't get us any closer to finding some previously broadcast CID A.

Given this insight, we can trim down the broadcast list to one want per session, which is much more manageable. It also doesn't really matter which want, as long as it's relevant.

Proposal:

When a session has no peers and/or is "stuck", broadcast a single want request. This want should probably be the oldest still outstanding want in the session.
When broadcasting, rank peers by "class" and iteratively widen the broadcast. Some heavily loaded nodes like public gateways may simply ignore some classes and/or "batch" requests.
1. Peers who previously gave us blocks but got kicked out of the session (IIRC, this class exists but I'm not sure).
2. Peers we've never seen before (never sent a broadcast to).
3. Peers we've previously sent broadcasts to.

dirkmc · 2021-10-06T07:23:20Z

I agree with the general idea 👍 We don't need to broadcast so many CIDs.

Two considerations:

We may become connected to new peers during a long-running transfer. They are most likely to have the root CID. So probably it makes sense to always include the first CID in the broadcast.
Nodes are most likely to have the top branches in the DAG. As the transfer progresses, the blocks are more likely to be sparsely distributed (Node A has some, Node B has some blocks overlapping with Node A, some unique to Node B, etc). So towards the end of the transfer it may be better to include more CIDs in the broadcast.
It may make sense to iteratively increase the number of CIDs in the broadcast.

Stebalien · 2021-10-08T05:02:07Z

So, my point here is that including more than one CID doesn't actually help us in the end. See:

Insight: Sessions tend to be highly correlated (by design) so asking for more than one CID in a session is unlikely to give us any extra information. At the very least, broadcasting a request for a new CID B won't get us any closer to finding some previously broadcast CID A.

Basically, sure, adding additional CIDs may get us those blocks. But it won't get us any closer to finding the CIDs that are already in our broadcast set.

We may become connected to new peers during a long-running transfer. They are most likely to have the root CID. So probably it makes sense to always include the first CID in the broadcast.

Yes, at that point we've already moved on. I'm proposing announcing the last block requested to find the "most specific" peers. "general" peers (e.g., ones that have the root only) may still be able to help "share the load" so to speak, but I'm not sure if it's worth broadcasting to find those peers.

dirkmc · 2021-10-08T08:46:01Z

it won't get us any closer to finding the CIDs that are already in our broadcast set

Ok makes sense 👍

I'm proposing announcing the last block requested to find the "most specific" peers

I think we may want to broadcast want-haves for several CIDs at once.
Consider the following scenario:

First we broadcast a request for cid1, and get back Peer A:

Peer A: cid 1, cid 2, cid 3, cid 4, cid 5

Peer B, C & D connect to us.

Peer B: cid 1, cid 6
Peer C: cid 1, cid 7
Peer D: cid 1, cid 8

We broadcast a want-have for cid 6 and get a HAVE from Peer B.
We broadcast a want-have for cid 7 and get a HAVE from Peer C.
We broadcast a want-have for cid 8 and get a HAVE from Peer D.

Instead of broadcasting one want-have at a time it would be more efficient to broadcast all of them together.

Stebalien · 2021-10-08T17:26:43Z

It's a performance trade-off, but I don't think this scenario is all that common. The basic assumption of sessions is that most blocks being requested through the session are correlated in some way. If different peers have individual blocks, the session is pretty useless.

But I guess the best solution is to adapt:

If we find we're having to pause to broadcast frequently, we increase the number of CIDs (in the session) that we broadcast.
If we find that we're not having to frequently broadcast, we decrease the number of CIDs.

NOTE: we'd only increase the number of CIDs if the broadcast was actually successful. That is, it found a new peer that wasn't previously in the session that had the block.

RubenKelevra · 2022-05-05T17:36:02Z

@Stebalien how about using probabilities via feedback from previous attempts to determine which nodes should get a "broadcast" request?

We do number_of_supplies / number_of_requests for each peer – so for example, we get 0.03 for a 3% chance of a successful reply.

Then we feed this into an 'exponential moving average' with an alpha of 0.005. If we don't do any requests to peers, their value should just be frozen. This avoids dropping of the value by supplying zeros and also avoids that we have to do calculations.

To make sure new connections get some chance of getting requests, we should send a value of 1 into the 'exponential moving average' to get it started.

Now we use the result as the probability that we send a given CID-broadcast to a node we know (maybe not even be connected to). We might want to redial some known "useful" nodes in the past, to check if they got useful data for us now.

So if we have nodes which don't supply any useful data for us in the recent past, we slowly phase them out from being asked and just focus on the nodes which do useful response to us.

Stebalien added the need/triage Needs initial labeling and prioritization label Oct 5, 2021

Stebalien mentioned this issue Oct 5, 2021

Deadlock/overload in gateway during spiking goroutine count ipfs/kubo#8381

Open

BigLep mentioned this issue Jan 27, 2023

[ipfs/go-bitswap] rewrite the wantlist to generally be less contentious #79

Open

lidel added kind/enhancement A net-new feature or improvement to an existing feature status/deferred Conscious decision to pause or backlog and removed need/triage Needs initial labeling and prioritization labels Apr 15, 2022

RubenKelevra mentioned this issue Jun 24, 2022

Connect to meshtastic nodes berty/berty#4034

Open

Jorropo changed the title ~~Reduce Broadcasting~~ [ipfs/go-bitswap] Reduce Broadcasting Jan 27, 2023

Jorropo transferred this issue from ipfs/go-bitswap Jan 27, 2023

aschmahmann mentioned this issue Mar 7, 2023

Reprovide Sweep libp2p/go-libp2p-kad-dht#824

Open

BigLep mentioned this issue Apr 27, 2023

[ipfs/go-bitswap] go-bitswap overhaul 2023 #73

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ipfs/go-bitswap] Reduce Broadcasting #81

[ipfs/go-bitswap] Reduce Broadcasting #81

Stebalien commented Oct 5, 2021

dirkmc commented Oct 6, 2021

Stebalien commented Oct 8, 2021

dirkmc commented Oct 8, 2021 •

edited

Loading

Stebalien commented Oct 8, 2021

RubenKelevra commented May 5, 2022

[ipfs/go-bitswap] Reduce Broadcasting #81

[ipfs/go-bitswap] Reduce Broadcasting #81

Comments

Stebalien commented Oct 5, 2021

dirkmc commented Oct 6, 2021

Stebalien commented Oct 8, 2021

dirkmc commented Oct 8, 2021 • edited Loading

Stebalien commented Oct 8, 2021

RubenKelevra commented May 5, 2022

dirkmc commented Oct 8, 2021 •

edited

Loading