
Bitswap sessions #3867

Merged: merged 11 commits from feat/bitswap-sessions into master on Jul 7, 2017

Conversation

whyrusleeping
Member

This is an implementation of the bitswap sessions enhancement I proposed in #3786.

It's still a work in progress, and has a few potential issues around tracking of wantlist refcounts in failure modes (if peers don't have any of the things we want and we re-broadcast, it increases the wantlist refcounts unnecessarily), but it generally works quite well and makes bitswap faster with much less bandwidth overhead. When transferring large files between just one or two peers, I've seen over a 2x speed improvement.

@whyrusleeping whyrusleeping added the status/in-progress In progress label Apr 17, 2017
@whyrusleeping whyrusleeping requested review from Kubuxu, a user, hsanjuan and kevina April 17, 2017 17:33
@whyrusleeping
Member Author

Review of the code and algorithms involved is wanted; test suggestions are also very welcome.

@kevina
Contributor

kevina commented Apr 17, 2017

@whyrusleeping, due to the move it may be a week or so before I can get to this.

@whyrusleeping
Member Author

@kevina no worries, I'm a bit out of it this week too.

@hsanjuan hsanjuan self-assigned this Apr 21, 2017
@hsanjuan
Contributor

Assigned myself so I remember to look at this

@whyrusleeping whyrusleeping mentioned this pull request Apr 25, 2017
@whyrusleeping
Member Author

next steps:

Have wantlists track which session IDs are actively requesting each key, instead of just being refcounted. This way we avoid weirdness when one session says 'want X' multiple times but only cancels once.
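A minimal sketch of that idea; the wantEntry type and its fields are illustrative stand-ins, not the actual wantlist code:

package wantlist

// wantEntry tracks which sessions currently want a key, rather than a bare refcount.
type wantEntry struct {
	key      string              // cid.KeyString() of the wanted block
	sessions map[uint64]struct{} // session IDs that still want this key
}

// add records that session ses wants the key; re-adding from the same session
// is a no-op, so saying 'want X' twice cannot inflate anything.
func (e *wantEntry) add(ses uint64) {
	e.sessions[ses] = struct{}{}
}

// remove drops the session and reports whether the key can now be cancelled
// (i.e. no session wants it anymore).
func (e *wantEntry) remove(ses uint64) bool {
	delete(e.sessions, ses)
	return len(e.sessions) == 0
}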

@whyrusleeping whyrusleeping changed the title from "WIP: Bitswap sessions" to "Bitswap sessions" on Apr 28, 2017
@whyrusleeping
Member Author

This is less of a WIP now; I've resolved the problem I was worried about. Please review!

@eminence
Contributor

eminence commented May 1, 2017

Currently running f39ac4e and encountered some things of note while trying to run ipfs get on a large 60GB dataset.

First issue was an exceedingly large number of "too many open files" messages. See log here.

After restarting the daemon, this problem seemed to go away; I'm seeing far fewer of these messages. However, I observed that my bitswap wantlist length kept growing.

So I killed the ipfs get, and an hour later, I still have a large wantlist. Its length hasn't gone down at all:

achin@bigbox ~/devel/gopath/src/github.com/ipfs/go-ipfs $ env IPFS_PATH=/nas/achin/ipfs_tmp ipfs bitswap wantlist |wc -l
3134

I have not yet tried these same steps with the latest master to compare results.

@eminence
Contributor

eminence commented May 3, 2017

I just got around to testing this scenario with the latest master.

Using current master branch (f44ff30 at the time of this writing):

ipfs stats bw reports a "RateOut" that seems to be between 350 kB/s and 450 kB/s (as measured by running this command about 30 times and observing the result). The "RateIn" was roughly between 1.3 MB/s and 1.8 MB/s. No wantlist blowups as described in my previous comment.

Using your latest commit (1eca7bd):

"RateOut" seems to be between 110 kB/s and 190 kB/s (while "RateIn" was between 2.2 MB/s and 2.4 MB/s). No wnatlist blowups either.

So there seem to be clear improvements to the amount of outgoing bandwidth.

Other observations:

After downloading about 600MB of a 61GB dataset, I killed the ipfs get command. My wantlist length drops to zero, but I'm still receiving data. ipfs stats bw reports a substantial RateIn, and a bunch of dup data received. This didn't happen in the master branch.

achin@bigbox ~ $ env IPFS_PATH=/nas/achin/ipfs_tmp ipfs stats bw
Bandwidth
TotalIn: 3.9 GB
TotalOut: 294 MB
RateIn: 1.7 MB/s
RateOut: 232 kB/s
achin@bigbox ~ $ env IPFS_PATH=/nas/achin/ipfs_tmp ipfs stats bitswap
bitswap status
        provides buffer: 0 / 256
        blocks received: 13084
        blocks sent: 0
        data received: 2748817430
        data sent: 0
        dup blocks received: 8401
        dup data received: 2.1 GB
        wantlist [0 keys]
        partners [31]

@whyrusleeping
Member Author

whyrusleeping commented May 3, 2017

@eminence Hey, wanna try again? I pushed a few fixes. Thanks for trying it out :) (Also added new bitswap tests; apparently we didn't check that things were cleared from your wantlist after you fetched them.)

@whyrusleeping
Member Author

Whoops, posted my comment without refreshing the page. Those are great results, thanks again for the feedback :) I'll keep tweaking things and see if I can't fix that residual dup blocks issue.

@whyrusleeping
Member Author

@eminence also, ipfs stats bw has a --poll option so you don't need to keep rerunning it

@whyrusleeping
Member Author

@eminence further question: how long does the extra traffic continue after the ipfs get is completed?

@eminence
Contributor

eminence commented May 3, 2017

how long does the extra traffic continue for after the ipfs get is completed?

I'm not sure. I didn't let the ipfs get complete, but rather interrupted it with a ctrl-c (I'm not sure if this is an important distinction). I let things run for about 10 minutes before killing the daemon.

@whyrusleeping
Member Author

whyrusleeping commented May 3, 2017 via email

@eminence
Contributor

eminence commented May 3, 2017

Looks much better. After killing the ipfs get, the download rate fell to about 10 kB/s within 45 seconds.

@whyrusleeping
Member Author

Alright, last open question here:

When a new peer connects, do we still send them our entire wantlist?

@whyrusleeping
Member Author

my answer:
Keep a list of all wanted keys that were 'broadcast' (i.e. sent to all peers) and send that wantlist to newly connected peers.
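A rough sketch of that approach, assuming a hypothetical broadcastWants helper; all names here are made up, not the actual WantManager API:

package bitswap

import "sync"

// broadcastWants remembers which keys were broadcast to all peers, so a newly
// connected peer gets only that subset instead of the full wantlist.
type broadcastWants struct {
	mu   sync.Mutex
	keys map[string]struct{} // cid.KeyString() of keys sent to every peer
}

func (bw *broadcastWants) addBroadcast(key string) {
	bw.mu.Lock()
	bw.keys[key] = struct{}{}
	bw.mu.Unlock()
}

// connected is called when a new peer shows up; send it only the broadcast wantlist.
func (bw *broadcastWants) connected(send func([]string)) {
	bw.mu.Lock()
	keys := make([]string, 0, len(bw.keys))
	for k := range bw.keys {
		keys = append(keys, k)
	}
	bw.mu.Unlock()
	send(keys)
}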

defer bs.sessIDLk.Unlock()
bs.sessID++
return bs.sessID
}
Member

Could use an atomic.

However, is there any reason this uses session IDs instead of just pointers to Session structs? Are you worried about these IDs getting stuck somewhere preventing Sessions from being freed (I get an itch when I see ints being used as pointers)?
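For reference, a minimal sketch of the atomic variant suggested above, using a stripped-down Bitswap stand-in with just the counter field:

package bitswap

import "sync/atomic"

type Bitswap struct {
	sessID uint64 // kept at the front of the struct so it stays 64-bit aligned on 32-bit platforms
}

// getNextSessionID allocates session IDs without a mutex.
func (bs *Bitswap) getNextSessionID() uint64 {
	return atomic.AddUint64(&bs.sessID, 1)
}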

Member Author

Doing it this way because other bits of code need to be able to reference which sessions things belong to. Otherwise we might end up with circular dependencies, since bitswap imports wantlists and wantlists reference session IDs.

newReqs chan []*cid.Cid
cancelKeys chan []*cid.Cid

interest *lru.Cache
Member

Should this really be an LRU Cache? If this overflows, won't we end up hanging forever waiting for our block?

Member Author

No, this is used to track blocks that are relevant to this session, so responses from other sessions can inform us of peers. Even after we've received a block, it remains in the interest cache, because receiving a block from peer X in a different session may mean that peer X also has other blocks our session wants.

Member

I see, nice! But there's still the problem of waiting forever if the block gets evicted from the interest cache before we receive it (I'll comment on that line to point it out to you).

if toadd > len(keys) {
toadd = len(keys)
}
s.liveCnt += toadd
Member

What if some of these keys are already in liveWants?

Member Author

removed liveCnt

}
}(live[0])
}
s.resetTick()
Member

As-is, I believe this will trigger concurrent provider searches on a slow network (which will make the provider searches even slower). Somewhere in the stack, we should insert a proxy to prevent concurrent provider searches for the same block.
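One possible shape for such a proxy, essentially a singleflight keyed by CID; the providerQueryManager name and fields are illustrative only:

package bitswap

import (
	"context"
	"sync"
)

type providerQueryManager struct {
	mu       sync.Mutex
	inflight map[string]chan struct{} // key -> channel closed when that search finishes
}

// findProviders collapses concurrent searches for the same key into one: the
// first caller runs search(), everyone else just waits for it to finish.
func (pqm *providerQueryManager) findProviders(ctx context.Context, key string, search func()) {
	pqm.mu.Lock()
	if done, ok := pqm.inflight[key]; ok {
		pqm.mu.Unlock()
		select { // someone is already searching; wait for them or bail out
		case <-done:
		case <-ctx.Done():
		}
		return
	}
	done := make(chan struct{})
	pqm.inflight[key] = done
	pqm.mu.Unlock()

	search() // the actual (slow) provider search runs exactly once per key

	pqm.mu.Lock()
	delete(pqm.inflight, key)
	pqm.mu.Unlock()
	close(done)
}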

s.bs.wm.WantBlocks(ctx, ks, s.activePeersArr, s.id)
}

func (s *Session) cancel(keys []*cid.Cid) {
Member

Consider removing from the interest set?

Member Author

Nope, see the other comment about the point of the interest cache.

// (why): in reality, i think this command should be removed. Its
// messing with the internal state of bitswap. You should cancel wants
// by killing the command that caused the want.
bs.CancelWants(ks, 0)
Member

Does canceling the context not cancel the associated wants?

Member Author

Yeah, cancelling the context of any request command cancels the associated wants. So this command really should just get removed; it has no real purpose.


interest *lru.Cache
liveWants map[string]time.Time
liveCnt int
Member

Why can't we just call len(liveWants)?

Member Author

Hrm... good point. I think this bit is a relic from before I kept a map of each want.

return
}
}
}
Member

I'm not sure every session should have its own event loop. For one, it makes it hard to avoid unnecessary work when fetching subtrees common to multiple sessions. On the other hand, having a per-session event loop does make fairness simpler.

WARNING: Future-proofing ahead.

Additionally, I'm not sure if sessions should be so... first-class(?). Personally, I see them as metadata that can help us find blocks, but hardly the only metadata. With a separate event loop for every session, adding block-level predictions that are unrelated to sessions (e.g., some of the ones I was talking about in the issue) could be a bit awkward.

Basically, in the future, I expect we'll build a prediction engine that:

  • Accepts information about:
    • Who probably has a block (input from: gossip, provider searches, command-line arguments).
    • Who probably doesn't have a block (input from: gossip, recently failed requests).
    • Who gave us a block (and on which session we received it).
  • Given a CID, provides a stream of peer suggestions in the following order:
    1. Peers it believes have the block in question (learned through the information given to the prediction engine above).
    2. Peers in the current session.
    3. (maybe) Peers in any related sessions (sessions where parent blocks were found)
    4. Everyone (triggers a provider search).
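To make the shape concrete, a very rough interface sketch of the prediction engine described above; none of these names exist in the codebase, and the import paths are modern stand-ins for the gx paths used elsewhere in this PR:

package bitswap

import (
	"context"

	cid "github.com/ipfs/go-cid"
	peer "github.com/libp2p/go-libp2p/core/peer"
)

type PeerPredictor interface {
	// Observations fed in.
	PeerHas(p peer.ID, c *cid.Cid)                       // gossip, provider records, CLI hints
	PeerMisses(p peer.ID, c *cid.Cid)                    // recently failed requests
	ReceivedBlock(p peer.ID, c *cid.Cid, session uint64) // who gave us a block, and on which session

	// SuggestPeers streams peers to ask for c, roughly in the order:
	// (1) peers believed to have c, (2) peers in the current session,
	// (3) peers in related sessions, (4) everyone (triggers a provider search).
	SuggestPeers(ctx context.Context, c *cid.Cid, session uint64) <-chan peer.ID
}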

Member Author

For one, it makes it hard to avoid unnecessary work when fetching subtrees common to multiple sessions.

For now, I'm operating under the assumption that this won't be a super common scenario. And even in cases where it does happen, the 'interested' cache logic helps out with cross-session interplay pretty decently.

Member

The "interested" cache will ensure that all interested parties receive the block but won't prevent us from spamming our wantlist more widely than necessary.

Member

Oh, I see what the "interested" cache does now; it ensures that intersecting sessions learn about each other's peers.

However, given the predictor above, there's still a period where we really don't need to ask the entire session for a block because we're pretty sure we know which peers are likely to have it. That is, given the existing system, we could just do what you're doing now with FindProvidersAsync and add peers known to have blocks to the session, but that still means we'll need to spam the entire session even if we know which specific peers may have the block.

Basically, the current architecture shrinks our default "ask set" from everyone to the peers in the session; however, it makes it difficult to shrink this further when we have more information about where to find a block.

Member Author

Yeah, we should also add overlapping FindProviders peers to the sessions in a similar way. I like that idea, though doing that wouldn't necessarily allow us to remove this code (since it's possible for us to get information from a peer that was not returned by FindProviders).

Member

I agree, I just think it may be worth waiting a few ms before spamming the entire session if we already know of some potential providers (i.e., we don't have to ask the DHT).

I still think the current session-centric architecture may make this a bit difficult but it'll probably be easier to just fix it later than to try to hash out all the issues up-front. For all I know, I'm totally wrong and arguing about a non-existent issue.

@whyrusleeping whyrusleeping force-pushed the feat/bitswap-sessions branch 2 times, most recently from 0c9dac0 to fca4ae7 on July 4, 2017 16:25
@@ -375,6 +373,10 @@ func (bs *Bitswap) ReceiveMessage(ctx context.Context, p peer.ID, incoming bsmsg
k := b.Cid()
log.Event(ctx, "Bitswap.GetBlockRequest.End", k)

for _, ses := range bs.SessionsForBlock(k) {
ses.receiveBlockFrom(p, b)
Member

Unless I'm mistaken, this is the only place that feeds blocks into the session. If a block gets evicted from the interested set before we find it, we'll never forward it to the session.

Member Author

Oooh, good catch. The 'InterestedIn' check should also check the liveWants map.

Member

Out of curiosity, what happens if the user adds the block to the local blockstore via the commandline while we're trying to fetch it from the network?

Member Author

All blocks added to the datastore get added through either the blockservice or the dagservice (which wraps the blockservice). The blockservice calls "exchange.HasBlock" for each block being added to it, and then the blocks get pushed through the notifications channel.

Member

Isn't that a bit racy? That is, what happens if the user adds the block after the call to blockstore.Get but before the call to exchange.GetBlock (here)?

I'm asking because there's also a race condition in this PR when a block is neither in the interest cache (we're interested in too many blocks) nor in the liveWants set (we haven't started fetching it yet) but gets fetched by another session (or manually added by the user). In this case, we'll try to fetch the block from the network anyway, even if we already have it.

So, I'm wondering if it's worth adding a notification framework to Blockstores where one can wait for blocks to be added. An alternative is to bitswap with ourself (but that seems kind of hacky).

Member

That race condition exists in the code today

The first one already exists, but the second one is a bit different and has a much larger window. To reproduce:

  1. Try fetching 3000 nodes that can't be found on the network.
  2. Manually add the 100th node.
  3. Manually add the rest.

The operation will never complete because, when added, the 100th node was neither in the interest set nor in the liveWants set. The quick fix is to add an allWants set and use that instead of checking liveWants and interest.


However, all race conditions like this could be fixed by some centralized notification system (it doesn't have to be built into the blockstore).

Member Author

I'm not sure that a notification system would actually eliminate race conditions like this; it seems very tricky to do without taking a lock around all 'put block' calls that gets released once the notification has finished propagating. Will think about this more after some sleep.

Member

We should be fine as long as we always notify when putting a block into the blockstore. Yes, we may end up starting a bitswap search for a block that we just got, but we can cancel it immediately and don't have to wait for it to finish. My primary worry here is that the bitswap search may never finish if the block was retrieved over some side channel and can't be found on the network.

The way I see this working is as follows:

BlockService.GetBlock will:

  1. Try the blockstore first. Else...
  2. Put a request into a "want" channel.
  3. Subscribe(cid) on the internal notification system.
  4. Call blockstore.Get again in case it was delivered between 1 and 3.
  5. Wait on the channel returned by Subscribe.

BlockService.AddBlock will:

  1. Put the block.
  2. Publish on the notification system.

The main event loop for the BlockService will:

  • Wait for requests on the want channel:
    1. Throw away duplicate want requests.
    2. Record that we're looking for the block.
    3. Asynchronously:
       • Ask the exchange to find the block.
       • Send the response back on an exchangeResponse channel.
  • Wait on the exchangeResponse channel:
    1. Record that we're done looking for the block.
    2. Call BlockService.AddBlock(...) (triggering the notification system).
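A condensed sketch of that flow using stand-in types; the real BlockService, blockstore, and notification code look different, this only illustrates the check/want/subscribe/re-check/wait ordering:

package blockservice

import "context"

type Block interface{ RawData() []byte }

type Blockstore interface {
	Get(key string) (Block, bool)
	Put(key string, b Block)
}

type Notifier interface {
	Subscribe(ctx context.Context, key string) <-chan Block
	Publish(key string, b Block)
}

type BlockService struct {
	blockstore    Blockstore
	notifications Notifier
	wants         chan<- string // consumed by the BlockService event loop
}

func (s *BlockService) GetBlock(ctx context.Context, key string) (Block, error) {
	if b, ok := s.blockstore.Get(key); ok { // 1. try the blockstore first
		return b, nil
	}
	select { // 2. ask the event loop to start looking
	case s.wants <- key:
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	ch := s.notifications.Subscribe(ctx, key) // 3. subscribe before re-checking
	if b, ok := s.blockstore.Get(key); ok {   // 4. it may have arrived between 1 and 3
		return b, nil
	}
	select { // 5. wait for the notification
	case b := <-ch:
		return b, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func (s *BlockService) AddBlock(key string, b Block) {
	s.blockstore.Put(key, b)        // 1. put the block
	s.notifications.Publish(key, b) // 2. wake up anyone waiting on it
}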

Member

(Note: this is very much "future work" in case that wasn't clear.)

Member Author

@Stebalien I pushed a fix that at least doesn't lose track of blocks that may be added locally, and left some TODO notes for future work. I think that covers the major outstanding concerns; there will definitely be follow-up PRs, but this one should be good to go.

@@ -31,6 +32,7 @@ type BlockService interface {
GetBlock(ctx context.Context, c *cid.Cid) (blocks.Block, error)
GetBlocks(ctx context.Context, ks []*cid.Cid) <-chan blocks.Block
DeleteBlock(o blocks.Block) error
NewSession(context.Context) *Session
Member

This is a public interface and Session has private fields.

Member Author

I really didn't feel like making yet another only-implemented-once interface.

Member
Stebalien commented Jul 5, 2017

Couldn't this just be a helper function?

func NewSession(ctx context.Context, bs BlockService) *Session {
	exchange := bs.Exchange()
	if bswap, ok := exchange.(*bitswap.Bitswap); ok {
		ses := bswap.NewSession(ctx)
		return &Session{
			ses: ses,
			bs:  bs.Blockstore(),
		}
	}
	return &Session{
		ses: exchange,
		bs:  bs.Blockstore(),
	}
}

(We could also make this non-bitswap-specific by adding a NewSession(context.Context) exchange.Interface method to exchange.Interface, but we don't need that now.)

var f exchange.Fetcher
if s.exchange != nil {
f = s.exchange
}
Member

Is this needed? Can't we just pass s.exchange?

Member Author

Nope, because then the result wouldn't be nil (it would be a typed nil, which in interface form is != nil).
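For anyone unfamiliar with the gotcha, a self-contained example of why assigning a nil pointer to an interface variable makes the interface non-nil:

package main

import "fmt"

type Fetcher interface{ Fetch() }

type Bitswap struct{}

func (b *Bitswap) Fetch() {}

func main() {
	var exchange *Bitswap // a typed nil pointer

	var f Fetcher = exchange
	fmt.Println(f == nil) // false: f holds a (*Bitswap)(nil), not a nil interface

	var g Fetcher // only assign when the concrete value is non-nil
	if exchange != nil {
		g = exchange
	}
	fmt.Println(g == nil) // true
}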

func (bs *Bitswap) ReceiveMessage(ctx context.Context, p peer.ID, incoming bsmsg.BitSwapMessage) {
atomic.AddUint64(&bs.messagesRecvd, 1)
Member

Atomics will only work if the address is 8-byte aligned, which might not be the case if there are other variables in the structure. I suggest allocating a separate structure for all metrics and keeping every variable that will be used atomically as a (u)int64 at the front.
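Roughly what that layout looks like; the field names are illustrative. The sync/atomic docs guarantee that the first word of a separately allocated struct is 64-bit aligned, which is what this relies on:

package bitswap

// counters is allocated separately so its leading fields are 64-bit aligned
// even on 32-bit platforms.
type counters struct {
	messagesRecvd  uint64 // updated with atomic.AddUint64
	blocksRecvd    uint64
	dupBlocksRecvd uint64
	// any non-atomic fields would go after the 64-bit counters
}

type Bitswap struct {
	counters *counters // pointer to a separately allocated metrics struct
	// ...other fields...
}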

Member Author

I don't think that's something we actually have to worry about. If that were the case, then I'm quite certain there would be fairly significant warnings around the calls in the docs: https://golang.org/pkg/sync/atomic/#AddUint64

Member

Member Author

Alright, how does that fix look?

// way to avoid adding a more complex set of mutexes around the liveWants map.
// note that in the average case (where this session *is* interested in the
// block we received) this function will not be called, as the cid will likely
// still be in the interest cache.
Member
Kubuxu commented Jul 6, 2017

Go 1.9 has concurrent maps, which will resolve this issue altogether.

Member Author

Yeah, that will be useful

case lwchk := <-s.interestReqs:
_, ok := s.liveWants[lwchk.c.KeyString()]
lwchk.resp <- ok
case <-ctx.Done():
Member

Done should stop the ticker; otherwise it will keep the instance alive forever.
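A minimal sketch of that fix with a stand-in Session type: stop the ticker once the session's context is cancelled so neither the ticker nor the goroutine is kept alive:

package bitswap

import (
	"context"
	"time"
)

type Session struct{}

func (s *Session) run(ctx context.Context) {
	tick := time.NewTicker(time.Minute)
	defer tick.Stop() // release the ticker as soon as the context is done

	for {
		select {
		case <-tick.C:
			// periodic work, e.g. re-broadcasting live wants
		case <-ctx.Done():
			return
		}
	}
}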

blocksutil "github.com/ipfs/go-ipfs/blocks/blocksutil"
blocks "gx/ipfs/QmXxGS5QsUxpR3iqL5DjmsYPHR1Yz74siRQ4ChJqWFosMh/go-block-format"

cid "gx/ipfs/Qma4RJSuh7mMeJQYCqMbKzekn6EwBo7HEs5AQYjVRMQATB/go-cid"
Member

spacing

@whyrusleeping
Member Author

@Kubuxu addressed those concerns

whyrusleeping and others added 2 commits July 7, 2017 11:40
Member
Stebalien left a comment

LGTM

@whyrusleeping whyrusleeping merged commit 0784a85 into master Jul 7, 2017
@whyrusleeping whyrusleeping deleted the feat/bitswap-sessions branch July 7, 2017 19:38
@whyrusleeping whyrusleeping removed the status/in-progress In progress label Jul 7, 2017