
Should the client repeat the broadcast block request? #65

Closed
bruinxs opened this issue Jan 31, 2019 · 6 comments · Fixed by #106

@bruinxs

bruinxs commented Jan 31, 2019

Scenario

I have two nodes in different LANs. They use a server on the public network for port forwarding so that the two nodes can connect. The connection bandwidth is about 5 Mbps and it is not very stable.

Node B holds a resource of about 200 MB. When node A fetches the resource through the IPFS gateway, the transfer gets stuck on some data blocks and cannot obtain the data.

Node A's wantlist always holds the hashes of these data blocks, and node B's ledger for node A also contains node A's wantlist, indicating that node B has received all of the block requests. But for some reason node B does not send the blocks to node A, so node A waits forever for these blocks.

If node A could rebroadcast the requests for these blocks, node B would respond correctly when it received the requests again.

I am using an IPFS private network.

ipfs version: 
go-ipfs version: 0.4.18-
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1

My point of view

The client should periodically rebroadcast block requests that have been waiting a long time. This would solve the problem where a nearby node has the resource but the data cannot be obtained quickly.
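
For illustration, here is a minimal sketch of that idea in Go. Everything in it (want, peerQueue, rebroadcastLoop) is a hypothetical stand-in, not the real go-bitswap API:

package rebroadcast

import (
	"context"
	"fmt"
	"time"
)

// want and peerQueue are hypothetical stand-ins for go-bitswap's
// wantlist entries and per-peer message queues.
type want struct{ key string }

type peerQueue struct{ id string }

func (q *peerQueue) send(wants []want) {
	fmt.Printf("resending %d wants to %s\n", len(wants), q.id)
}

// rebroadcastLoop periodically re-sends every outstanding want to every
// connected peer, so that a request lost on the remote side is retried.
func rebroadcastLoop(ctx context.Context, wants func() []want, peers func() []*peerQueue, every time.Duration) {
	tick := time.NewTicker(every)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-tick.C:
			wl := wants()
			if len(wl) == 0 {
				continue // nothing outstanding, nothing to resend
			}
			for _, q := range peers() {
				q.send(wl)
			}
		}
	}
}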

@Stebalien
Member

But for some reason node B does not send the blocks to node A, so node A waits forever for these blocks.

This sounds like a bug. It could be related to #51 but I'm not sure.

However, in general, I agree. We may want to occasionally rebroadcast existing wantlists to all peers (we used to but I'm not sure if we do anymore). Ideally, we wouldn't have to but bugs happen.

@Stebalien
Member

So, you're saying you have two connected nodes and one isn't giving data to the other?

@hannahhoward
Contributor

Yea, that makes sense. I'm not sure it'd be #51, 'cause it sounds like node B and node A are connected over a long period of time. But that is still a concern.

Also, I really want to get the latest Bitswap published, 'cause I can't even tell whether this is with the sessions code or the 0.4.18 Bitswap, which didn't actually use sessions at all.

@bruinxs
Author

bruinxs commented Feb 11, 2019

So, you're saying you have two connected nodes and one isn't giving data to the other?

@Stebalien Yes, the data transfer is blocked on several data blocks, and each time the problem reproduces, different blocks are involved. The corresponding wantlists on the two nodes also match.

I added code to rebroadcast the stuck block requests, and I was finally able to fetch the resource.

The test code is as follows:
//workers.go
func (bs *Bitswap) rebroadcastWorker(parent context.Context) {

	...

	// lastwl remembers the wantlist seen on the previous tick, so the next
	// tick can tell whether the wantlist has been stuck since then.
	lastwl := struct {
		count  int
		blocks map[cid.Cid]struct{}
	}{}

	for {
		log.Event(ctx, "Bitswap.Rebroadcast.idle")
		select {
		case <-tick.C:
			n := bs.wm.wl.Len()
			if n > 0 {
				log.Debug(n, " keys in bitswap wantlist")
			}

			if n == 0 {
				break
			}

			// Snapshot the wantlist recorded on the previous tick and
			// record the current one.
			snapshot := lastwl
			lastwl.count = n
			lastwl.blocks = make(map[cid.Cid]struct{}, n)

			for _, entry := range bs.wm.wl.Entries() {
				lastwl.blocks[entry.Cid] = struct{}{}
			}

			// Only rebroadcast when the wantlist is unchanged since the
			// previous tick; a changing wantlist means progress is still
			// being made.
			if snapshot.count != n {
				break
			}

			diff := false
			for c := range snapshot.blocks {
				if _, ok := lastwl.blocks[c]; !ok {
					diff = true
					break
				}
			}
			if diff {
				break
			}

			// The wantlist has been stuck for a full tick: resend it to
			// every connected peer.
			msg := bsmsg.New(false)
			for _, entry := range bs.wm.wl.Entries() {
				if !entry.Trash {
					msg.AddEntry(entry.Cid, entry.Priority)
				}
			}
			for p, q := range bs.wm.peers {
				q.resendMessage(msg)
				log.Warningf("rebroadcast wantlist to peer %s", p.Pretty())
			}

		...

		}
	}
}
//wantmanager.go
// resendMessage merges the non-cancel entries of msg into the queue's
// pending outgoing message and wakes the queue worker if anything was
// added.
func (mq *msgQueue) resendMessage(msg bsmsg.BitSwapMessage) {
	var work bool
	mq.outlk.Lock()
	defer func() {
		mq.outlk.Unlock()
		if !work {
			return
		}
		select {
		case mq.work <- struct{}{}:
		default:
		}
	}()

	if mq.out == nil {
		mq.out = bsmsg.New(false)
	}

	for _, e := range msg.Wantlist() {
		if !e.Cancel {
			work = true
			mq.out.AddEntry(e.Cid, e.Priority)
		}
	}
}

@bruinxs
Author

bruinxs commented Feb 11, 2019

@hannahhoward I don't think it's because of #51; these two nodes are already connected.

The go-bitswap gx version I am using is QmNkxFCmPtr2RQxjZNRCNryLud4L9wMEiBJsLgF14MqTHj, and its code looks like this:

// Connected/Disconnected warns bitswap about peer connections
func (bs *Bitswap) PeerConnected(p peer.ID) {
	bs.wm.Connected(p)
	bs.engine.PeerConnected(p)
}

func (pm *WantManager) startPeerHandler(p peer.ID) *msgQueue {
	mq, ok := pm.peers[p]
	if ok {
		// Already connected: only the refcount is bumped. The full
		// wantlist below is sent to brand-new peers only.
		mq.refcnt++
		return nil
	}

	mq = pm.newMsgQueue(p)

	// new peer, we will want to give them our full wantlist
	fullwantlist := bsmsg.New(true)
	for _, e := range pm.bcwl.Entries() {
		for k := range e.SesTrk {
			mq.wl.AddEntry(e, k)
		}
		fullwantlist.AddEntry(e.Cid, e.Priority)
	}
	mq.out = fullwantlist
	mq.work <- struct{}{}

	pm.peers[p] = mq
	go mq.runQueue(pm.ctx)
	return mq
}

@bruinxs
Author

bruinxs commented Feb 11, 2019

According to the following code, I think that even if the server node has the requested blocks, responding can still fail due to network errors such as a session shutdown or an i/o deadline being reached, and the failure is only logged:

go-bitswap/workers.go

Lines 107 to 110 in 916de59

	err := bs.network.SendMessage(ctx, env.Peer, msg)
	if err != nil {
		log.Infof("sendblock error: %s", err)
	}
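
One way to harden that path, sketched here with a hypothetical sendWithRetry helper (this is not what go-bitswap does), would be to retry a failed send a bounded number of times instead of only logging the error:

package sendretry

import (
	"context"
	"time"
)

// sendWithRetry is a hypothetical wrapper around a send call such as
// bs.network.SendMessage. Instead of logging a failure and dropping the
// block, it retries a bounded number of times with a fixed delay.
func sendWithRetry(ctx context.Context, send func(context.Context) error, attempts int, delay time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = send(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
			// wait before the next attempt
		}
	}
	return err
}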

I think it is necessary to detect blocked block requests and rebroadcast them when the node is idle (a sketch follows this list):

1. If no new block requests are made for a short period and no blocks are received, the node is idle.
2. If the block requests in the wantlist get no response, the wantlist should be rebroadcast every so often.
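
Here is a minimal sketch of that stall detection, with hypothetical names throughout (the fix that eventually landed, referenced below, takes the simpler approach of rebroadcasting the wantlist on a fixed thirty-second timer):

package stalldetect

import (
	"sync"
	"time"
)

// stallDetector tracks when wants were last added and blocks were last
// received. If both are older than the window while wants are still
// outstanding, the transfer is considered stalled and the wantlist
// should be rebroadcast.
type stallDetector struct {
	mu          sync.Mutex
	lastWant    time.Time
	lastReceive time.Time
	outstanding int
}

func (s *stallDetector) WantAdded() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lastWant = time.Now()
	s.outstanding++
}

func (s *stallDetector) BlockReceived() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lastReceive = time.Now()
	if s.outstanding > 0 {
		s.outstanding--
	}
}

// Stalled reports whether there are outstanding wants but no new wants
// or received blocks within the given window.
func (s *stallDetector) Stalled(window time.Duration) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	cutoff := time.Now().Add(-window)
	return s.outstanding > 0 && s.lastWant.Before(cutoff) && s.lastReceive.Before(cutoff)
}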

hannahhoward added a commit that referenced this issue Apr 4, 2019
Provide a failsafe to losing wants on other end by rebroadcasting a wantlist every thirty seconds

fix #99, fix #65