
Should the client repeat the broadcast block request? #65

Closed
bruinxs opened this issue Jan 31, 2019 · 6 comments · Fixed by #106

@bruinxs

bruinxs commented Jan 31, 2019

Scenario

I have two nodes in different LANs. They use a server on the public network for port forwarding so that the two nodes can connect. The connection bandwidth is about 5 Mbps and it is not very stable.

Node B holds a resource of about 200 MB. When node A fetches the resource through the IPFS gateway, the transfer gets stuck on some data blocks and cannot obtain the data.

Node A's wantlist always holds the hashes of these data blocks, and node B's ledger for node A also contains node A's wantlist, indicating that node B has received all of the block requests. But for some reason node B does not send the blocks to node A, so node A waits forever for these blocks.

If node A could rebroadcast the requests for these blocks, node B would respond correctly when it received the requests again.

I am using an IPFS private network.

ipfs version: 
go-ipfs version: 0.4.18-
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1

My point of view

The client should periodically rebroadcast block requests that have been waiting a long time. This would solve the problem where a nearby node has the resource but the data cannot be obtained quickly.
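
For illustration, here is a minimal sketch of that idea in Go. Everything in it (want, peerQueue, rebroadcastLoop) is a hypothetical stand-in, not the real go-bitswap API:

package rebroadcast

import (
	"context"
	"fmt"
	"time"
)

// want and peerQueue are hypothetical stand-ins for go-bitswap's
// wantlist entries and per-peer message queues.
type want struct{ key string }

type peerQueue struct{ id string }

func (q *peerQueue) send(wants []want) {
	fmt.Printf("resending %d wants to %s\n", len(wants), q.id)
}

// rebroadcastLoop periodically re-sends every outstanding want to every
// connected peer, so that a request lost on the remote side is retried.
func rebroadcastLoop(ctx context.Context, wants func() []want, peers func() []*peerQueue, every time.Duration) {
	tick := time.NewTicker(every)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-tick.C:
			wl := wants()
			if len(wl) == 0 {
				continue // nothing outstanding, nothing to resend
			}
			for _, q := range peers() {
				q.send(wl)
			}
		}
	}
}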

@Stebalien
Member

But for some reason node B does not send the blocks to node A, so node A waits forever for these blocks.

This sounds like a bug. It could be related to #51 but I'm not sure.

However, in general, I agree. We may want to occasionally rebroadcast existing wantlists to all peers (we used to but I'm not sure if we do anymore). Ideally, we wouldn't have to but bugs happen.

@Stebalien
Member

So, you're saying you have two connected nodes and one isn't giving data to the other?

@hannahhoward
Contributor

Yea, that makes sense. I'm not sure it'd be #51, 'cause it sounds like node B and node A are connected over a long period of time. But that is still a concern.

Also, I really want to get the latest Bitswap published, 'cause I can't even tell whether this is with the sessions code or the 0.4.18 Bitswap, which didn't actually use sessions at all.

@bruinxs
Author

bruinxs commented Feb 11, 2019

So, you're saying you have two connected nodes and one isn't giving data to the other?

@Stebalien Yes, the data transfer is blocked on several data blocks, and each time the problem reproduces, different blocks are involved. The corresponding wantlists on the two nodes also match.

I added code to rebroadcast the stuck block requests, and I was finally able to fetch the resource.

The test code is as follows:
//workers.go
func (bs *Bitswap) rebroadcastWorker(parent context.Context) {

	...

	// lastwl remembers the wantlist seen on the previous tick, so the next
	// tick can tell whether the wantlist has been stuck since then.
	lastwl := struct {
		count  int
		blocks map[cid.Cid]struct{}
	}{}

	for {
		log.Event(ctx, "Bitswap.Rebroadcast.idle")
		select {
		case <-tick.C:
			n := bs.wm.wl.Len()
			if n > 0 {
				log.Debug(n, " keys in bitswap wantlist")
			}

			if n == 0 {
				break
			}

			// Snapshot the wantlist recorded on the previous tick and
			// record the current one.
			snapshot := lastwl
			lastwl.count = n
			lastwl.blocks = make(map[cid.Cid]struct{}, n)

			for _, entry := range bs.wm.wl.Entries() {
				lastwl.blocks[entry.Cid] = struct{}{}
			}

			// Only rebroadcast when the wantlist is unchanged since the
			// previous tick; a changing wantlist means progress is still
			// being made.
			if snapshot.count != n {
				break
			}

			diff := false
			for c := range snapshot.blocks {
				if _, ok := lastwl.blocks[c]; !ok {
					diff = true
					break
				}
			}
			if diff {
				break
			}

			// The wantlist has been stuck for a full tick: resend it to
			// every connected peer.
			msg := bsmsg.New(false)
			for _, entry := range bs.wm.wl.Entries() {
				if !entry.Trash {
					msg.AddEntry(entry.Cid, entry.Priority)
				}
			}
			for p, q := range bs.wm.peers {
				q.resendMessage(msg)
				log.Warningf("rebroadcast wantlist to peer %s", p.Pretty())
			}

		...

		}
	}
}
//wantmanager.go
// resendMessage merges the non-cancel entries of msg into the queue's
// pending outgoing message and wakes the queue worker if anything was
// added.
func (mq *msgQueue) resendMessage(msg bsmsg.BitSwapMessage) {
	var work bool
	mq.outlk.Lock()
	defer func() {
		mq.outlk.Unlock()
		if !work {
			return
		}
		select {
		case mq.work <- struct{}{}:
		default:
		}
	}()

	if mq.out == nil {
		mq.out = bsmsg.New(false)
	}

	for _, e := range msg.Wantlist() {
		if !e.Cancel {
			work = true
			mq.out.AddEntry(e.Cid, e.Priority)
		}
	}
}

@bruinxs
Author

bruinxs commented Feb 11, 2019

@hannahhoward I don't think it's because of #51; these two nodes are already connected.

The go-bitswap gx version I am using is QmNkxFCmPtr2RQxjZNRCNryLud4L9wMEiBJsLgF14MqTHj, and its code looks like this:

// Connected/Disconnected warns bitswap about peer connections
func (bs *Bitswap) PeerConnected(p peer.ID) {
	bs.wm.Connected(p)
	bs.engine.PeerConnected(p)
}

func (pm *WantManager) startPeerHandler(p peer.ID) *msgQueue {
	mq, ok := pm.peers[p]
	if ok {
		// Already connected: only the refcount is bumped. The full
		// wantlist below is sent to brand-new peers only.
		mq.refcnt++
		return nil
	}

	mq = pm.newMsgQueue(p)

	// new peer, we will want to give them our full wantlist
	fullwantlist := bsmsg.New(true)
	for _, e := range pm.bcwl.Entries() {
		for k := range e.SesTrk {
			mq.wl.AddEntry(e, k)
		}
		fullwantlist.AddEntry(e.Cid, e.Priority)
	}
	mq.out = fullwantlist
	mq.work <- struct{}{}

	pm.peers[p] = mq
	go mq.runQueue(pm.ctx)
	return mq
}

@bruinxs
Author

bruinxs commented Feb 11, 2019

According to the following code, I think that even if the server node has the requested blocks, responding can still fail due to network errors such as a session shutdown or an i/o deadline being reached, and the failure is only logged:

go-bitswap/workers.go

Lines 107 to 110 in 916de59

	err := bs.network.SendMessage(ctx, env.Peer, msg)
	if err != nil {
		log.Infof("sendblock error: %s", err)
	}
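
One way to harden that path, sketched here with a hypothetical sendWithRetry helper (this is not what go-bitswap does), would be to retry a failed send a bounded number of times instead of only logging the error:

package sendretry

import (
	"context"
	"time"
)

// sendWithRetry is a hypothetical wrapper around a send call such as
// bs.network.SendMessage. Instead of logging a failure and dropping the
// block, it retries a bounded number of times with a fixed delay.
func sendWithRetry(ctx context.Context, send func(context.Context) error, attempts int, delay time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = send(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
			// wait before the next attempt
		}
	}
	return err
}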

I think it is necessary to detect blocked block requests and rebroadcast them when the node is idle (a sketch follows this list):

1. If no new block requests are made for a short period and no blocks are received, the node is idle.
2. If the block requests in the wantlist get no response, the wantlist should be rebroadcast every so often.
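
Here is a minimal sketch of that stall detection, with hypothetical names throughout (the fix that eventually landed, referenced below, takes the simpler approach of rebroadcasting the wantlist on a fixed thirty-second timer):

package stalldetect

import (
	"sync"
	"time"
)

// stallDetector tracks when wants were last added and blocks were last
// received. If both are older than the window while wants are still
// outstanding, the transfer is considered stalled and the wantlist
// should be rebroadcast.
type stallDetector struct {
	mu          sync.Mutex
	lastWant    time.Time
	lastReceive time.Time
	outstanding int
}

func (s *stallDetector) WantAdded() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lastWant = time.Now()
	s.outstanding++
}

func (s *stallDetector) BlockReceived() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lastReceive = time.Now()
	if s.outstanding > 0 {
		s.outstanding--
	}
}

// Stalled reports whether there are outstanding wants but no new wants
// or received blocks within the given window.
func (s *stallDetector) Stalled(window time.Duration) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	cutoff := time.Now().Add(-window)
	return s.outstanding > 0 && s.lastWant.Before(cutoff) && s.lastReceive.Before(cutoff)
}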

hannahhoward added a commit that referenced this issue Apr 4, 2019
Provide a failsafe to losing wants on other end by rebroadcasting a wantlist every thirty seconds

fix #99, fix #65