Automatically restart push channel #127

dirkmc · 2020-12-11T15:33:45Z

Implementation of auto-restart behaviour described in filecoin-project/go-fil-markets#463 (comment)

If a "push" data transfer channel stalls while transferring data, attempt to reconnect to the other party and send a "restart" request for the channel.

Note that the backoff behaviour on dial already exists in the network layer.

This PR adds a pushChannelMonitor to the data-transfer manager. Each time the data-transfer manager opens a "push" data transfer channel, it adds the channel ID to the monitor.

Graphsync queues up data to be sent. The pushChannelMonitor periodically compares the amount of data queued with the amount sent. If the amount of pending data (queued - sent) exceeds the configured minimum amount over the configured interval (e.g. 1MB over 1s), the pushChannelMonitor assumes the transfer has stalled and attempts to send a "restart" request.
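As a rough illustration of the pending-data check described above, here is a minimal sketch. The names `channelStats`, `pending`, `stalled` and `maxPending` are hypothetical and are not taken from the PR; the real monitor also has to track the check interval and trigger the restart request.

```go
package main

import "fmt"

// channelStats is a hypothetical snapshot of one push channel's counters.
type channelStats struct {
	queued uint64 // bytes graphsync has queued for sending
	sent   uint64 // bytes actually sent to the other party
}

// pending returns queued - sent, clamped at zero so that an out-of-order
// counter update cannot underflow the unsigned subtraction.
func pending(s channelStats) uint64 {
	if s.sent >= s.queued {
		return 0
	}
	return s.queued - s.sent
}

// stalled reports whether the pending amount exceeds the configured
// threshold (maxPending is an assumed name for that setting).
func stalled(s channelStats, maxPending uint64) bool {
	return pending(s) > maxPending
}

func main() {
	const maxPending = 1 << 20 // 1 MiB, mirroring the "1MB over 1s" example

	healthy := channelStats{queued: 5 << 20, sent: 5<<20 - 1024}
	stuck := channelStats{queued: 5 << 20, sent: 1 << 20}

	fmt.Println("healthy channel stalled:", stalled(healthy, maxPending))
	fmt.Println("stuck channel stalled:", stalled(stuck, maxPending))
}
```

In this sketch a caller would run `stalled` once per configured interval and attempt the restart when it returns true.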

impl/pushchannelmonitor.go

impl/integration_test.go

nonsense

lgtm

hannahhoward

Something you should be aware of:
Graphsync has its own notion of backpressure -- it will stop queuing when a certain amount of memory has been allocated but not sent. We built this to stop graphsync hogging memory if the network slowed down. So for this to work as a detection mechanism, the value of the diff between queued & sent must be lower than the backpressure default in graphsync (I believe 16MB per peer). There aren't any defaults listed here, but I think you should inspect the Graphsync defaults: https://github.com/ipfs/go-graphsync/blob/master/impl/graphsync.go#L32, and your defaults, which I assume are in the PR on markets, plus make sure Lotus isn't overriding the graphsync defaults (which would not surprise me).
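The constraint above could be expressed as a start-up sanity check, sketched below. The function name and both parameters are hypothetical, and the 16 MiB figure is only the value recalled in the comment above, not a verified graphsync default.

```go
package main

import "fmt"

// validateStallThreshold is a hypothetical check: if graphsync stops queuing
// once backpressureLimit bytes are unsent, then queued - sent can never
// exceed a stall threshold that is at or above that limit.
func validateStallThreshold(stallThreshold, backpressureLimit uint64) error {
	if stallThreshold >= backpressureLimit {
		return fmt.Errorf(
			"stall threshold %d must be lower than backpressure limit %d",
			stallThreshold, backpressureLimit)
	}
	return nil
}

func main() {
	// Assumed value, taken from "I believe 16MB per peer"; check the
	// actual graphsync and Lotus configuration before relying on it.
	const backpressureLimit = 16 << 20

	fmt.Println(validateStallThreshold(1<<20, backpressureLimit))
	fmt.Println(validateStallThreshold(32<<20, backpressureLimit))
}
```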

impl/pushchannelmonitor.go

hannahhoward · 2020-12-15T16:07:53Z

Follow-up: One thing to factor in for the future is that while retrievals seem to be almost no one's priority, we're gonna have to deal with this problem eventually for Pulls too. And, while ipfs/go-graphsync#129 might help, we may also want to do some kind of keep-alive in go-graphsync? I guess maybe go-yamux does that already.

dirkmc · 2020-12-15T16:58:12Z

@hannahhoward for pulls I'm not sure we need this mechanism; I think it may make more sense to rely on go-graphsync itself to do retries. go-graphsync has some retry capability, but it would be nice if it worked the same way as the retries in go-fil-markets and go-data-transfer.

hannahhoward

LGTM -- just keep an eye out for the back pressure issue when you get to LOTUS

pushchannelmonitor/pushchannelmonitor.go

dirkmc · 2020-12-16T13:31:01Z

I realized that the data-rate monitoring mechanism I implemented was not very granular. If, for example, the pending data suddenly increased right before the data rate was checked, it would appear that the data rate was too low and the monitor would trigger a channel restart.

The latest commit instead adds functionality such that:

- the data rate is checked and recorded multiple (configurable) times per interval, e.g. 10
- each check compares the amount sent between the start and end of the interval against the minimum required data rate
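The two points above can be sketched as a sliding window over cumulative "sent" totals. This is only an illustration under assumed names (`rateMonitor`, `checksPerInterval`, `minBytesSent`), not the code in the commit, and it assumes the sent total never decreases.

```go
package main

import "fmt"

// rateMonitor keeps the cumulative "sent" totals seen at recent checks so
// that each new check can be compared with the one a full interval earlier.
type rateMonitor struct {
	checksPerInterval int      // e.g. 10 checks per interval
	minBytesSent      uint64   // minimum bytes expected per interval
	samples           []uint64 // cumulative sent totals, oldest first
}

// check records the current cumulative sent total and reports whether fewer
// than minBytesSent bytes were sent over the last full interval. It returns
// false until a full interval of samples has been collected.
func (m *rateMonitor) check(sent uint64) bool {
	m.samples = append(m.samples, sent)
	if len(m.samples) <= m.checksPerInterval {
		return false
	}
	// Keep checksPerInterval+1 samples: the start of the window up to now.
	m.samples = m.samples[len(m.samples)-(m.checksPerInterval+1):]
	return sent-m.samples[0] < m.minBytesSent
}

func main() {
	m := &rateMonitor{checksPerInterval: 4, minBytesSent: 100}

	// Cumulative bytes sent, sampled four times per interval. The transfer
	// slows down after the fifth sample.
	totals := []uint64{0, 50, 100, 150, 200, 210, 215, 220}
	for i, sent := range totals {
		if m.check(sent) {
			fmt.Printf("check %d: data rate too low, would restart channel\n", i)
			return
		}
	}
	fmt.Println("no stall detected")
}
```

Because every check looks back over a whole interval, a sudden jump in pending data just before a check no longer makes the rate look too low by itself.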

nonsense · 2020-12-16T13:58:23Z

The new algorithm looks correct to me.

feat: latest go-graphsync

c03054f

dirkmc mentioned this pull request Dec 14, 2020

Verified Deals not transferring & unable to cancel in data-transfers filecoin-project/lotus#5185

Closed

dirkmc force-pushed the feat/push-auto-retry branch 2 times, most recently from 21a9ac3 to 28e8539 Compare December 14, 2020 14:11

dirkmc requested review from hannahhoward and nonsense December 14, 2020 14:17

dirkmc mentioned this pull request Dec 14, 2020

on restart miner shouldn't dial client filecoin-project/go-fil-markets#463

Merged

1 task

dirkmc marked this pull request as ready for review December 14, 2020 15:23

feat: auto-restart connection for push data channels

6e17350

dirkmc force-pushed the feat/push-auto-retry branch from 28e8539 to 6e17350 Compare December 15, 2020 14:52

nonsense reviewed Dec 15, 2020

impl/pushchannelmonitor.go

nonsense reviewed Dec 15, 2020

impl/integration_test.go

nonsense approved these changes Dec 15, 2020

hannahhoward reviewed Dec 15, 2020

impl/pushchannelmonitor.go

impl/pushchannelmonitor.go

refactor: simplify push channel monitor config

4939195

dirkmc requested review from hannahhoward and nonsense December 15, 2020 16:58

hannahhoward approved these changes Dec 15, 2020

nonsense reviewed Dec 15, 2020

pushchannelmonitor/pushchannelmonitor.go

dirkmc force-pushed the feat/push-auto-retry branch from 90e701e to c6c9ac2 Compare December 16, 2020 12:05

fix: more granular interval checking of data rates

b2945c4

dirkmc force-pushed the feat/push-auto-retry branch from c6c9ac2 to b2945c4 Compare December 16, 2020 13:31

dirkmc requested review from nonsense and hannahhoward December 16, 2020 13:33

nonsense approved these changes Dec 16, 2020

refactor: simplify push channel monitor naming

0225513

dirkmc merged commit 288413b into master Dec 16, 2020

dirkmc deleted the feat/push-auto-retry branch December 16, 2020 14:09

This was referenced Dec 16, 2020

release: v1.2.4 #128

Merged

release: v1.0.11 filecoin-project/go-fil-markets#468

Merged
