-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically restart push channel #127
Conversation
21a9ac3
to
28e8539
Compare
28e8539
to
6e17350
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something you should be aware of:
Graphsync has it's own notion of backpressure -- it will stop queuing when a certain among of memory has been allocated but not sent. We built this to resolve graphsync hogging memory if the network slowed down. So, that means, for this to work as a detection mechanism, the value of the diff between queued & sent must be lower than the backpressure default in graphsync. (I believe 16MB per peer). There aren't any defaults listed here, but I think you should inspect the Graphsync defaults: https://github.com/ipfs/go-graphsync/blob/master/impl/graphsync.go#L32, and your defaults which I assume are in the PR on markets, plus make sure Lotus isn't overriding the graphsync defaults (which would not surprise me)
Follow-up: One thing just to factor for the future is that while retrievals seem to be almost no one's priority, we're gonna have to deal with this problem eventually for Pulls too. And, while the ipfs/go-graphsync#129 might help, we may also want to do some kind of keep alive in go-graphsync? I guess maybe go-yamux does that already. |
@hannahhoward for pulls I'm not sure we need this mechanism, I think it may make more sense to rely on go-graphsync itself to do retries. go-graphsync has some retry capability but it would be nice if it worked the same way as the retries in go-fil-markets and go-data-transfer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- just keep an eye out for the back pressure issue when you get to LOTUS
90e701e
to
c6c9ac2
Compare
I realized that the data-rate monitoring mechanism I implemented was not very granular. If for example the pending data suddenly increased right before checking the data rate, it would appear that the data rate was too low and the monitor would trigger a channel restart. The latest commit instead adds functionality such that:
|
c6c9ac2
to
b2945c4
Compare
The new algorithm looks correct to me. |
Implementation of auto-restart behaviour described in filecoin-project/go-fil-markets#463 (comment)
If a "push" data transfer channel stalls while transferring data, attempt to reconnect to the other party and send a "restart" request for the channel.
Note that the backoff behaviour on dial already exists in the network layer.
This PR adds a
pushChannelMonitor
to data-transfermanager
. Each time the data-transfermanager
opens a "push" data transfer channel, it adds the channel ID to the monitor.Graphsync queues up data to be sent. The
pushChannelMonitor
periodically checks the amount of data queued against the amount sent. If the amount of pending data (queued - sent), is greater than the configured minimum amount over the configured interval (eg 1MB over 1s), thepushChannelMonitor
assumes the transfer has stalled and attempts to send a "restart" request.