Support timeouts on Request/Response protocol level #1345
Comments
xgreenx added a commit that referenced this issue on Jan 6, 2024
For some reason we had two layers of serialization for request/response messages. This doesn't seem useful at all, and it complicates e.g. error handling. This PR removes the extra layer, substantially simplifying that logic. One major upside of this is that #1345 and #1350 can now be solved in a single follow-up PR.

~~Hopefully this doesn't conflict too much with the ongoing libp2p update PR #1379.~~

---------

Co-authored-by: Green Baneling <XgreenX9999@gmail.com>
Dentosal added a commit that referenced this issue on Feb 5, 2024
Closes #1345. Closes #1346. Closes #1350.

This PR stops discarding request errors from libp2p and instead returns them to the sender of the request. It also penalizes peers for sending invalid responses or for not replying at all. Making the penalty configurable should be a follow-up PR, as there are other penalties that should be configurable as well.

TODO:
- [x] Make timeout configurable: this already seems to be the case on the master branch
- [x] Add tests
- [x] Fix current tests that for some reason don't terminate

---------

Co-authored-by: xgreenx <xgreenx9999@gmail.com>
Problem description
Any error (including timeouts) that occurs during the request-response protocol is ignored. If we receive an error, the parts of the blockchain that await the response will be paralyzed. In the case of `fuel-core-sync`, this may stall synchronization. In other cases, it can consume resources forever.
Solution
Instead of silently removing the channel related to `request_id`, we need to send either an error or `None`. This allows other parts of the project to handle the failure and request the data again (maybe from another peer).
Implementation details
The error-based approach requires modifying the response type, which triggers cascading changes everywhere we start a request. That is useful: it forces us to review every call site, confirm that an error response is handled correctly, and add at least retry logic (or any other logic that leaves the node in a "fine" state after the request).
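To illustrate the idea, here is a minimal sketch using only the Rust standard library. It assumes a map from request ID to a response channel; `PendingRequests`, `RequestError`, and `ResponseResult` are hypothetical names invented for this example, not the project's actual types. The real code is built on libp2p and async channels, so treat this as a shape of the change rather than the implementation.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical error type for a failed request (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum RequestError {
    Timeout,
    ConnectionClosed,
}

pub type RequestId = u64;

/// The response type now carries the error instead of hiding it.
pub type ResponseResult = Result<Vec<u8>, RequestError>;

/// Hypothetical registry of requests that still await a response.
#[derive(Default)]
pub struct PendingRequests {
    channels: HashMap<RequestId, Sender<ResponseResult>>,
}

impl PendingRequests {
    /// Registers a request and returns the receiver the caller waits on.
    pub fn register(&mut self, id: RequestId) -> Receiver<ResponseResult> {
        let (tx, rx) = channel();
        self.channels.insert(id, tx);
        rx
    }

    /// Completes a request with either a payload or an error.
    /// The channel is removed, but the outcome is always forwarded first,
    /// so the waiting side can retry (maybe with another peer).
    /// Returns `false` if the request ID is unknown.
    pub fn complete(&mut self, id: RequestId, result: ResponseResult) -> bool {
        match self.channels.remove(&id) {
            Some(tx) => {
                // The receiver may already be dropped; that is not an error here.
                let _ = tx.send(result);
                true
            }
            None => false,
        }
    }
}
```

Because the caller receives a `Result`, every call site has to decide what to do on `Err`, which is the cascade of changes described above.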
The main goal of this issue is to have timeouts for requests. We need to add an integration test that verifies this behavior. We may also want to make the timeout threshold configurable (the default value could be 20 seconds).
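A configurable timeout could look roughly like the sketch below. It uses a blocking standard-library channel for simplicity; `RequestConfig`, `WaitError`, and `wait_for_response` are hypothetical names, and the 20-second default is just the value suggested above. In the actual codebase the timeout would more likely be set on the libp2p request-response behaviour or applied through an async timeout.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical outcome of waiting for a response (illustration only).
#[derive(Debug, PartialEq)]
pub enum WaitError {
    /// No response arrived within the configured timeout.
    Timeout,
    /// The sending side was dropped without replying.
    Disconnected,
}

/// Hypothetical configuration holding the timeout threshold.
pub struct RequestConfig {
    pub timeout: Duration,
}

impl Default for RequestConfig {
    fn default() -> Self {
        // Default suggested in the issue text: 20 seconds.
        Self { timeout: Duration::from_secs(20) }
    }
}

/// Waits for a response, turning "nothing arrived in time" into an
/// explicit error instead of blocking forever.
pub fn wait_for_response<T>(
    rx: &Receiver<T>,
    config: &RequestConfig,
) -> Result<T, WaitError> {
    rx.recv_timeout(config.timeout).map_err(|err| match err {
        RecvTimeoutError::Timeout => WaitError::Timeout,
        RecvTimeoutError::Disconnected => WaitError::Disconnected,
    })
}
```

An integration test can then use a short timeout (for example 50 ms) against a peer that never replies and assert that the caller gets `WaitError::Timeout` rather than hanging.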