In sc-network, isolate accesses to client, txpool, finality proof builder, ... in separate tasks. #559
Comments
Yes and no. It depends how paritytech/substrate#3230 is implemented. What I had in mind for paritytech/substrate#3230 is that when we call [...]. But even if we do that, it is still possible for [...]. But we could also consider using separate tasks as part of paritytech/substrate#3230, I don't know.
We should start a channel on matrix and continue there.
Why? This is the right place to discuss the implementation.
The code that is relevant for this issue is here: https://github.com/paritytech/substrate/blob/ae36c62223a8191e5bbca14ca5df3bc7d0edb6ea/client/network/src/chain.rs#L45

Since this is far from being trivial, we should, as mentioned above, add to the networking code an [...]. When it comes to making the networking code async-friendly, it shouldn't be very hard to adjust [...]. I suggest we keep the synchronous nature of [...].
How so? Isn't it run on a thread pool? I don't see how this proposed change will solve anything. Simply moving a bottleneck to a different place won't help much. You'd just get an overflow in the task queue.
Answering ping, parsing messages, and everything "close to the socket" is done in a thread pool. Answering requests when access to the client is needed is not.
Right now, if the client takes a long time, we just accumulate network messages from nodes. After this change, we would instead process all messages quickly/immediately and accumulate the messages that we need to answer. This means that:

1. We could detect when the client is over capacity and return some sort of "sorry, we're busy" error to the remote.

As a more general problem, we call the client from within [...].
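For illustration, here is a minimal sketch of what that "sorry, we're busy" detection could look like, assuming a bounded `futures::channel::mpsc` queue between the networking event loop and a background worker that owns the client access. None of these names (`ClientRequest`, `BusyError`, `ClientWorkerHandle`) are real sc-network types; they are invented for this sketch only.

```rust
use futures::channel::mpsc;

/// Hypothetical request that needs client access (invented for this sketch).
enum ClientRequest {
    BlockBody { block_number: u64 },
}

/// Returned when the worker queue is full, i.e. the client side is busy.
struct BusyError;

/// Handle held by the networking event loop.
struct ClientWorkerHandle {
    /// Bounded sender: `try_send` fails immediately when the queue is full,
    /// which is exactly the "client is over capacity" signal.
    sender: mpsc::Sender<ClientRequest>,
}

impl ClientWorkerHandle {
    /// Creates the handle together with the receiver that a background
    /// worker task (owning the client) would drain.
    fn new(capacity: usize) -> (Self, mpsc::Receiver<ClientRequest>) {
        let (sender, receiver) = mpsc::channel(capacity);
        (Self { sender }, receiver)
    }

    /// Called from the networking event loop; never blocks. Either the
    /// request is queued for the worker, or we learn right away that we
    /// should answer the remote with a "sorry, we're busy" error.
    fn enqueue(&mut self, request: ClientRequest) -> Result<(), BusyError> {
        self.sender.try_send(request).map_err(|_| BusyError)
    }
}
```

The point of the bounded channel is that slowness in the client only fills this queue, while the event loop itself keeps draining the sockets.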
Why not? The client itself is thread-safe. I'm pretty sure handling requests does not lock anything for the duration of the client query. At least it did not initially.
Request handling needs to be fair, e.g. bootnode resources need to scale evenly across all connecting peers. If the node is overloaded, the request needs to wait until timeout rather than being denied immediately.
See above. I don't see why it is not the case right now. This sounds like a lot of additional complexity to work around the actual problem. Handling network requests should be fast: if your network handler takes seconds to execute, it needs to be optimized first, not shoved into a queue. And if such a queue were to be organized, I'd argue that it should be managed explicitly, instead of relying on the tokio executor or creating another unbounded channel, as it will most certainly require some kind of per-peer throttling.
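One way to read "managed explicitly ... with some kind of per-peer throttling" is a small bookkeeping structure that caps the number of in-flight client requests per peer. This is purely illustrative; the `PeerId` type below is a stand-in, not the actual sc-network identifier.

```rust
use std::collections::HashMap;

/// Stand-in for sc-network's peer identifier (assumption for this sketch).
#[derive(Clone, PartialEq, Eq, Hash)]
struct PeerId([u8; 32]);

/// Explicitly managed limiter that refuses to accept more than
/// `max_per_peer` outstanding requests from any single peer.
struct PerPeerThrottle {
    max_per_peer: usize,
    in_flight: HashMap<PeerId, usize>,
}

impl PerPeerThrottle {
    fn new(max_per_peer: usize) -> Self {
        Self { max_per_peer, in_flight: HashMap::new() }
    }

    /// Returns `true` if the peer is allowed to submit another request.
    /// Called before enqueueing any work for the client.
    fn try_start(&mut self, peer: &PeerId) -> bool {
        let counter = self.in_flight.entry(peer.clone()).or_insert(0);
        if *counter >= self.max_per_peer {
            false
        } else {
            *counter += 1;
            true
        }
    }

    /// Called once the client has answered (or the request timed out).
    fn finish(&mut self, peer: &PeerId) {
        if let Some(counter) = self.in_flight.get_mut(peer) {
            *counter = counter.saturating_sub(1);
            if *counter == 0 {
                self.in_flight.remove(peer);
            }
        }
    }
}
```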
Right now, `sc-network` directly calls the client, tx-pool, finality proof builder, and so on (full list in the `config` module). However, whenever one of these components takes a long time to respond, the entire networking stack freezes for everyone.

Not only does this happen right now due to some runtime performance issues, but it also makes us quite vulnerable to attacks.

Instead of performing straight calls, we should in my opinion spawn a few background tasks (or even a new background task per call when that is possible) and perform the calls there.

My personal opinion is that isolating the calls to the client/tx-pool/... should be performed by `sc-service`, but from a pragmatic perspective let's do it in `sc-network`. Instead of passing an `Arc<dyn Client>` directly to the `sync` state machine, we should pass some sort of `Isolated<Arc<dyn Client>>`, where `Isolated` exposes an asynchronous API.