Networking simplification #2264
Conversation
This PR isn't completely finished, but the only thing that remains is to update the code of the full node.
Automatically approving tomaka's pull requests. This auto-approval will be removed once more maintainers are active.
twiggy diff report: difference in .wasm size before and after this pull request.
As part of this PR, I've removed the
There is a high chance that this PR introduces some bugs, and I admit that I don't have the courage to chase bugs for days; I'd prefer to merge this so that I can do other networking-related changes on top of it.
@melekes Do you want to review this? You can completely say no (and actually I'd expect you to say no)
man this PR is massive 🙈 I've reviewed some of the code, but not everything. Probably okay to merge. I can review the logic later.
Also, it would be interesting to compare how introducing the global lock affected performance. Could tracing or some other instrument (similar to what `go tool pprof` provides) give insight into lock contention in Rust?
fnv::FnvBuildHasher,
>,

messages_from_connections_tx:
how can one receive messages from a sender? 😐 `messages_to_connections_tx`?
You get the messages from the `messages_from_connections_rx`; not sure I get the question.
@@ -421,29 +319,14 @@ impl NetworkService {
futures_timer::Delay::new(next_discovery).await;
next_discovery = cmp::min(next_discovery * 2, Duration::from_secs(120));

match inner
let mut lock = inner.guarded.lock().await;
Suggested change:
- let mut lock = inner.guarded.lock().await;
+ let mut guarded = inner.guarded.lock().await;
don't you think `lock` is too general? Just by looking at the line below, you probably won't understand what the lock is (because it could represent anything).
Well, `guarded` isn't much better. `guarded` here is just supposed to mean "protected by a mutex".
In the context of this module, there's only one mutex in the entire code, so to me it's not a problem to just call it `lock`.
except one can find a `Guarded` struct above, but there's no `Lock`
I don't think we can actually measure the lock contention at the moment, for what it's worth. As for the light node, it's single-threaded, so it's not measurable there either.
I think I've addressed everything 👍
This big unreviewable PR refactors the networking part of the code, more precisely the layers that coordinate all the TCP/WebSocket connections together. The code concerning individual connections has barely been touched.

The main file that has been modified is `collection.rs`. This file contains a data structure that represents a set of connections.

Before this PR, this data structure was "atomic". All the methods took `&self` as parameter, meaning that multiple methods could be called at the same time, and many of these methods were asynchronous and could be interrupted. If you needed to inject data into a connection, this had to be done through the data structure. This design made the code extremely difficult to understand, because of all the corner cases to handle, most notably around the fact that futures can be cancelled by the user before their completion.

After this PR, this data structure has been split in two: one with the set of connections, and one `ConnectionTask` for each individual connection. The set of connections has a queue of messages destined for the `ConnectionTask`s, and the `ConnectionTask`s have a queue of messages destined for the set. This is exposed in the APIs of these objects, and it is the role of the user to do the message passing. Before this PR, all the locking/multithreading strategy was handled internally by the collection; after this PR it needs to be done by the user.

Thanks to this change in paradigm, the data structures are no longer atomic and are now simple mutable state machines. You have getters and you have methods that modify the state, and that's it. This considerably simplifies the implementation.

In the same vein, `peers.rs` and `service.rs`, which are data structures built on top of `collection.rs`, have been modified in the same way.

This new paradigm is in theory slightly less optimal than the one before. Before this PR, locking was fine-grained: if multiple threads wanted to access the set of connections at the same time, they could each call a method, and if their changes didn't overlap they would actually run at the same time. After this PR, if multiple threads want to access the set, each thread needs to lock a mutex around the entire set.

However, the complexity of the previous implementation, notably around cancellable futures, led to a lot of overhead as well, where for example operations in progress needed to be buffered so that they could be resumed later in case the user interrupted the operation.

Overall I think that this change is more than worth it.