I opened paritytech/polkadot#963, which should fix the Polkadot compilation.
```rust
use std::{future::Future, pin::Pin};
use libp2p::core::Executor;

struct SpawnImpl<F>(F);

impl<F: Fn(Pin<Box<dyn Future<Output = ()> + Send>>)> Executor for SpawnImpl<F> {
    fn exec(&self, f: Pin<Box<dyn Future<Output = ()> + Send>>) {
        (self.0)(f)
    }
}
```
I copied this here from `libp2p-core` because I forgot to re-expose `SwarmBuilder::executor_fn` before 0.17, so there is only `SwarmBuilder::executor`, and I wanted to avoid another libp2p release just for that.
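For context, a minimal sketch of how such a wrapper can be used, assuming an async-std runtime; the variable names are illustrative and not from this PR:

```rust
use std::{future::Future, pin::Pin};

fn main() {
    // Any closure of this shape satisfies `Executor` via the `SpawnImpl`
    // wrapper above; here it simply hands each boxed future to async-std.
    let executor = SpawnImpl(|fut: Pin<Box<dyn Future<Output = ()> + Send>>| {
        async_std::task::spawn(fut);
    });
    let _ = executor; // e.g. to be passed on when building the swarm
}
```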
All right, this looks good to me other than a couple of suggestions. As someone who has been battling through changes and bugs in `generic_proto/behaviour.rs`, this change is a bit scary, but I also can't spot anything wrong in the code.
Before merging, however, it would be great to deploy it on a validator node and check its logs and graphs.
Would you mind merging master, so that we can test Polkadot with it?
Just did so. For what it is worth, I already ran Polkadot with it a few weeks ago, just to check that #4272 is resolved with these changes, which seemed to be the case.
Deployed both on my Google Cloud instance (with incoming connections) around 3pm and on a second node. There are a couple of warnings (notably about reserved nodes getting disconnected), but that was already the case before. In particular, comparing the logs on Kibana before and after this PR, the rate of all warnings seems exactly the same. In Grafana everything seems normal as well. The number of GrandPa notifications emitted has increased, but that is expected due to #5520. So everything looks good to me so far. I suggest taking another look tomorrow morning and, if nothing weird is happening, merging this.
Well, the test is kind of inconclusive. On Grafana, however, everything looks normal.
I'd be in favour of merging.
Needs a merge/rebase over master.
I think the test is failing because of the unused import warning.
Supersedes #5066. This PR upgrades Substrate to `libp2p-0.17`. The following are the main libp2p changes that are visible in Substrate:

libp2p/rust-libp2p#1440 and libp2p/rust-libp2p#1519
Overview
As described in the libp2p PRs, the underlying changes are primarily in `libp2p-core`, which now permits multiple established connections per peer. Realistically, a second connection to the same peer only occurs if two peers connect to each other "at the same time". As a side-effect, existing connections are also no longer closed in favour of new ones, which addresses #4272.
Details

- `send_packet` and `write_notification` always send all data over the same connection to preserve the ordering provided by the transport, as long as that connection is open. If it closes, a second open connection may take over, if one exists, but that case should be no different from a single connection failing and being re-established in terms of potential reordering and dropped messages. Messages can be received on any connection, and thus two peers which happened to connect to each other simultaneously may each use a different connection for sending data.
- `GenericProtoOut::CustomProtocolOpen` is emitted when the first connection reports `NotifsHandlerOut::Open`.
- `GenericProtoOut::CustomProtocolClosed` is emitted when the last connection reports `NotifsHandlerOut::Closed`.

In this way, the number of actual established connections to the peer is an implementation detail of the network behaviours. As mentioned before, in practice and at the time of this writing, there may be at most two connections to a peer, and only as a result of simultaneous dialing. The network service even configures a hard limit of 2 connections per peer. However, the implementation in principle accommodates any number of connections.
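To make the open/close bookkeeping concrete, here is a minimal sketch of that counting scheme. It is an assumed shape, not the actual `generic_proto` code; `PeerId` is simplified to an integer and the type names are illustrative:

```rust
use std::collections::HashMap;

type PeerId = u64; // stand-in for the real PeerId type

#[derive(Debug, PartialEq)]
enum Event {
    CustomProtocolOpen,
    CustomProtocolClosed,
}

#[derive(Default)]
struct OpenConnections(HashMap<PeerId, usize>);

impl OpenConnections {
    // One connection's handler reported `NotifsHandlerOut::Open`.
    fn on_open(&mut self, peer: PeerId) -> Option<Event> {
        let n = self.0.entry(peer).or_insert(0);
        *n += 1;
        if *n == 1 { Some(Event::CustomProtocolOpen) } else { None }
    }

    // One connection's handler reported `NotifsHandlerOut::Closed`.
    fn on_closed(&mut self, peer: PeerId) -> Option<Event> {
        let n = self.0.get_mut(&peer).expect("closed without open");
        *n -= 1;
        if *n == 0 {
            self.0.remove(&peer);
            Some(Event::CustomProtocolClosed)
        } else {
            None
        }
    }
}

fn main() {
    let mut open = OpenConnections::default();
    // Two simultaneous connections to the same peer: only the first `Open`
    // and the last `Closed` surface as peer-level events.
    assert_eq!(open.on_open(1), Some(Event::CustomProtocolOpen));
    assert_eq!(open.on_open(1), None);
    assert_eq!(open.on_closed(1), None);
    assert_eq!(open.on_closed(1), Some(Event::CustomProtocolClosed));
}
```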
Noteworthy
During intermediate testing with the (by default disabled) integration tests `test_consensus`, `test_sync` and `test_connectivity`, it was revealed that, when run in release mode, these tests were very often failing, with the common symptom that the last node to start in a round of testing would often see no other peers (i.e. an empty DHT routing table) and thus make no progress while all the others keep running, causing the tests to time out waiting for the problematic peer to reach a certain state. The tests mainly use `add_reserved_peer` on the network to initialise the topology; however, `add_reserved_peer` ultimately results in a call to `add_known_peer` on the `DiscoveryBehaviour`, which did not actually add that address to the Kademlia routing table, though it adds it to the `user_defined` peers which, when passed in the constructor of the behaviour, are added to the Kademlia routing table.

I thus changed `add_known_peer` to also add the given address to the Kademlia routing table (see the sketch below), which resolved the issues with these integration tests; the `test_connectivity` test even seems to run notably faster (in release mode). My current guess is that the tests were so far unknowingly relying on a timing assumption w.r.t. the initial discovery / connection setup and DHT queries in order for all peers to find each other, in particular when simultaneous connection attempts are in play, as often happens in release mode. Ultimately, the change of letting `add_known_peer` add the given address to the Kademlia routing table may be a patch worth extracting separately, because it does look like an oversight to me.
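As a rough sketch of the shape of that change (a hypothetical, heavily simplified struct: the field names are illustrative, but `Kademlia::add_address` is the actual libp2p API for inserting an address into the routing table):

```rust
use libp2p::kad::{record::store::MemoryStore, Kademlia};
use libp2p::{Multiaddr, PeerId};

// Hypothetical, simplified shape of the discovery behaviour; not the
// exact Substrate code.
struct DiscoveryBehaviour {
    user_defined: Vec<(PeerId, Multiaddr)>,
    kademlia: Kademlia<MemoryStore>,
}

impl DiscoveryBehaviour {
    fn add_known_peer(&mut self, peer_id: PeerId, addr: Multiaddr) {
        self.user_defined.push((peer_id.clone(), addr.clone()));
        // The fix: also insert the address into the Kademlia routing
        // table, as is already done for `user_defined` peers passed to
        // the constructor.
        self.kademlia.add_address(&peer_id, addr);
    }
}
```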
libp2p/rust-libp2p#1472
Insubstantial changes to adapt to the new APIs.
libp2p/rust-libp2p#1467
Insubstantial changes to add the now-necessary feature flags where there is a dependency on `libp2p` with `default-features = false`.
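For illustration, such a dependency declaration now needs the relevant features spelled out explicitly; the feature list below is an assumed example, not the exact set Substrate enables:

```toml
[dependencies]
# With default-features = false, each required protocol/transport must be
# enabled via its feature flag; the names here are illustrative.
libp2p = { version = "0.17", default-features = false, features = ["kad", "mdns", "websocket"] }
```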