fix: not dial all known peers in parallel on startup #698
Conversation
src/index.js (Outdated)
```js
const minPeers = this._options.connectionManager.minPeers || 0

// TODO: this should be ordered by score on PeerStoreV2
this.peerStore.peers.forEach(async (peer) => {
```
@jacobheun I started by doing the dials in serial, as the token-based dial will already use parallel dials for the several multiaddrs the peer we are targeting might have. Also, the discovery services have kicked in by then, and their new discoveries might be more important than the older ones (this is where having scores for the older peers would help a lot too).
Another solution I thought of is to do `minPeers` parallel dials and then do the rest in serial (to compensate for possible offline peers). Let me know what you think about the pros and cons of both approaches.
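For illustration, a minimal sketch of the serial approach described above, assuming the `peerStore.peers` collection and `connectionManager.size` seen elsewhere in this PR (the `log` helper and exact shapes are assumptions):

```js
// Hypothetical sketch: dial known peers one at a time, stopping once the
// connection manager reports at least `minPeers` connections.
async _maybeConnectPeerStorePeers () {
  const minPeers = this._options.connectionManager.minPeers || 0

  // TODO: this should be ordered by score on PeerStoreV2
  for (const peer of this.peerStore.peers.values()) {
    if (this.connectionManager.size >= minPeers) {
      break // enough connections; let discovery drive further dials
    }

    try {
      await this.dial(peer.id) // serial: wait before trying the next peer
    } catch (err) {
      log.error('could not dial peer from the PeerStore', err)
    }
  }
}
```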
src/index.js
Outdated
```js
 * @returns {void}
 */
_maybeConnectPeerStorePeers () {
  if (!this._config.peerDiscovery.autoDial) {
```
Maybe this deserves its own config in the peerStore instead? 🤔
I think the more general case that would be helpful here is going to the PeerStore when our connection count is below the low watermark. Right now I think we should add a counterpart of the …
Totally agree, but I don't think this is the correct timing for that. We are doing that in …
This PR does that at boot, and I agree with this, but with some other things in consideration: when we have a more intelligent PeerStore/ConnectionManager, we should select peers based on their score and the protocols we are running.
Doing it on disconnect seems like a good idea, but just blindly iterating over the PeerStore and connecting will likely not bring much value. Still, I think we can include it here as it is quite simple.
This seems the way to go for me, but it should happen for the …
We can talk about this separately; it doesn't need to change for this PR, but it should change soon because we're going to hit scaling issues. If I know thousands of peers when I boot, that's going to get nasty real quick.
Actually, we do have some metrics we could use. Instead of looking at ALL peers in the peer store, for now we could prioritize peers we have keys/protocols for. Both should indicate a previous direct exchange, although keys might have come indirectly (I can't recall which subsystems we might have added this to in JS).
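A rough sketch of that prioritization, assuming each PeerStore entry exposes a `protocols` array and an optional public key on its id (field names are assumptions, not the actual PeerStore shape):

```js
// Hypothetical: order candidates so that peers we hold protocols or a key
// for (both hint at a previous direct exchange) are dialed first.
function prioritizePeers (peers) {
  const direct = []
  const rest = []

  for (const peer of peers.values()) {
    const hasProtocols = peer.protocols && peer.protocols.length > 0
    const hasKey = Boolean(peer.id.pubKey)

    if (hasProtocols || hasKey) {
      direct.push(peer)
    } else {
      rest.push(peer)
    }
  }

  return direct.concat(rest) // known-good peers first, the rest as fallback
}
```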
src/connection-manager/index.js (Outdated)
```diff
@@ -206,6 +210,7 @@ class ConnectionManager extends EventEmitter {
       this.connections.delete(peerId)
       this._peerValues.delete(connection.remotePeer.toB58String())
       this.emit('peer:disconnect', connection)
+      this._maybeConnectN()
```
If a disconnect occurs while a previous call to `_maybeConnectN` is still running, we're going to run two calls in parallel; we should avoid this.
You mean the `_maybeConnectN` in `start`?
We can potentially have it running from the `start` call while trying to dial a bunch of peers, and one of them disconnects in the meantime. I can add a check for whether we are currently running it.
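Something like the following guard would cover it (a sketch, assuming `_maybeConnectN` is the only entry point that needs serializing):

```js
// Hypothetical re-entrancy guard: skip the run if one is already in flight.
async _maybeConnectN () {
  if (this._isTryingToConnect) {
    return
  }
  this._isTryingToConnect = true

  try {
    // ... dial PeerStore peers until minConnections is satisfied ...
  } finally {
    this._isTryingToConnect = false // always release the guard
  }
}
```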
src/connection-manager/index.js (Outdated)
```js
this._isTryingToConnect = true
```
This can go away:

```js
this._isTryingToConnect = true
```
src/connection-manager/index.js (Outdated)
```js
    }
  }

  this._maybeConnectTimeout = setTimeout(this._maybeConnectN, this._options.maybeConnectInterval)
```
Use `retimer` for this. We're already using it, and it will cut down on multiple timers in the background.
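For reference, swapping the `setTimeout` above for `retimer` could look roughly like this (a sketch; only `retimer`'s documented `reschedule`/`clear` API is relied on, the surrounding property names come from this PR's diff):

```js
const retimer = require('retimer')

// Reuse one timer instead of allocating a new setTimeout on every cycle.
if (this._maybeConnectTimeout) {
  this._maybeConnectTimeout.reschedule(this._options.maybeConnectInterval)
} else {
  this._maybeConnectTimeout = retimer(this._maybeConnectN, this._options.maybeConnectInterval)
}

// On stop, the pending timer can be cancelled:
// this._maybeConnectTimeout.clear()
```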
```js
},
connectionManager: {
  minConnections,
  maybeConnectInterval
```
This should be `autoDialInterval`. Since it's the wrong param, the default of 10s would apply, so it's odd this test is passing. I assume the node is actively in the process of connecting when the first check is done.
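In other words, the test options would presumably become something like this (a sketch; the surrounding structure is an assumption based on the diff above):

```js
// Hypothetical corrected test options: pass the interval under the name the
// connection manager actually reads, with a short value so the test is fast.
const options = {
  connectionManager: {
    minConnections,
    autoDialInterval: 1000 // the default 10s would be too slow for the test
  }
}
```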
Yeah, I debugged it earlier. It was failing sometimes before I added the interval param. I will fix it now.
LGTM. We'll look at cleaning up this connection manager flow when we tackle the v2 work for that (tentatively 0.30/0.31).
This PR aims to prevent a large number of parallel dials on a peer restart, if the peer already knew a large number of peers in the past.
It checks `minPeers`, like the `maybeConnect` function used by the discovery event handlers does. That function was also using `minPeers`, but since all the dials were started in parallel, `connectionManager.size` would not be updated in time to avoid further dials.
When we work on the connectionManager, we should work on better strategies for this, and likely make them configurable.
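To illustrate the race described above (a reconstruction, not the exact previous code): when every dial is kicked off at once, each callback samples `connectionManager.size` before any dial has completed, so the limit never takes effect.

```js
// Reconstructed illustration of the bug: forEach fires all dials at once,
// so `size` is still below `minPeers` on every iteration.
this.peerStore.peers.forEach(async (peer) => {
  if (this.connectionManager.size >= minPeers) {
    return // never triggers on startup: no dial has finished yet
  }
  await this.dial(peer.id)
})
```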