fix!: refactor connection manager to use a prioritised queue #1678

Merged
merged 9 commits into from
Apr 13, 2023

Conversation

achingbrain
Member

@achingbrain achingbrain commented Apr 6, 2023

Refactors the connection manager.

  1. Internally it uses a queue to control concurrency instead of dial tokens (e.g. maxParallelDials)
  2. A second queue is used for each peer to prevent individual peers from swamping the dial queue (e.g. maxParallelDialsPerPeer)
  3. The auto dialler now also checks the minimum connection limit on peer discovery and peer disconnect events for fast reconnect
  4. Auto dialled peers are sorted based on tag value so valuable peers should be re-dialled first
  5. Auto dialled peers are dialled in parallel to prevent a slow peer locking up the auto-dial queue
  6. A getDialQueue method has been added to libp2p to allow inspection of the dial queue though it needs exposing in the interface
  7. The connection gater denyDialMultiaddr method now only takes a multiaddr
  8. Fixes duplicate connections to fast peers
  9. Fixes unhandled promise rejections - fixes Use of Promise.any() can cause unhandled promise rejection #1587
  10. Connection pruning now happens after peer:connect events not before - fixes Should not emit peer:connect with a closed connection #1565

BREAKING CHANGE: some connection manager options have changed - please see the upgrade guide for full details
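
The first two points in the list above can be illustrated with a minimal sketch of a priority queue that caps how many jobs run at once. This is not the implementation from this PR: `PriorityJobQueue` and its methods are hypothetical names, and the rule that higher priority values run first is an assumption made for the example.

```typescript
// Hypothetical sketch, not the js-libp2p dial queue.
type Task<T> = () => Promise<T>

interface QueuedJob {
  priority: number
  start: () => void
}

class PriorityJobQueue {
  private readonly pending: QueuedJob[] = []
  private readonly concurrency: number
  private running = 0

  constructor (concurrency: number) {
    this.concurrency = concurrency
  }

  // Queue a task; the returned promise settles with the task's result.
  add<T> (task: Task<T>, priority = 0): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.running++
        void Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .finally(() => {
            this.running--
            this.drain()
          })
      }

      // Assumption: higher priority values run first, FIFO among equal priorities.
      const index = this.pending.findIndex(job => job.priority < priority)
      if (index === -1) {
        this.pending.push({ priority, start })
      } else {
        this.pending.splice(index, 0, { priority, start })
      }

      this.drain()
    })
  }

  // Start queued jobs until the concurrency limit is reached.
  private drain (): void {
    while (this.running < this.concurrency) {
      const job = this.pending.shift()
      if (job === undefined) {
        return
      }
      job.start()
    }
  }
}
```

With a concurrency of 1, a job added with priority 10 while another job is running starts before an earlier-queued job with priority 0, which is roughly how a user-initiated dial could jump ahead of queued auto dials.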

@achingbrain achingbrain changed the title fix!: refactor connection manager to use a queue and add priority dia… fix!: refactor connection manager to use a queue and add priority dialing Apr 6, 2023
@achingbrain achingbrain changed the title fix!: refactor connection manager to use a queue and add priority dialing fix!: refactor connection manager to use a queue Apr 6, 2023
@achingbrain achingbrain changed the title fix!: refactor connection manager to use a queue fix!: refactor connection manager to use a prioritsed queue Apr 6, 2023
@achingbrain achingbrain changed the title fix!: refactor connection manager to use a prioritsed queue fix!: refactor connection manager to use a prioritised queue Apr 6, 2023
Member

@SgtPooki SgtPooki left a comment

Partial review

Comment on lines +32 to +36
// these should now be ranges expressed as MultiaddrFilters
allow: [
  '/ip4/0.0.0.0/tcp/123'
],

// these should now be ranges expressed as MultiaddrFilters
deny: [
  '/ip4/0.0.0.0/tcp/123'
]
Member

Can we get an "allow peer" option that allows any appropriate transports given a specific peerId?

I was trying to do that the other day in kubo and realized I needed a full multiaddr. I was trying to grant full access to/from my helia node, couldn't figure out how to do it within an hour or so, and gave up.

It would be great to be able to do that in js-libp2p

Member Author

@achingbrain achingbrain Apr 7, 2023

I'm a little confused: access is allowed by default, it's only disallowed if a configured connection gater denies it or you're at maxConnections already.

The allow list above enables specific networks to breach connection limits - a DoS mitigation for when your node is being eclipsed.

It's almost certainly not the solution to your problem - maybe you can open an issue on the helia repo with a repro case?
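
A minimal sketch of the behaviour described above, where connections are accepted by default and the allow list only matters once the connection limit has been reached. All names here (`ipv4ToInt`, `inCidr`, `acceptInbound`) are hypothetical, the allow list is assumed to hold IPv4 CIDR strings rather than multiaddrs, and connection gater checks are left out.

```typescript
// Hypothetical helpers, not part of js-libp2p.

// Convert a dotted-quad IPv4 address to an unsigned 32-bit number.
function ipv4ToInt (ip: string): number {
  const parts = ip.split('.')
  if (parts.length !== 4 || parts.some(p => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`)
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0)
}

// Return true if `ip` falls inside `cidr`, e.g. '192.168.1.0/24'.
function inCidr (ip: string, cidr: string): boolean {
  const slash = cidr.indexOf('/')
  const prefix = cidr.slice(slash + 1)
  if (slash === -1 || !/^\d{1,2}$/.test(prefix) || Number(prefix) > 32) {
    throw new Error(`invalid CIDR range: ${cidr}`)
  }
  const bits = Number(prefix)
  // A shift by 32 is a no-op in JavaScript, so /0 needs its own case.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(cidr.slice(0, slash)) & mask) >>> 0)
}

// Accept by default; once at the limit, only allow-listed networks get through.
function acceptInbound (remoteIp: string, openConnections: number, maxConnections: number, allow: string[]): boolean {
  if (openConnections < maxConnections) {
    return true
  }
  return allow.some(cidr => inCidr(remoteIp, cidr))
}
```

For example, `acceptInbound('192.168.1.7', 100, 100, ['192.168.1.0/24'])` accepts the connection even though the node is at its limit, while an address outside the range is refused.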

Member

Sure, I will re-open if I run into it. Essentially, what I was trying to do was prefer my local node to others and allow connections to that one no matter what. It sounds like that may not be necessary.

doc/migrations/v0.43-v0.44.md Outdated Show resolved Hide resolved
Comment on lines +57 to +55
// a low value allows user-initiated dials to take priority over
// auto dials
autoDialPriority: 0,
Member

Nice

Comment on lines +61 to +58
// this was previously named `maxAddrsToDial`
maxPeerAddrsToDial: 20,
Member

Per connection attempt? Or in total (after which we banlist the peer as bad)?

Might be good to call out which one and how in jsdoc, or in the name if possible.

Member Author

Did you check the jsdoc for this config option? If it's not clear could you suggest an edit?

Comment on lines 64 to 65
// this was previously named `maxDialsPerPeer`
maxParallelDialsPerPeer: 20,
Member

Just so I understand fully, this setting changes whether we can attempt to dial 2 or 20 multiaddrs at once when attempting to connect to a specific peer, correct?

Would 'maxParallelMultiAddrDialsPerPeer' be a more explicit name? Is there a way we can get that explicitness without the verbosity?

Member Author

We only dial multiaddrs so maybe having MultiAddr in the name is redundant?

Member Author

But yes, a setting of 2 with a peer that has 10 addresses means we'll dial all 10 addresses but only ever 2 at once. This stops a peer with a lot of spam addresses blocking the dial queue.
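
The limit described above can be sketched as a small worker pool over a peer's addresses. This is an illustration rather than the code in this PR: `dialPeer` and `DialFn` are hypothetical names, addresses are plain strings, and the sketch assumes that no new dials are started once one address has connected.

```typescript
// Hypothetical sketch, not the js-libp2p dialer.
type DialFn<C> = (addr: string) => Promise<C>

async function dialPeer<C> (addrs: string[], dial: DialFn<C>, maxParallelDialsPerPeer: number): Promise<C> {
  const queue = [...addrs]
  const connections: C[] = []
  const errors: unknown[] = []

  // Each worker dials one address at a time, so at most
  // `maxParallelDialsPerPeer` dials are in flight for this peer.
  const worker = async (): Promise<void> => {
    while (connections.length === 0) {
      const addr = queue.shift()
      if (addr === undefined) {
        return
      }
      try {
        const conn = await dial(addr)
        // Keep only the first success; real code would need to close
        // any extra connection that completes afterwards.
        if (connections.length === 0) {
          connections.push(conn)
        }
      } catch (err) {
        errors.push(err)
      }
    }
  }

  const workerCount = Math.min(maxParallelDialsPerPeer, addrs.length)
  await Promise.all(Array.from({ length: workerCount }, async () => { await worker() }))

  const first = connections[0]
  if (first === undefined) {
    throw new Error(`all ${errors.length} dial attempts failed`)
  }
  return first
}
```

With a limit of 2 and ten addresses, every address can still be tried, but a peer advertising many unreachable addresses only ever occupies two dial slots at a time.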

Comment on lines 67 to 71
// these should now be ranges expressed as MultiaddrFilters
allow: [
  new MultiaddrFilter('/ip4/192.168.1.1/ipcidr/32')
],

Member

As above, can I create a MultiaddrFilter consisting only of '/p2p/peerId'?

Member Author

No, the allow/deny lists are for allowing/denying classes of network to make connections - this happens before the peer id exchange so we don't know who the remote peer is at that point.

It's a DoS mitigation strategy to always allow connections from networks you know, control or trust, or disallow connections from networks that are attacking you.

To deny specific peers you can implement a connection gater.
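
A minimal sketch of such a gater, assuming a fixed deny list of peer id strings. The hook names `denyDialPeer` and `denyInboundEncryptedConnection` follow the libp2p connection gater interface as I understand it, so check them against the interface definition for your version. `PeerIdLike`, `deniedPeers` and the example peer id are stand-ins invented for this example so that it runs on its own.

```typescript
// Stand-in for the libp2p PeerId type (assumption made for this sketch).
interface PeerIdLike {
  toString: () => string
}

// Hypothetical deny list keyed by the string form of the peer id.
const deniedPeers = new Set<string>(['12D3KooWExampleDeniedPeer'])

const connectionGater = {
  // Outbound: return true to stop libp2p dialling this peer.
  denyDialPeer: async (peerId: PeerIdLike): Promise<boolean> => {
    return deniedPeers.has(peerId.toString())
  },

  // Inbound: the remote peer id is only known after the encryption
  // handshake, so this is the earliest hook that can deny by peer.
  // The real hook also receives the connection, omitted here.
  denyInboundEncryptedConnection: async (peerId: PeerIdLike): Promise<boolean> => {
    return deniedPeers.has(peerId.toString())
  }
}
```

An object like this would be passed as the `connectionGater` option when creating the node, with any hooks that are not overridden left at their defaults.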

Comment on lines +108 to +97
denyDialMultiaddr: (multiaddr) => {
  if (multiaddr.getPeerId() != null) {
    // there is a peer id present in the multiaddr
  }
Member

👍

* The minimum number of connections below which libp2p will start to dial
* peers from the peer book. (default: 0)
*/
minConnections?: number
Member

MinConnections: 0 is the same as saying, I'm allowed to have zero connections, don't autoDial, correct?

Might be useful to call out that setting this to zero disables it

Member Author

Turns out that default was wrong 🤭 - would much rather generate that info but not sure how.

I've updated the jsdoc for this setting - if it's still not clear could you suggest an edit?

Member

/**
 * The minimum number of connections below which libp2p will start to dial peers
 * from the peer book. Setting this to 0 effectively disables this behaviour.
 * (default: 50)
 */
minConnections?: number

looks much better to me, thanks :)

src/connection-manager/auto-dial.ts Outdated Show resolved Hide resolved
Member

@maschad maschad left a comment

Great work on this @achingbrain, this will definitely improve connection management overall. I made some comments and suggestions.

I think these are a lot of changes though and perhaps we should split this into some smaller PRs to make reviewing a bit easier and the changes more manageable. I would suggest separate PRs for:

doc/CONFIGURATION.md Show resolved Hide resolved
doc/CONFIGURATION.md Show resolved Hide resolved
doc/migrations/v0.43-v0.44.md Show resolved Hide resolved
doc/migrations/v0.43-v0.44.md Outdated Show resolved Hide resolved
doc/migrations/v0.43-v0.44.md Show resolved Hide resolved
src/connection-manager/auto-dial.ts Outdated Show resolved Hide resolved
src/connection-manager/auto-dial.ts Show resolved Hide resolved
src/connection-manager/dial-queue.ts Show resolved Hide resolved
src/connection-manager/dial-queue.ts Show resolved Hide resolved
src/connection-manager/index.ts Show resolved Hide resolved
achingbrain and others added 6 commits April 11, 2023 12:58
…ling

Refactors the connection manager.

1. Internally it uses a queue to control concurrency instead of dial tokens
2. A second queue is used for each peer to prevent individual peers from swamping the dial queue
3. The auto dialler now checks the minimum connection limit when peers disconnect instead of using a timer
4. Auto dialled peers are sorted based on tag value so valuable peers should be re-dialled first
5. Auto dialled peers are dialled in parallel to prevent a slow peer locking up the auto-dial queue
6. allow/deny lists are now `MultiaddrFilter`s
7. A `getDialQueue` method has been added to libp2p to allow inspection of the dial queue though it needs exposing in the interface
8. The connection gater `denyDialMultiaddr` method now only takes a multiaddr

BREAKING CHANGE: some connection manager options have changed - please see the upgrade guide for full details
Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
@achingbrain achingbrain force-pushed the fix/refactor-connection-manager branch from dbca29e to 3d92f93 Compare April 11, 2023 11:59
@achingbrain
Member Author

I've removed the attempt at resolving the double-peer discovery bug. I think it needs a bit more thought: the logic to determine whether to emit the event is complicated and will be vastly simplified by having an atomic peer store.

The auto dial interval has also returned - this is because I found an edge case where we could still end up with no peers if all auto-dials failed - if network connection is lost, for example.

I've reverted the MultiaddrFilter change for allow/deny lists too - with a bit more thought it would be better to have the connection gater control allow/deny behaviour, then we aren't splitting responsibility for that between the connection manager and the connection gater.

Member

@maschad maschad left a comment

LGTM

@achingbrain
Member Author

achingbrain commented Apr 13, 2023

Please hold off on merging until jacobheun/any-signal#25 or jacobheun/any-signal#26 is merged

All good.

Development

Successfully merging this pull request may close these issues.

Should not emit peer:connect with a closed connection
3 participants