
WIP: Connection Manager #4288

Merged 2 commits into master from feat/connmgr on Oct 16, 2017

Conversation

@whyrusleeping (Member) commented Oct 6, 2017

This is the first WIP branch of the connection manager. It allows you to set up a ConnMgr that tries to keep the number of open connections your node has within certain bounds.

There's still a lot to do here.
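
For a sense of the intended shape, here is a minimal sketch; the constructor signature and import path are assumptions based on the go-libp2p-connmgr repo linked later in this thread, not part of this diff:

package main

import (
	"time"

	connmgr "github.com/libp2p/go-libp2p-connmgr" // assumed import path
)

func main() {
	// Try to keep between 900 and 1000 open connections; connections
	// younger than the 30s grace period are never trimmed.
	cm := connmgr.NewConnManager(900, 1000, 30*time.Second)
	_ = cm // handed to the libp2p host constructor as an option
}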

@whyrusleeping (Member, Author):

supplementary: libp2p/go-libp2p-kad-dht#95

@whyrusleeping (Member, Author):

also: libp2p/go-libp2p-circuit#20

@whyrusleeping (Member, Author):

proposal:

Merge this roughly as is (without connection tagging logic) so that we can start using it in 'dumb' mode asap. Then, PR the different connection tagging things in individually.

@whyrusleeping force-pushed the feat/connmgr branch 2 times, most recently from 33d901c to 1f3decc on October 8, 2017
@whyrusleeping (Member, Author):

note: this is disabled by default; you will need to set up the configuration as described in the config.md doc.
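
For illustration, the relevant config section would look something like this (values mirror the `ipfs config` command given further down in this thread; the exact field set is described in docs/config.md):

{
  "Swarm": {
    "ConnMgr": {
      "Type": "basic",
      "LowWater": 900,
      "HighWater": 1000,
      "GracePeriod": "30s"
    }
  }
}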

@magik6k (Member) left a comment:

Looks mostly good here. I still need to dig more into the libp2p side of things.

core/core.go Outdated
@@ -814,6 +839,10 @@ func constructPeerHost(ctx context.Context, id peer.ID, ps pstore.Peerstore, bwr
if !opts.DisableNatPortMap {
hostOpts = append(hostOpts, p2pbhost.NATPortMap)
}
if opts.ConnectionManager != nil {
fmt.Println("adding conn manager to host opts")
Member: Looks like a debugging leftover.

core/core.go Outdated
switch cfg.Type {
case "", "none":
return nil, nil
default:
Member: This should probably be at the bottom of the switch.
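
That is, something along these lines; the "basic" arm and the error message here are illustrative assumptions, not the actual diff:

switch cfg.Type {
case "", "none":
	return nil, nil
case "basic":
	// construct and return the basic connection manager here
default:
	return nil, fmt.Errorf("unrecognized ConnMgr.Type: %q", cfg.Type)
}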

LowWater is the minimum number of connections to maintain.

- `HighWater`
HighWater is the number of connections that, when exceeded, will trigger a connection GC operation.
Member: I don't really like these names. Wouldn't simple Low/High be enough?

@whyrusleeping (Member, Author): Hrm... I'd like something a little more descriptive than that. I've seen these terms used in this context before, but I'm open to suggestions.

Contributor: How about a simple Min and Max, or maybe MinConn and MaxConn?

Member: Watermark is familiar to me as a way of setting upper and lower bounds on something in programming. No special feelings attached, though.

Member: Note: I'd avoid min/low, as they imply that we'll actively seek peers up to this point.

@whyrusleeping (Member, Author): So... are we in favor of still using HighWater and LowWater? I'm getting mixed signals (especially from @Stebalien, who thumbs-upped David's comment but suggested changing things).

I agree that "min" would imply we actively open connections until we have that many, but I think "low" avoids this connotation.

Member: (GC)Threshold(Conn/Peers), or (GC)Target(Conn/Peers)? "Water" may sound confusing to some even if it's technically correct.

@magik6k (Member) commented Oct 8, 2017

Note: connections used in ipfs p2p may need tagging

@whyrusleeping (Member, Author):

Maybe we should actually have this enabled by default, limiting people to 900-1000 connections or so. Even if it does cause potentially strange behaviour, it's probably better than endless "too many open file descriptors" errors...

@whyrusleeping (Member, Author):

For reference, the connection closing logic is here: https://github.com/libp2p/go-libp2p-connmgr

@whyrusleeping (Member, Author):

To set the config, run:

ipfs config --json Swarm.ConnMgr '{"GracePeriod": "30s","HighWater": 1000,"LowWater": 900,"Type": "basic"}'

@@ -380,7 +404,7 @@ func setupDiscoveryOption(d config.Discovery) DiscoveryOption {
if d.MDNS.Interval == 0 {
d.MDNS.Interval = 5
}
-	return discovery.NewMdnsService(ctx, h, time.Duration(d.MDNS.Interval)*time.Second)
+	return discovery.NewMdnsService(ctx, h, time.Duration(d.MDNS.Interval)*time.Second, discovery.ServiceTag)
Member: Is this change related to this PR?

@whyrusleeping (Member, Author): No, it just happened to change in a dependency that I needed to update here.

hostopts := &ConstructPeerHostOpts{
AddrsFactory: addrsFactory,
DisableNatPortMap: cfg.Swarm.DisableNatPortMap,
DisableRelay: cfg.Swarm.DisableRelay,
EnableRelayHop: cfg.Swarm.EnableRelayHop,
ConnectionManager: connmgr,
Member: Do we need to instantiate the ConnManager here? As a user of libp2p, I would assume that ConnManager is just a thing that exists, and that all I need to do is pass an option to select the strategy, saving users from importing yet another package.

@whyrusleeping (Member, Author): It might be nice to have a libp2p-bundle package that imports everything all together, but for now the tying together of all the pieces is done here in the core constructor. Importing things at the top level as much as possible makes dependency tree modifications much cheaper. If this were done in a lower package, then every time I make a change to the connection manager logic, I would have to update that package (or packages) and then update those into here.

Sets the type of connection manager to use, options are: `"none"` and `"basic"`.

- `LowWater`
LowWater is the minimum number of connections to maintain.
Member: How is LowWater enforced? Does the ConnManager run discovery mechanisms if we are poorly connected? (I didn't see this in the go-libp2p-connmgr code.)

@whyrusleeping (Member, Author): No, it's not a 'minimum number of connections'; it's just the point at which we stop closing connections. So if LowWater is 900 and HighWater is 1000, we will trigger a trim once we have over 1000 connections, and then close connections until we have 900 or fewer.
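
A self-contained sketch of those semantics (this is not the actual go-libp2p-connmgr code, which additionally weighs connection tags and skips connections still in their grace period):

package main

import "fmt"

type conn struct{} // stand-in for a real network connection

// trim enforces the watermarks: nothing happens until HighWater is
// exceeded; after that, connections are closed until only LowWater remain.
func trim(open []conn, lowWater, highWater int) []conn {
	if len(open) <= highWater {
		return open // high water mark not exceeded: no GC
	}
	return open[:lowWater] // "close" everything beyond the low water mark
}

func main() {
	open := make([]conn, 1042)
	fmt.Println(len(trim(open, 900, 1000))) // prints 900
}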


docs/config.md Outdated
- `HighWater`
HighWater is the number of connections that, when exceeded, will trigger a connection GC operation.
- `GracePeriod`
GracePeriod is the length of time that new connections are immune from being closed by the connection manager.
Member: I assume this is in seconds? Can we get that written down?

@whyrusleeping (Member, Author): It's a time duration (like all the other time-related things): 30s == 30 seconds, 1h == one hour, 4m == 4 minutes, etc.
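
These are Go duration strings; for reference, `time.ParseDuration` accepts exactly this syntax (assuming the config loader parses GracePeriod that way, or equivalently):

package main

import (
	"fmt"
	"time"
)

func main() {
	for _, s := range []string{"30s", "4m", "1h"} {
		d, err := time.ParseDuration(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(s, "parses to", d) // e.g. "30s parses to 30s"
	}
}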

@@ -11,7 +11,7 @@ import (
ma "gx/ipfs/QmXY77cVe7rVRQXZZQRioukUM7aRW3BTcAgJe12MCtb3Ji/go-multiaddr"
peer "gx/ipfs/QmXYjuNuxVzXKJCfWasQk1RqkhVLDM9jtUKhqc2WPQmFSB/go-libp2p-peer"
pro "gx/ipfs/QmZNkThpqfVXs9GNbexPrfBbXSLNYeKrE7jwFM2oqHbyqN/go-libp2p-protocol"
p2phost "gx/ipfs/QmaSxYRuMq4pkpBBG2CYaRrPx2z7NmMVEs34b9g61biQA6/go-libp2p-host"
p2phost "gx/ipfs/Qmc1XhrFEiSeBNn3mpfg6gEuYCt5im2gYmNVmncsvmpeAk/go-libp2p-host"
Member: Out of curiosity: in Go, don't interface packages get an interface name?

@whyrusleeping (Member, Author): No, they do. But this is where we're instantiating the concrete type.

@daviddias (Member):

I see that https://github.com/libp2p/go-libp2p-connmgr/blob/master/connmgr.go keeps track of the peers to whom we opened connections. I guess memory is very cheap in Go; in our JS implementation I would/will implement this ConnManager by just checking everything that is in the PeerBook (your PeerStore) and using the .isConnected method on each PeerInfo instance, to save allocating more memory for the same info and to avoid having to update it in two places.

@whyrusleeping (Member, Author):

> I see that https://github.com/libp2p/go-libp2p-connmgr/blob/master/connmgr.go keeps track of the peers to whom we opened connections.

Primarily, it keeps track of tags on each connection.
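
A hedged sketch of what tagging looks like from a subsystem's point of view; the method names follow the go-libp2p-connmgr code linked above, but the import paths and exact signatures here are approximations:

package main

import (
	"time"

	connmgr "github.com/libp2p/go-libp2p-connmgr" // assumed import path
	peer "github.com/libp2p/go-libp2p-peer"       // assumed import path
)

func main() {
	cm := connmgr.NewConnManager(900, 1000, 30*time.Second)

	var p peer.ID // some peer we care about, e.g. a close DHT neighbor
	// Higher tag values mark a connection as more valuable; trimming
	// closes low-value connections first.
	cm.TagPeer(p, "kad-close", 20)
	// Remove the tag once the peer is no longer interesting.
	cm.UntagPeer(p, "kad-close")
}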

@Stebalien (Member) left a comment:

So, it's probably not too hard for a determined attacker to DoS an IPFS node, but the grace period seems a bit too easy to exploit.

In the future, we can use some form of simple proof of work to ensure peers actually want to talk to us (under high loads) but, for now, the simplest solution I can think of is to limit the number of nodes in the grace period set (note: connections from our node should always have a grace period). It's still possible to make it hard to establish new inbound connections but that can already be an issue.


- `HighWater`
HighWater is the number of connections that, when exceeded, will trigger a connection GC operation.
- `GracePeriod`
GracePeriod is a time duration that new connections are immune from being closed by the connection manager.
Member: Unless I'm mistaken, this can trivially be used to DoS a node.

@whyrusleeping (Member, Author): I wouldn't call it trivial, but yes: you can use this to exceed a node's maximum connection count and make a connection-closing sweep not reach the low water mark.

Member: Sorry, for some reason I assumed we had a hard "don't open new connections past the high water mark" limit (I was a bit sleep-deprived due to jet lag...). That would have effectively taken the node offline.


However, I'm still worried that this will be a way to monopolize a peer by tying up their connection quota with connections in the grace period. Basically, you can ensure that no connection lasts more than 10 seconds. Maybe don't count connections in the grace period?

@whyrusleeping (Member, Author): Yeah, connections in the grace period are basically not counted; see here: https://github.com/libp2p/go-libp2p-connmgr/blob/master/connmgr.go#L61

Member: I see, so we literally just skip them. Never mind, LGTM.

@dignifiedquire (Member) left a comment:

Interface looks good to me. I trust all of you to make the right decision on figuring out the names for the marks ;)

@Kubuxu (Member) commented Oct 11, 2017

Observations from some of our nodes where connection closing was run:

  • pluto, which is one of the bootstrap nodes: it started maxing out its CPU when closing connections, though it was almost maxing out its CPU before that too. Fetching files from pluto is almost impossible, but it was hard before as well.
  • biham, which is one of the storage hosts: I have observed a significant reduction in CPU, FD, and RAM usage, and it is still responsive when fetching files. Connection closing is a great addition for it.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
@ghost assigned whyrusleeping on Oct 14, 2017
@ghost added the status/in-progress label on Oct 14, 2017
@whyrusleeping (Member, Author):

ping @diasdavid @lgierth @Kubuxu for review (or response to previous review follow-up)

"name": "go-testutil",
"version": "1.1.12"
"version": "1.1.11"
Is go-testutil being downgraded?

@whyrusleeping (Member, Author): Hrm... that's probably the fault of a rebase gone rogue. I'll take care of it.

@whyrusleeping (Member, Author): Ah, it's because we accidentally updated go-testutil in master without bubbling up the changes from the rest of the deps at the same time. I think this change is safe for now; we can bubble up those deps and fix things later.

@ghost left a comment:

LGTM 👏 👍

One tiny comment about the go-testutil downgrade

@Kubuxu (Member) left a comment:

This part LGTM. I am still not done reviewing some of the dep changes, but 👍 as far as I got.

@Kubuxu (Member) commented Oct 14, 2017

The only comment I have is that it would be good to have info about this feature returned somewhere from the API, and to have a test that the config is being loaded correctly.

@whyrusleeping (Member, Author): @Kubuxu

> The only comment I have is that it would be good to have info about this feature returned somewhere from the API, and to have a test that the config is being loaded correctly.

Any suggestions here?

@ghost commented Oct 16, 2017

> Any suggestions here?

The network section in `ipfs diag sys` would do.

@whyrusleeping (Member, Author):

@Kubuxu I think we should merge this now and do the API introspection as a separate PR. There's a fair amount of discussion to be had around that (I realize as I write this).

@whyrusleeping (Member, Author):

see #4308

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
@Kubuxu (Member) commented Oct 16, 2017

I am not blocking this PR. I just raised a note that we should have it.

@whyrusleeping (Member, Author):

Choo Choo!

@whyrusleeping merged commit 6beab66 into master on Oct 16, 2017
@whyrusleeping removed the status/in-progress label on Oct 17, 2017
@skorokithakis commented Oct 30, 2017

Semi-relatedly, and because I want to activate this feature and see what difference it makes, is there a command that will show me on the CLI the (number of) active connections so I can tune the high/low water marks sensibly? Is ipfs swarm peers what I'm looking for?

@whyrusleeping deleted the feat/connmgr branch on October 30, 2017
@whyrusleeping (Member, Author):

@skorokithakis `ipfs swarm peers` lists all active connections. I'm also working on a subcommand for the connection manager which should expose some more fine-grained information.
