
Connection counts climbing far past HighWater setting #6286

Closed
leerspace opened this issue May 1, 2019 · 18 comments

@leerspace
Contributor

Version information:

$ ipfs version --all
go-ipfs version: 0.4.20-
Repo version: 7
System version: amd64/linux
Golang version: go1.12.4

Type:

bug

Description:

While using my node, the connection counts started climbing rapidly past the HighWater connection setting and seemed stuck in a climb -- reaching 2000+ connections before I shut down the daemon (see the ipfs.peers file in the link below). At the time of the climb I think I was pinning a few hashes and publishing an IPNS entry.

It looks like there are a couple of older issues that sound similar (e.g., #4718, #5248), but they are closed. I wonder if this could be related to #3532, but it's not clear to me whether connections are building rapidly in that issue.

My node's LowWater and HighWater connection counts are set to the defaults (see ipfs.config for full output from ipfs config show). QUIC and EnableAutoRelay are both enabled, but EnableRelayHop is not.
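For reference, the 0.4.x defaults look roughly like this to my understanding (a sketch; confirm against ipfs config show on your own node):

    "ConnMgr": {
      "GracePeriod": "20s",
      "HighWater": 900,
      "LowWater": 600,
      "Type": "basic"
    }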

Debug data collected using these instructions, along with ipfs swarm peers and ipfs config show output captured after the issue started and before the daemon was killed, is available here:

https://ipfs.io/ipfs/QmZqhucUHSoW3WzXsu7gepmQX9NqQzkjcsQAN1r4kjQYBH

@vyzo
Contributor

vyzo commented May 1, 2019

Have you recently advertised as a relay hop with autorelay?
The provider record stays around for 24 hours, and that would certainly inundate you with new connections.
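(A quick way to check, assuming the standard config layout, is to query the flags directly:)

    $ ipfs config Swarm.EnableRelayHop
    $ ipfs config Swarm.EnableAutoRelay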

@leerspace leerspace changed the title from "Connection" to "Connection counts climbing far past HighWater setting" May 1, 2019
@leerspace
Contributor Author

leerspace commented May 1, 2019

@vyzo Assuming I would do that using the EnableRelayHop setting, I have not done that on this node as far as I can remember (I'd say 99% sure).

@Stebalien
Member

Stebalien commented May 1, 2019

@leerspace could you post the output of ipfs swarm peers -v? The -v will help me figure out whether these are inbound or outbound connections and what protocols you're speaking.

@leerspace
Contributor Author

I didn't think to get verbose output from swarm peers before, but once I notice it happening again I'll grab it and add it to this issue.

@leerspace
Contributor Author

I got this to happen again to some extent (2000+ peers), but I think this is probably just a duplicate of #6283.

Here's the ipfs swarm peers -v output in case I'm wrong (see 1556850569, for example; file names are Unix timestamps from date +%s): https://ipfs.io/ipfs/QmSzBfiMwU5ChkMtZpFbUtWWXQPQBhbPSg6kh5GZQxBx6x

I should probably be using ipfs daemon --routing=dhtclient on this node as suggested in another issue, since the *Water connection thresholds don't seem to keep connections under control with default routing.
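For anyone else hitting this, the flag and its persistent config equivalent look like this (a sketch; dhtclient makes the node query the DHT without answering queries from other peers, which cuts inbound connections):

    # run the daemon as a DHT client only
    $ ipfs daemon --routing=dhtclient

    # or persist the equivalent setting in the config
    $ ipfs config Routing.Type dhtclient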

@Stebalien
Member

Unfortunately, we don't (yet) have anything to simply stop new connections. Libp2p needs to feed the connection manager down through to the transports themselves.

But still, that's a lot of inbound connections.

@swedneck
Contributor

I seem to still be running into this with v0.4.21; my node consistently has around 8000 peers even though my HighWater is set to 900.

@dokterbob
Contributor

dokterbob commented Aug 25, 2019

Same problem with 0.4.22. With the following settings:

    "ConnMgr": {
      "GracePeriod": "30s",
      "HighWater": 15000,
      "LowWater": 10000,
      "Type": "basic"
    },

I'm consistently seeing 35-50k connections, which is essentially bringing our server to its knees.

Lowering it down to:

    "ConnMgr": {
      "Type": "basic",
      "LowWater": 3000,
      "HighWater": 5000,
      "GracePeriod": "30s"
    }

Still yields about 35-40k connections.

Lowering it further to:

    "ConnMgr": {
      "Type": "basic",
      "LowWater": 1000,
      "HighWater": 3000,
      "GracePeriod": "30s"
    }

Still gives around 35k connections!
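(In case anyone wants to reproduce, these values can be applied with the standard config commands rather than by hand-editing the config file -- a sketch:)

    $ ipfs config --json Swarm.ConnMgr.LowWater 1000
    $ ipfs config --json Swarm.ConnMgr.HighWater 3000
    $ ipfs config Swarm.ConnMgr.GracePeriod 30s
    # restart the daemon for the new limits to take effect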

@Stebalien There really seems to be an issue here!

This seems to be somewhat of a runaway feedback loop: once the DHT starts routing through the node and it's a good peer, more peers use it and it gets overloaded. Or something like that.

Note that we're consistently fetching 100-150 files (ipfs-search).

ipfs version --all:

go-ipfs version: 0.4.22-
Repo version: 7
System version: amd64/linux
Golang version: go1.12.7

@dokterbob
Contributor

Additional note: this seems to have started right after I enabled RelayHop and AutoRelay (second vertical white line), which I then quickly disabled (third vertical line); the first line is the 0.4.22 upgrade. Could the removal not have propagated well throughout the DHT?

[screenshot: connection graph; vertical lines mark the 0.4.22 upgrade, enabling RelayHop/AutoRelay, and disabling them]
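For context, both flags live in the Swarm section of the config; during the spike mine would have looked roughly like this (a sketch, other Swarm fields omitted):

    "Swarm": {
      "EnableAutoRelay": true,
      "EnableRelayHop": true
    }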

@dokterbob
Contributor

Sadly, this problem persists. I think it's time to reopen this issue.
[screenshot: connection counts still far above the configured limits]

On an 8-core CPU it soaks up a good 700% of load (purple here is IPFS).
[screenshot: CPU usage graph]

@aschmahmann
Contributor

@dokterbob @Stebalien could confirm, but I think that enabling both EnableRelayHop (I'm willing to serve as a relay) and EnableAutoRelay (I'm looking for relays) together is a bad idea. In theory your advertisements should have disappeared a day after you turned them off, but it looks like some nodes in the network have decided to continue advertising for you.

My understanding is that the plan is to remove serving as a relay node from IPFS and make it available through the libp2p daemon instead, to minimize confusion and people shooting themselves in the foot. In the meantime, if you can, I'd recommend rotating your node's peerID to a new one, which should restore your traffic to normal.

If you need any help figuring out how to do peerID rotation that's probably best asked on discuss.ipfs.io.
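(For illustration only -- go-ipfs 0.4.x has no built-in rotation command as far as I know, so one common approach is to re-initialize a fresh repo and re-pin; roughly:)

    $ ipfs pin ls --type=recursive > pins.txt   # record existing pins
    $ ipfs shutdown
    $ mv ~/.ipfs ~/.ipfs.bak                    # keep the old repo as a backup
    $ ipfs init                                 # generates a fresh identity/peerID
    # then re-add your data and re-pin the hashes from pins.txt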

@Stebalien
Member

^^ That's the issue.

@dokterbob
Contributor

dokterbob commented Aug 27, 2019

@aschmahmann Great, thanks for the quick feedback!

Note that the [config doc] specifically states:

EnableAutoRelay Enables automatic relay for this node. If the node is a HOP relay (EnableRelayHop is true) then it will advertise itself as a relay through the DHT.

Note that, by now, my server is slowly returning to normal.

@Stebalien
Member

Note that the [config doc] specifically state:

Are you noting that the current behavior is documented, or is the documentation confusing?

@dokterbob
Contributor

dokterbob commented Aug 28, 2019 via email

@Stebalien
Member

Got it. I agree that the way these flags interact with each other is really confusing, and I'll try to improve the documentation to make it less so.

@Stebalien
Member

Actually, @vyzo, could you take a pass at this?

@dokterbob
Contributor

dokterbob commented Aug 28, 2019 via email
