Reduce the impact of the DHT #6283

Stebalien · 2019-05-01T03:58:24Z

Currently, all nodes participate as full DHT servers by default. Unfortunately, this means:

We have a lot of crappy unreachable DHT servers.
I have to repeatedly tell users to run their client with ipfs daemon --routing=dhtclient because the DHT is causing the network to DoS their system.

Related work:

Connect to fewer peers when querying (improve query performance by limiting query width to KValue peers libp2p/go-libp2p-kad-dht#291).
Not become DHT servers unless we're "stable" (Add an "Auto" client/server DHT mode libp2p/go-libp2p-kad-dht#216).
Prefer UDP-based transports (QUIC by default).
Reduce the overhead of DHT queries (better buffering, multistream-2.0, tls/quic, etc.).
Store peer info on disk (Store peerstore data to disk #2848) so per-peer metadata isn't so expensive.
Switch away from RSA to ed25519 by default (RSA handshakes are expensive).

However, I'm wondering if we should consider an interim solution: run in DHT-client mode by default, at least for now.

Create a "laptop" config profile and make it the default. The laptop profile will use a "client" routing option.
Create a "desktop" config profile and use an "auto" routing option. At the moment, this will default to client until we have the ability to switch between client/server mode dynamically.
Modify the "server" config profile to default the routing option to "dht".
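For illustration only, the three-profile proposal above can be sketched as a small lookup. This is a hypothetical model, not go-ipfs code: the names `PROFILE_ROUTING`, `DEFAULT_PROFILE`, `effective_routing`, and the `dynamic_switching` flag are assumptions made for this sketch, and the values "dhtclient" and "dht" are borrowed from the routing options mentioned in this issue.

```python
# Hypothetical model of the proposed profiles; not actual go-ipfs code or config keys.
PROFILE_ROUTING = {
    "laptop": "dhtclient",  # proposed default profile: DHT client only
    "desktop": "auto",      # client for now, dynamic once switching exists
    "server": "dht",        # full DHT server
}
DEFAULT_PROFILE = "laptop"


def effective_routing(profile=DEFAULT_PROFILE, dynamic_switching=False):
    """Return the routing mode a node with the given profile would run in."""
    option = PROFILE_ROUTING[profile]
    # Until client/server mode can be switched dynamically,
    # "auto" falls back to client mode, as the proposal describes.
    if option == "auto" and not dynamic_switching:
        return "dhtclient"
    return option
```

Under this sketch, a node initialized with no explicit profile would run as a DHT client, and only nodes that opt into the "server" profile would serve the DHT.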

The significant drawback to this solution is that it'll make the IPFS network significantly less "p2p". That is, in a pure p2p network, all nodes are equal. On the other hand, all nodes are clearly not equal in terms of hardware, so I'm not that concerned about this.

Thoughts and concerns?

cc @whyrusleeping & @daviddias?

vyzo · 2019-05-01T07:20:51Z

If we get a laptop profile, we might want to enable autorelay by default for it as well.

vyzo · 2019-05-01T07:31:35Z

> Create a "desktop" config profile and use an "auto" routing option. At the moment, this will default to client until we have the ability to switch between client/server mode dynamically.

This might be unreasonable, we might want to have this be dht by default until we have the magic option to switch dynamically.

obo20 · 2019-05-01T15:18:25Z

I'm in favor of defaulting people to dhtclient for now. This point specifically resonates with me:

> We have a lot of crappy unreachable DHT servers.

My thoughts are that most people who would opt-in to be a DHT server would have somewhat of an idea what they're doing, and the nodes opting in would likely be more stable as they're intentionally configured to redistribute content.

Stebalien · 2019-05-01T20:09:00Z

> If we get a laptop profile, we might want to enable autorelay by default for it as well.

SGTM.

> Create a "desktop" config profile and use an "auto" routing option. At the moment, this will default to client until we have the ability to switch between client/server mode dynamically.

> This might be unreasonable, we might want to have this be dht by default until we have the magic option to switch dynamically.

My thinking is that, at the moment, the DHT is too much overhead even for the average desktop. Fixing the issues I noted in the issue description will help with that but, IMO, not even desktops should be DHT nodes till then.

> My thoughts are that most people who would opt-in to be a DHT server would have somewhat of an idea what they're doing, and the nodes opting in would likely be more stable as they're intentionally configured to redistribute content.

Exactly.

Stebalien · 2019-05-01T20:29:20Z

Requires #6287.

vyzo · 2019-05-01T20:31:41Z

My concern is that we might end up with a DHT that is vastly undersized for the scale of the network.

Stebalien · 2019-05-01T20:44:31Z

> My concern is that we might end up with a DHT that is vastly undersized for the scale of the network.

I agree although I think we'll get that simply by defaulting to the "laptop" profile. However, I'd be fine defaulting the desktop profile to "dht" at first (for a slower transition).

BillDStrong · 2019-05-03T09:34:29Z

Keeping this as a stopgap measure sounds fine. As an experimenting user, I don't want to know about all of this.

To prevent an undersized DHT I would suggest a simple test of the user's hardware resources at first run. Declare some minimum threshold, and if the user exceeds that threshold, ask the user if they would like to enable some services to keep the network healthy.

You would want to overestimate the minimum hardware. You don't want the user to ever have to think about ipfs running in the background, taking precious cycles from their games/work.
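A rough sketch of such a first-run check, under stated assumptions: the threshold values and the names `MIN_CPUS`, `MIN_MEMORY_BYTES`, and `should_offer_dht_server` are hypothetical, nothing like this is known to exist in go-ipfs, and detecting the actual hardware is platform-specific and left to the caller.

```python
# Hypothetical first-run check; thresholds are deliberately generous placeholders.
MIN_CPUS = 4                      # assumed minimum core count
MIN_MEMORY_BYTES = 8 * 1024 ** 3  # assumed minimum RAM (8 GiB)


def should_offer_dht_server(cpu_count, memory_bytes,
                            min_cpus=MIN_CPUS,
                            min_memory_bytes=MIN_MEMORY_BYTES):
    """Return True if the machine is strong enough to *ask* the user
    whether to enable DHT server mode. It never enables anything itself."""
    return cpu_count >= min_cpus and memory_bytes >= min_memory_bytes
```

The function only decides whether to prompt; the user still makes the final choice, and weaker machines are never asked.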

bonekill · 2019-05-14T11:56:41Z

+1 to interim plan
You should not have to worry too much about destabilizing the network because...

1. Most nodes do not update with haste.
Notably there are a few large projects that have their clients and server clusters hang back a few versions, so in the event that this starts leading to an absurd client to server ratio a patch can be released to start reversing the swarm back to the old behavior before problems arise.
Edit: The new release process mostly outdates this.

2. The change is not applied to upgraded instances.
For this change to take effect the user would need to run "ipfs init" or make an explicit config alteration, as I don't believe we can distinguish between an existing config being explicitly set to "dht" and one that just defaulted to it. ~~This reinforces 1. as the adoption rate is further reduced.~~

3. Ability to lower the α (alpha concurrency) parameter.
IIRC the α parameter for searching through the DHT is cranked up to deal with all the useless nodes. Once a large number of useless nodes are removed you can pull back the α to a more sane number (ex. α = 3). While you cannot make this particular change in the same patch (because of points 1. and 2.), this should eventually lower the swarm "cost" for each query due to fewer RPCs canceled while in flight. Hopefully, while the raw capacity drops, the query efficiency rises, netting a greater effective capacity than before.
Edit: α is already = 3, and has been for a very long time... whoops

4. Reduces "scattershot" behavior.
IPFS seems to increase the number of in-flight requests the longer it takes to find valid results. Lots of useless nodes waste a lot of time on timeouts, and IPFS seems to spawn many RPCs to make up for a failed one. Fewer wasted RPCs result in faster queries and fewer panic "scattershots" through routing tables. This behavior should decrease proportionally as the usable node ratio gets better. Not sure if this behavior is intentional/still exists, but it's just something I have observed in the past.
Edit: I cannot replicate this behavior anymore.

A warning, however: you should observe post patch to see whether your own DHT nodes are getting hit too hard and whether a reversal is required. ~~While unlikely due to the above,~~ However, IF the DHT client to server ratio hits a critical point, the entire DHT swarm may cascade fail and be difficult to bootstrap again. You would either need to wait for a large number of requesting clients to give up and/or bring a large number of healthy DHT serving nodes online in bulk to fix it.
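As a minimal sketch of the post-patch observation suggested above, one could track the client to server ratio against a chosen limit. The function names and the `critical_ratio` parameter are hypothetical; the actual critical point is not known from this thread and would have to be measured.

```python
def client_to_server_ratio(clients, servers):
    """Ratio of DHT clients to DHT servers; infinite if no servers remain."""
    if servers == 0:
        return float("inf")
    return clients / servers


def should_consider_reversal(clients, servers, critical_ratio):
    """True once the observed ratio reaches the (assumed) critical ratio."""
    return client_to_server_ratio(clients, servers) >= critical_ratio
```

The counts would come from whatever crawl or monitoring data is available; the sketch only shows the comparison, not how to collect them.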

yiannisbot · 2022-12-21T16:19:28Z

Isn't this issue obsolete after the IPFS v0.5 version where new nodes use AutoNAT to get their node status?

Stebalien added the kind/enhancement A net-new feature or improvement to an existing feature label May 1, 2019

leerspace mentioned this issue May 3, 2019

Connection counts climbing far past HighWater setting #6286

Closed

Stebalien mentioned this issue May 16, 2019

Writeup of router kill issue #3320

Closed
