
Large amounts of traffic are being sent to offline nodes #12409

Closed
leoluk opened this issue Sep 23, 2020 · 11 comments · Fixed by #12620
Comments

@leoluk
Contributor

leoluk commented Sep 23, 2020

Problem

After shutting down a node, the remainder of the cluster keeps sending traffic. This has been the source of frequent complaints by users who ran Solana on a home connection, shut it down, and then continued to get DDoSed by the cluster.

Proposed Solution

Stop sending traffic to nodes that aren't up-to-date in gossip.

This will only help with nodes that are fully offline. Congestion control is needed as well: #12410

An attacker could still create fake entries, since the gossip IP isn't authenticated via a three-way handshake that proves there's a valid return path; see #9491.
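
For illustration only, here is a minimal Rust sketch of the kind of filter this proposal implies, using made-up names and thresholds (a `Peer` record carrying the wallclock of its latest gossip update); the real logic would live in the gossip code:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical view of a gossip peer: just the wallclock (ms) of the
/// peer's most recent gossip update.
struct Peer {
    last_gossip_wallclock_ms: u64,
}

/// Peers whose newest gossip entry is older than this are treated as
/// offline. (Illustrative threshold, not the value used by the validator.)
const GOSSIP_ACTIVE_TIMEOUT_MS: u64 = 60_000;

fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis() as u64
}

/// Only keep peers that are up-to-date in gossip as send targets.
fn select_send_targets(peers: &[Peer]) -> Vec<&Peer> {
    let now = now_ms();
    peers
        .iter()
        .filter(|p| now.saturating_sub(p.last_gossip_wallclock_ms) <= GOSSIP_ACTIVE_TIMEOUT_MS)
        .collect()
}
```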

@leoluk
Contributor Author

leoluk commented Sep 24, 2020

Did a little experiment and shut down our TdS node (which has average stake, like everyone else). Traffic spiked to >300 Mbps of sustained UDP traffic to 8000/udp, which is our gossip port:

[screenshot: inbound traffic graph showing the spike on 8000/udp]

300 Mbps is enough to kill many residential broadband connections, and there were no signs of it stopping until I turned the node back on after 20 minutes. It's easy to mistake such a spike of UDP traffic, especially to a common port like 8000/udp, for a DDoS attack. What's worse is that egress traffic from the node stops at the same time, so it looks like a successful attack at that :-)

In fact, there was a recent case of a validator's Hetzner server getting locked for a suspected DDoS attack on another validator.

@MBGBuzzer

I can confirm that our server was blocked by the hardware provider Hetzner. They said they had blocked outgoing traffic and would not unblock it until we fixed the situation; they clearly suspected a DDoS attack. They were not interested in any explanation that this is legitimate traffic, only in a fix.

From this we can say that this is the weakest point of the network. Attackers are not asleep, and if this happens again it will cause a big problem for the network and its validators.
4BLOCK.TEAM

@mvines mvines added this to the v1.4.0 milestone Sep 24, 2020
@urb4n-thr34t

Hello!
A couple of days ago I received a complaint from Hetzner about an outgoing attack from my server.
The host banned the server due to outgoing traffic from port 8001/udp.
As you can see, it was just regular Solana traffic.

We know that when a Solana node goes down, the incoming UDP traffic doesn't stop; it actually increases.
So if the IP is later used by someone else, it will still receive a huge amount of traffic, which could be recognized as a DoS attack,
and complaints could potentially be sent to all cluster validators.
As far as I know, I am not the only one who has received a complaint about this server.

I think it's critical to implement some mechanism to mark a node as offline and stop the unnecessary traffic after it shuts down; otherwise this could lead to disaster.
In the current state, someone could potentially attack the entire cluster by starting hundreds of VPS instances running Solana in different locations and then shutting them down.
Validators would spam these dead nodes with UDP traffic, and complaints from the providers wouldn't be long in coming.

You could even try to snipe top-stake validators with manual complaints (the IPs of all nodes are openly visible).

@pkrasam
Contributor

pkrasam commented Oct 1, 2020

A call to the network security specialists: how might one go about solving an issue like this, which could potentially halt the network?

Are there any temporary workarounds we can put in place while the long-term solution is being worked out?

@mvines
Member

mvines commented Oct 1, 2020

Workaround: When your staked node is shut down, run this command for ~1 minute to ensure the cluster receives an update to your IP address marking it as invalid.

$ solana-gossip spy --identity ~/validator-keypair.json --gossip-host 0.0.0.0 --entrypoint testnet.solana.com:8001
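
Presumably this works because the spy re-publishes the identity's contact info with 0.0.0.0 as the gossip address, which other nodes treat as unroutable and stop sending to. A hedged sketch of that kind of sender-side check (hypothetical helper, not the actual crds code):

```rust
use std::net::SocketAddr;

/// Skip peers that advertise an unroutable gossip address such as 0.0.0.0:0,
/// so no packets are sent toward them (illustrative check only).
fn is_routable(addr: &SocketAddr) -> bool {
    !addr.ip().is_unspecified() && addr.port() != 0
}
```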

@leoluk
Contributor Author

leoluk commented Oct 5, 2020

We're still seeing plenty of excess traffic even after waiting 20 minutes:

[screenshot: traffic graph showing the remaining excess traffic]

It's much less than before, but I think the only acceptable amount of traffic for an offline node is close to zero.

(traffic from validators < 1.3.15 has been filtered from the graph)

@leoluk leoluk reopened this Oct 5, 2020
behzadnouri added a commit that referenced this issue Oct 6, 2020
* filters out inactive nodes from push options

#12620
patched the DDoS issue with nodes which go offline:
#12409

However, offline nodes still see a (much smaller) traffic spike, likely
because no origins are pruned from their bloom filter in the active set:
https://github.com/solana-labs/solana/blob/aaf3790d8/core/src/crds_gossip_push.rs#L276-L286
and so multiple nodes push redundant duplicate messages to them
simultaneously:
https://github.com/solana-labs/solana/blob/aaf3790d8/core/src/crds_gossip_push.rs#L254-L255

This commit filters out inactive peers from potential push targets
entirely. To mitigate eclipse attacks, staked nodes are retried
periodically.

* uses current timestamp in test/crds_gossip
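
A rough sketch of what this commit describes, under assumed names and timings (the actual change is in core/src/crds_gossip_push.rs): inactive peers are dropped from the push candidate set entirely, while staked peers are retried once in a while so they cannot be eclipsed simply by appearing offline.

```rust
/// Hypothetical push candidate: wallclock (ms) of its latest gossip update
/// and its stake in lamports.
struct PushCandidate {
    last_active_ms: u64,
    stake: u64,
}

const ACTIVE_TIMEOUT_MS: u64 = 60_000;       // assumed activity window
const STAKED_RETRY_PERIOD_MS: u64 = 300_000; // assumed retry cadence

/// Keep active peers as push targets; inactive peers are dropped entirely
/// unless they are staked, in which case they get an occasional retry so
/// they cannot be eclipsed simply by appearing offline.
fn keep_as_push_target(peer: &PushCandidate, now_ms: u64) -> bool {
    let age = now_ms.saturating_sub(peer.last_active_ms);
    if age <= ACTIVE_TIMEOUT_MS {
        return true;
    }
    // Assumed retry schedule: staked nodes qualify again once per period.
    peer.stake > 0 && now_ms % STAKED_RETRY_PERIOD_MS < ACTIVE_TIMEOUT_MS
}
```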
@behzadnouri
Contributor

Is it possible to check if this is still an issue even after #12674?
Note that a staked offline node should still receive some amount of traffic (but much less than before) in order to mitigate eclipse attacks.

behzadnouri added a commit that referenced this issue Nov 12, 2020
Inactive nodes are still observing incoming gossip traffic:
https://discord.com/channels/428295358100013066/670512312339398668/776140351291260968
likely because of pull-requests.

Previous related issues and commits:
#12409
#12620
#12674

This commit implements the same logic as
#12674
to exclude inactive nodes from pull options, with the same periodic
retry logic for offline staked nodes in order to mitigate eclipse
attacks.
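
The pull side can then apply the same predicate when choosing pull-request targets; a short sketch with assumed names (the real change is in the crds gossip pull code):

```rust
/// Hypothetical pull candidate, mirroring the push-side record.
struct PullCandidate {
    last_active_ms: u64,
    stake: u64,
}

/// Same idea applied to pull requests: never ask an inactive, unstaked peer
/// for data; inactive staked peers are only asked during a retry window.
fn may_pull_from(peer: &PullCandidate, now_ms: u64, in_retry_window: bool) -> bool {
    let active = now_ms.saturating_sub(peer.last_active_ms) <= 60_000; // assumed timeout
    active || (peer.stake > 0 && in_retry_window)
}
```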
@stale

stale bot commented Jan 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jan 9, 2022
@stale

stale bot commented Mar 2, 2022

This stale issue has been automatically closed. Thank you for your contributions.

@stale stale bot closed this as completed Mar 2, 2022
@behzadnouri behzadnouri removed the stale label Mar 2, 2022
@behzadnouri behzadnouri reopened this Mar 2, 2022
@behzadnouri
Contributor

Fixed by
#12620
#13533
#12674

@github-actions
Contributor

This issue has been automatically locked since there has not been any activity in past 7 days after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2022