Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implements ping-pong packets between nodes #12794

Merged
merged 15 commits into from
Oct 28, 2020

Conversation

behzadnouri
Copy link
Contributor

@behzadnouri behzadnouri commented Oct 10, 2020

Problem

https://hackerone.com/reports/991106

It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker can spoof IP address in UDP packet when sending PullRequest to the node. There's no any validation if provided source IP address is not spoofed and the node can send much larger PullResponse to victim's IP. As I checked, PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means that amplification is about 34x. This way an attacker can easily perform DDoS attack both on Solana node and third-party server.

To prevent it, need for example to implement ping-pong mechanism similar as in Ethereum: Before accepting requests from remote client needs to validate his IP. Local node sends Ping packet to the remote node and it needs to reply with Pong packet that contains hash of matching Ping packet. Content of Ping packet is unpredictable. If hash from Pong packet matches, local node can remember IP where Ping packet was sent as correct and allow further communication.

More info:
https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

Summary of Changes

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.

@codecov
Copy link

codecov bot commented Oct 10, 2020

Codecov Report

Merging #12794 into master will increase coverage by 0.0%.
The diff coverage is 86.7%.

@@           Coverage Diff            @@
##           master   #12794    +/-   ##
========================================
  Coverage    82.1%    82.1%            
========================================
  Files         372      373     +1     
  Lines       86562    86904   +342     
========================================
+ Hits        71109    71405   +296     
- Misses      15453    15499    +46     

@behzadnouri behzadnouri marked this pull request as ready for review October 10, 2020 19:33
@behzadnouri
Copy link
Contributor Author

@sakridge @carllin The code is complete except for tests. I wanted to get the initial review before adding more tests.

also @mvines since this is an abi change.

@behzadnouri behzadnouri changed the title implements ping-pong mechanism between nodes implements ping-pong packets between nodes Oct 10, 2020
@aeyakovenko
Copy link
Member

Can you expand the problem? the hacker one links are behind a login. I'd like to have the details in the GitHub PR

@behzadnouri
Copy link
Contributor Author

Can you expand the problem? the hacker one links are behind a login. I'd like to have the details in the GitHub PR

sure, done

@aeyakovenko
Copy link
Member

@behzadnouri doesn't the PullRequest/PullResponse have the same effect? Do we need another set of messages, or can we infer the same information form the pulls?

@behzadnouri
Copy link
Contributor Author

@behzadnouri doesn't the PullRequest/PullResponse have the same effect? Do we need another set of messages, or can we infer the same information form the pulls?

I am not sure we can. An attacker can keep sending fake pull/push traffic, without an indication that it is spoofing an ip address. With ping/pong packets, the pong package can be verified as soon as it is received.

@leoluk
Copy link
Contributor

leoluk commented Oct 11, 2020

That Hackerone report is a duplicate of #9492.

Glad to see it's getting fixed!

@aeyakovenko
Copy link
Member

Yea. You age right. We need a signed cookie to be sent back to authenticate the address.

@behzadnouri behzadnouri force-pushed the ping-pong branch 3 times, most recently from 8044b38 to 8772b4e Compare October 12, 2020 19:43
@sakridge
Copy link
Member

Looks like a good direction to me, just need to figure out the backwards-compatibility part.

core/src/ping_pong.rs Outdated Show resolved Hide resolved
// Because pull-responses are sent back to packet.meta.addr() of
// incoming pull-requests, pings are also sent to request.from_addr (as
// opposed to caller.gossip address).
move |request| {
Copy link
Contributor

@carllin carllin Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, currying!

@@ -95,6 +96,9 @@ const MAX_GOSSIP_TRAFFIC: usize = 128_000_000 / PACKET_DATA_SIZE;

/// Keep the number of snapshot hashes a node publishes under MAX_PROTOCOL_PAYLOAD_SIZE
pub const MAX_SNAPSHOT_HASHES: usize = 16;
const GOSSIP_PING_TOKEN_SIZE: usize = 32;
const GOSSIP_PING_CACHE_CAPACITY: usize = 16384;
Copy link
Contributor

@carllin carllin Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be based on the number of peers in gossip instead of a constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should just be large enough not to kick out ping/pong packets too early. That includes scenarios where there may be spurious pull-requests from attackers. There might also be multiple packets associated with the same peer (e.g. multiple on-the-fly ping packets, or packets which were dropped).

Is there an easy/efficient way to get the number of peers here? may dynamically adjust it to account for larger network.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@behzadnouri, got it, thanks! I think the most convenient available way would be to use cluster_info.gossip_peers(). It's not the most efficient because it scans the table (broadcast does something similar with cluster_info.tvu_peers()) but may not be too bad if done periodically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, since the alternatives would require table scan and locking, i lean toward keeping this constant for now, and later test alternatives.

core/src/ping_pong.rs Outdated Show resolved Hide resolved
@carllin
Copy link
Contributor

carllin commented Oct 12, 2020

@behzadnouri looks good! Maybe we can use the pongs to do the upgrade? I.e. once a peer responds to a pong message, then we can mark that peer as upgraded and use the hard verify path in check_pull_request for that peer

// TODO: For backward compatibility, this unconditionally returns
// true for now. It has to return _check, once nodes start
// responding to ping messages.
true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can plumb in the final code here as well, making it conditional on a new feature added to runtime/src/feature_set.rs. This way we can dynamically activate this feature on clusters once 95% of the validator fleet has upgraded. Right now FeatureSet is not plumbed though cluster_info though, so I suggest adding this plumbing first as a forerunner PR to this one. core/ should probably use the FeatureSet from the highest rooted bank.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, working on it.

Copy link
Contributor Author

@behzadnouri behzadnouri Oct 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the diff to use the feature-set added in #12878
I think in addition of 95% fleet upgrade we also need to ensure that the newly added metric pull_request_ping_pong_check_failed_count looks fine.

@@ -76,7 +76,7 @@ macro_rules! inc_counter_info {
#[macro_export]
macro_rules! inc_new_counter {
($name:expr, $count:expr, $level:expr, $lograte:expr, $metricsrate:expr) => {{
if log_enabled!($level) {
if $count != 0 && log_enabled!($level) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: separate out little changes like this into their own PRs. They can land independently, and sooner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually reverted this change and instead put the if ... != 0 at the call-site.
Though the counter value does not change when event count is zero, some of the logging logic in the implementation relies on how often that function is called regardless of events counts being zero or not:
https://github.com/solana-labs/solana/blob/f5ed017d6/metrics/src/counter.rs#L172-L203
It might be somewhere in the code relies on that behavior.

core/src/ping_pong.rs Outdated Show resolved Hide resolved
@@ -97,6 +98,10 @@ const MAX_GOSSIP_TRAFFIC: usize = 128_000_000 / PACKET_DATA_SIZE;

/// Keep the number of snapshot hashes a node publishes under MAX_PROTOCOL_PAYLOAD_SIZE
pub const MAX_SNAPSHOT_HASHES: usize = 16;
/// Number of bytes in the randomly generated token sent with ping messages.
const GOSSIP_PING_TOKEN_SIZE: usize = 32;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 32-bytes seems like overkill. What about just u64 which seems like enough and allows for better memory efficiency and easy atomic use if needed, less cycles for compare/randomization generation etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried that will compromise security in some way, and limit usability of these tokens in the future for other security related applications beyond the current one (i.e. ip-spoofing prevention). For example responding with sha256 hash of ping token which itself was only 64 bits is kind of degenerate.

These ping/pong packets will be super low traffic, since once received they will be cached for a while. So I doubt that this will cause any noticeable performance difference.

sakridge
sakridge previously approved these changes Oct 27, 2020
Copy link
Member

@sakridge sakridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mergify mergify bot dismissed sakridge’s stale review October 27, 2020 18:11

Pull request has been modified.

@behzadnouri behzadnouri merged commit ae91270 into solana-labs:master Oct 28, 2020
@behzadnouri behzadnouri deleted the ping-pong branch October 28, 2020 17:03
mergify bot pushed a commit that referenced this pull request Oct 28, 2020
https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.

(cherry picked from commit ae91270)
mergify bot added a commit that referenced this pull request Oct 28, 2020
https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.

(cherry picked from commit ae91270)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
@behzadnouri behzadnouri added v1.3 and removed v1.3 labels Oct 29, 2020
mergify bot pushed a commit that referenced this pull request Oct 29, 2020
https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.

(cherry picked from commit ae91270)

# Conflicts:
#	Cargo.lock
#	core/src/cluster_info.rs
mergify bot added a commit that referenced this pull request Oct 29, 2020
* implements ping-pong packets between nodes (#12794)

https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.

(cherry picked from commit ae91270)

# Conflicts:
#	Cargo.lock
#	core/src/cluster_info.rs

* resolves mergify merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants