-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implements ping-pong packets between nodes #12794
Conversation
Codecov Report
@@ Coverage Diff @@
## master #12794 +/- ##
========================================
Coverage 82.1% 82.1%
========================================
Files 372 373 +1
Lines 86562 86904 +342
========================================
+ Hits 71109 71405 +296
- Misses 15453 15499 +46 |
Can you expand the problem? the hacker one links are behind a login. I'd like to have the details in the GitHub PR |
sure, done |
@behzadnouri doesn't the PullRequest/PullResponse have the same effect? Do we need another set of messages, or can we infer the same information form the pulls? |
I am not sure we can. An attacker can keep sending fake pull/push traffic, without an indication that it is spoofing an ip address. With ping/pong packets, the pong package can be verified as soon as it is received. |
d540841
to
33687f1
Compare
That Hackerone report is a duplicate of #9492. Glad to see it's getting fixed! |
Yea. You age right. We need a signed cookie to be sent back to authenticate the address. |
8044b38
to
8772b4e
Compare
Looks like a good direction to me, just need to figure out the backwards-compatibility part. |
8772b4e
to
d15675a
Compare
d15675a
to
4f3ed4a
Compare
// Because pull-responses are sent back to packet.meta.addr() of | ||
// incoming pull-requests, pings are also sent to request.from_addr (as | ||
// opposed to caller.gossip address). | ||
move |request| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, currying!
@@ -95,6 +96,9 @@ const MAX_GOSSIP_TRAFFIC: usize = 128_000_000 / PACKET_DATA_SIZE; | |||
|
|||
/// Keep the number of snapshot hashes a node publishes under MAX_PROTOCOL_PAYLOAD_SIZE | |||
pub const MAX_SNAPSHOT_HASHES: usize = 16; | |||
const GOSSIP_PING_TOKEN_SIZE: usize = 32; | |||
const GOSSIP_PING_CACHE_CAPACITY: usize = 16384; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be based on the number of peers in gossip instead of a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should just be large enough not to kick out ping/pong packets too early. That includes scenarios where there may be spurious pull-requests from attackers. There might also be multiple packets associated with the same peer (e.g. multiple on-the-fly ping packets, or packets which were dropped).
Is there an easy/efficient way to get the number of peers here? may dynamically adjust it to account for larger network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behzadnouri, got it, thanks! I think the most convenient available way would be to use cluster_info.gossip_peers()
. It's not the most efficient because it scans the table (broadcast does something similar with cluster_info.tvu_peers()
) but may not be too bad if done periodically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, since the alternatives would require table scan and locking, i lean toward keeping this constant for now, and later test alternatives.
@behzadnouri looks good! Maybe we can use the pongs to do the upgrade? I.e. once a peer responds to a pong message, then we can mark that peer as upgraded and use the hard verify path in |
core/src/cluster_info.rs
Outdated
// TODO: For backward compatibility, this unconditionally returns | ||
// true for now. It has to return _check, once nodes start | ||
// responding to ping messages. | ||
true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can plumb in the final code here as well, making it conditional on a new feature added to runtime/src/feature_set.rs
. This way we can dynamically activate this feature on clusters once 95% of the validator fleet has upgraded. Right now FeatureSet
is not plumbed though cluster_info though, so I suggest adding this plumbing first as a forerunner PR to this one. core/
should probably use the FeatureSet
from the highest rooted bank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, working on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the diff to use the feature-set added in #12878
I think in addition of 95% fleet upgrade we also need to ensure that the newly added metric pull_request_ping_pong_check_failed_count
looks fine.
metrics/src/counter.rs
Outdated
@@ -76,7 +76,7 @@ macro_rules! inc_counter_info { | |||
#[macro_export] | |||
macro_rules! inc_new_counter { | |||
($name:expr, $count:expr, $level:expr, $lograte:expr, $metricsrate:expr) => {{ | |||
if log_enabled!($level) { | |||
if $count != 0 && log_enabled!($level) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: separate out little changes like this into their own PRs. They can land independently, and sooner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually reverted this change and instead put the if ... != 0
at the call-site.
Though the counter value does not change when event count is zero, some of the logging logic in the implementation relies on how often that function is called regardless of events counts being zero or not:
https://github.com/solana-labs/solana/blob/f5ed017d6/metrics/src/counter.rs#L172-L203
It might be somewhere in the code relies on that behavior.
4f3ed4a
to
2d21ee3
Compare
This reverts commit bae6195.
ace958b
to
4f1791c
Compare
@@ -97,6 +98,10 @@ const MAX_GOSSIP_TRAFFIC: usize = 128_000_000 / PACKET_DATA_SIZE; | |||
|
|||
/// Keep the number of snapshot hashes a node publishes under MAX_PROTOCOL_PAYLOAD_SIZE | |||
pub const MAX_SNAPSHOT_HASHES: usize = 16; | |||
/// Number of bytes in the randomly generated token sent with ping messages. | |||
const GOSSIP_PING_TOKEN_SIZE: usize = 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just 32-bytes seems like overkill. What about just u64 which seems like enough and allows for better memory efficiency and easy atomic use if needed, less cycles for compare/randomization generation etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am worried that will compromise security in some way, and limit usability of these tokens in the future for other security related applications beyond the current one (i.e. ip-spoofing prevention). For example responding with sha256 hash of ping token which itself was only 64 bits is kind of degenerate.
These ping/pong packets will be super low traffic, since once received they will be cached for a while. So I doubt that this will cause any noticeable performance difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Pull request has been modified.
https://hackerone.com/reports/991106 > It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker > can spoof IP address in UDP packet when sending PullRequest to the node. > There's no any validation if provided source IP address is not spoofed and > the node can send much larger PullResponse to victim's IP. As I checked, > PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means > that amplification is about 34x. This way an attacker can easily perform DDoS > attack both on Solana node and third-party server. > > To prevent it, need for example to implement ping-pong mechanism similar as > in Ethereum: Before accepting requests from remote client needs to validate > his IP. Local node sends Ping packet to the remote node and it needs to reply > with Pong packet that contains hash of matching Ping packet. Content of Ping > packet is unpredictable. If hash from Pong packet matches, local node can > remember IP where Ping packet was sent as correct and allow further > communication. > > More info: > https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof > https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol The commit adds a PingCache, which maintains records of remote nodes which have returned a valid response to a ping message, and on-the-fly ping messages pending a pong response from the remote node. When handling pull-requests, those from addresses which have not passed the ping-pong check are filtered out, and additionally ping packets are added for addresses which need to be (re)verified. (cherry picked from commit ae91270)
https://hackerone.com/reports/991106 > It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker > can spoof IP address in UDP packet when sending PullRequest to the node. > There's no any validation if provided source IP address is not spoofed and > the node can send much larger PullResponse to victim's IP. As I checked, > PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means > that amplification is about 34x. This way an attacker can easily perform DDoS > attack both on Solana node and third-party server. > > To prevent it, need for example to implement ping-pong mechanism similar as > in Ethereum: Before accepting requests from remote client needs to validate > his IP. Local node sends Ping packet to the remote node and it needs to reply > with Pong packet that contains hash of matching Ping packet. Content of Ping > packet is unpredictable. If hash from Pong packet matches, local node can > remember IP where Ping packet was sent as correct and allow further > communication. > > More info: > https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof > https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol The commit adds a PingCache, which maintains records of remote nodes which have returned a valid response to a ping message, and on-the-fly ping messages pending a pong response from the remote node. When handling pull-requests, those from addresses which have not passed the ping-pong check are filtered out, and additionally ping packets are added for addresses which need to be (re)verified. (cherry picked from commit ae91270) Co-authored-by: behzad nouri <behzadnouri@gmail.com>
https://hackerone.com/reports/991106 > It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker > can spoof IP address in UDP packet when sending PullRequest to the node. > There's no any validation if provided source IP address is not spoofed and > the node can send much larger PullResponse to victim's IP. As I checked, > PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means > that amplification is about 34x. This way an attacker can easily perform DDoS > attack both on Solana node and third-party server. > > To prevent it, need for example to implement ping-pong mechanism similar as > in Ethereum: Before accepting requests from remote client needs to validate > his IP. Local node sends Ping packet to the remote node and it needs to reply > with Pong packet that contains hash of matching Ping packet. Content of Ping > packet is unpredictable. If hash from Pong packet matches, local node can > remember IP where Ping packet was sent as correct and allow further > communication. > > More info: > https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof > https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol The commit adds a PingCache, which maintains records of remote nodes which have returned a valid response to a ping message, and on-the-fly ping messages pending a pong response from the remote node. When handling pull-requests, those from addresses which have not passed the ping-pong check are filtered out, and additionally ping packets are added for addresses which need to be (re)verified. (cherry picked from commit ae91270) # Conflicts: # Cargo.lock # core/src/cluster_info.rs
* implements ping-pong packets between nodes (#12794) https://hackerone.com/reports/991106 > It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker > can spoof IP address in UDP packet when sending PullRequest to the node. > There's no any validation if provided source IP address is not spoofed and > the node can send much larger PullResponse to victim's IP. As I checked, > PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means > that amplification is about 34x. This way an attacker can easily perform DDoS > attack both on Solana node and third-party server. > > To prevent it, need for example to implement ping-pong mechanism similar as > in Ethereum: Before accepting requests from remote client needs to validate > his IP. Local node sends Ping packet to the remote node and it needs to reply > with Pong packet that contains hash of matching Ping packet. Content of Ping > packet is unpredictable. If hash from Pong packet matches, local node can > remember IP where Ping packet was sent as correct and allow further > communication. > > More info: > https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof > https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol The commit adds a PingCache, which maintains records of remote nodes which have returned a valid response to a ping message, and on-the-fly ping messages pending a pong response from the remote node. When handling pull-requests, those from addresses which have not passed the ping-pong check are filtered out, and additionally ping packets are added for addresses which need to be (re)verified. (cherry picked from commit ae91270) # Conflicts: # Cargo.lock # core/src/cluster_info.rs * resolves mergify merge conflicts Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Problem
https://hackerone.com/reports/991106
Summary of Changes
The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.
When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.