AutoNAT should correlate dialback results with actual incoming connections #1480
Comments
Risk of not doing this: an attacker could lead us to believe we are public when we aren't, therefore jeopardising inbound connectivity. |
this is rather complicated... |
@vyzo care to elaborate? |
My understanding is that this is easy to achieve. You register a Notifee that ignores all events unless you're undergoing AutoNAT determination. At that point, you track all incoming connections, and when the peer has responded positively or negatively, you check to see if their answer is coherent with what you observed:
|
It's not that simple. The AutoNATService peer uses a background host to dial back, so the peer ID is unknown. |
Ah, understood the complexity now. I was sure I was missing something. I agree the IP address is an unreliable heuristic. I wonder if we can have the server open a stream and sign a message with its real identity, so that the client can do the matching. I really think we need to solve this one way or another. |
Can we solve this by making the AutoNAT server send the dial response as a signed peer record, where the public key is the one from which the peer ID of the host we asked for the dialback was derived? |
The above solution isn't a solution to this problem. The problem we want to solve is: "We want to be absolutely sure that the AutoNAT server did indeed dial us before sending us a dial response & isn't just faking it" |
One simple way to solve what @vyzo pointed out is for the requesting peer to send a nonce in the request, and have the responding peer return a certificate of its dialback host’s identity. It would return the peer ID, public key, and a signature of This makes the system more Byzantine Fault Tolerant. If we don’t implement this, a DHT client could be trivially misled into thinking it’s diallable, and would attempt to join the DHT as a server. |
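A rough sketch of the nonce-plus-certificate idea, using only the Go standard library. Everything here is an assumption made for illustration: the thread does not spell out what exactly is signed, so the payload layout (an 8-byte nonce followed by the dialer's peer ID), the use of Ed25519, and the function names `proofPayload`, `signProof`, `verifyProof`, and `newKeyPair` are all hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"encoding/binary"
)

// proofPayload is the byte string the server signs. Assumed layout (not from
// the thread): 8-byte big-endian nonce followed by the dialer's peer ID.
func proofPayload(nonce uint64, dialerID string) []byte {
	buf := make([]byte, 8, 8+len(dialerID))
	binary.BigEndian.PutUint64(buf, nonce)
	return append(buf, dialerID...)
}

// signProof is run by the server with its main identity key, vouching that
// dialerID is the background host it uses for dialbacks.
func signProof(mainKey ed25519.PrivateKey, nonce uint64, dialerID string) []byte {
	return ed25519.Sign(mainKey, proofPayload(nonce, dialerID))
}

// verifyProof is run by the client against the server's known public key and
// the nonce the client itself put into the request.
func verifyProof(serverPub ed25519.PublicKey, nonce uint64, dialerID string, sig []byte) bool {
	return ed25519.Verify(serverPub, proofPayload(nonce, dialerID), sig)
}

// newKeyPair generates a throwaway key pair for experimenting with the sketch.
func newKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return pub, priv
}
```

Once the proof verifies, the client knows which peer ID to look for among its inbound connections; the signature alone does not show that a dial actually happened.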
Note: Even after we finish this, an AutoNAT server can still falsely tell a client that its NAT status is private. |
Can't an attacker just tell us the wrong addresses? This may help, a little, in some cases, but I want to make sure it's worth the extra complexity. Also note: forcing the dial to complete means we can't optimize the dial later. In an ideal world, the AutoNAT server would just (with TLS/QUIC):
This saves the AutoNAT server from having to do any fancy crypto beyond computing the initial DH params, making this service significantly more efficient.
It's a little trickier than that.
If we do go with this, I'd like to avoid unnecessary crypto. Instead of a per-request nonce, we should just let the AutoNAT server sign their main key with their dialer/testing key once up-front. |
It can but it would also require the attacker to do some POW in the form of signing the nonce & thus isn't free. We should also validate that the returned address is among the ones we asked it to probe. I don't think we do it right now.
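The address check mentioned here could look something like the sketch below. It is illustrative only: addresses are compared as plain strings in their textual multiaddr form, and the name `addrWasRequested` is hypothetical.

```go
package main

// addrWasRequested reports whether the address named in a dial response is
// one of the addresses the client asked the server to probe. Addresses are
// compared as plain strings, which assumes both sides use the same textual
// form.
func addrWasRequested(requested []string, returned string) bool {
	for _, a := range requested {
		if a == returned {
			return true
		}
	}
	return false
}
```

A response that names an address outside the requested set would then be discarded instead of being counted as evidence of reachability.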
So, if we don't have the dialer ID for an AutoNAT server, we should ask the server for a certificate and then send the dial request? We would still have to match the dialer ID with incoming connections and face the races that you mention. I agree with everything else. |
Also, note that there are ways to solve the races that you mention.
We could modify the protocol to roughly do something like: Client connects to the Server and asks for the Identity certificate -> Server sends a signed Identity certificate so we can start tracking the dialer -> Client asks the Server to go ahead with the dial -> When the Client receives the inbound dial, it sends back a nonce on the same connection -> Server echoes back the nonce in the dial back response. It wouldn't be cheap though and I haven't thought of the things that can go wrong here. |
ping @raulk to address @Stebalien's concerns. |
That's my concern. Note: If we can ensure that AutoNAT peer selection is actually random (e.g., by querying the DHT for a random set of peers as suggested by @petar), we can make this attack really hard to pull off. |
@petar Could you please elaborate on the approach @Stebalien is talking about? |
I am guessing @Stebalien is referring to a discussion we had in person about discovering whether a node is behind a NAT. The problem that @Stebalien pointed out: if the peer you are talking to is behind the same NAT (e.g. both of you are on the same private network), then you would conclude that you are not behind a NAT. I proposed that if you look up a random peer ID on the DHT and use the resulting peers to discover whether you are behind a NAT, the chosen peer will (with high probability) not be in your private network, and so you will be able to make an accurate determination. |
Note: My point here is that this solution would also help protect us (somewhat) against sybils, because we'd be choosing the nodes to test with instead of just using the first ones we come across. |
@Stebalien I'm not following the line of thinking that leads to stalling here. The mechanism proposed here is a strict improvement over the status quo. Just to be clear, the scope of this issue is not to suddenly make us 100% byzantine fault tolerant (if that is even possible), but rather to make us a little more intelligent. Let's take it step by step. The first step is to not be entirely gullible. Right now, we just believe what our peer is telling us, every time. Correlating what we observe with what our peer tells us is, IMO, common sense. This would harden the private => public transition. If we consider ourselves private, and a peer tells us we're public, we should've seen an inbound dial. If not, that peer is misleading us. The risk of not performing this correlation is that it would be relatively easy to conduct a sybil attack where AutoNAT peers unconditionally report public reachability (without even performing the promised dial), and therefore trigger downstream effects, such as having everybody join the DHT (barring local conditions in those protocols). Let me address your comments individually, in follow-up comments. |
FWIW, this is an entirely different attack than the one this issue aims to thwart. Suggestion: track in another issue. |
Suggestion: let's open another issue to track this concern. |
This honestly sounds like premature optimisation. I do not expect AutoNAT to incur so many dials that this overhead becomes observable. I think the global footprint of this overhead is negligible. It could be uneven across the network if we have too few AutoNAT servers and too many AutoNAT clients (i.e. the servers are overloaded), but if we're moving to a true p2p model (where all publicly diallable nodes operate as AutoNAT servers), I expect the global load to be a lot more distributed. For perspective, I expect DHT queries to perform a lot more dials (and in a spiky fashion) than AutoNAT. So lightening the crypto handshake would benefit the DHT protocol a lot more than AutoNAT, IMO. Suggestion: track elsewhere, at the go-libp2p level probably. |
That's fine. I suggested a per-request opaque and stateless nonce because I considered it more secure. It makes the server work a little to prove that the dialback is in response to a given request, but that might be superfluous and wouldn't add much extra security. |
I'm late to the party, but this is something we might want to pick up at some point, so here's my proposal.
Agreed, that sounds like a good idea. In my opinion, this should be part of a multi-layered defense, i.e. we should still fix the underlying vulnerability. I think we can simplify the various suggestions quite a bit and get rid of all additional crypto (no signing, no encrypting) altogether, if we're willing to pay the price of the libp2p handshake. First of all, I think establishing a connection is acceptable because:
The protocol I'm suggesting is a simple 2-step protocol:
If the nonce is chosen from a large enough space (a uint64 should provide plenty of space for this purpose), collisions are sufficiently unlikely. Possible attack: There's no way to prove that the receiver actually dialed the address contained in the request in order to send a given identifier. An attacker could wait for a request and transmit the identifier on a connection dialed to a different address, falsely leading the initiator to believe that the requested address is reachable. I don't see any defense against this attack, other than randomly selecting the peers. |
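A minimal sketch of the nonce bookkeeping on the initiator's side, assuming the initiator puts a random uint64 in its request and expects the same value back on the dialed connection. The type `pendingProbes` and its methods are hypothetical names, and the wire format is out of scope.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
)

// pendingProbes maps each outstanding nonce to the address under test.
type pendingProbes map[uint64]string

// newNonce draws a uniformly random uint64 from the system CSPRNG.
func newNonce() (uint64, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint64(b[:]), nil
}

// start registers a probe for addr and returns the nonce to put in the request.
func (p pendingProbes) start(addr string) (uint64, error) {
	n, err := newNonce()
	if err != nil {
		return 0, err
	}
	p[n] = addr
	return n, nil
}

// confirm is called when a nonce arrives on an inbound connection. It returns
// the address the nonce was issued for and removes the entry, so the same
// nonce cannot be used twice.
func (p pendingProbes) confirm(nonce uint64) (string, bool) {
	addr, ok := p[nonce]
	if ok {
		delete(p, nonce)
	}
	return addr, ok
}
```

As the attack described above points out, a matching nonce only shows that the dialer saw the request, not that it dialed the address the nonce was issued for.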
Really? Isn't it possible to connect to a QUIC endpoint, receive their side of the handshake, then kill the connection before authenticating? |
Note: your proposal sounds reasonable, and I guess my previous comment might fall under "hacky".
Eh, there's no going around this really. |
There's the Retry mechanism, which is designed for the server to validate return routability to the client's address. It's extremely lightweight, as it doesn't even require decryption of the packet, but for the client there's no reliable way to trigger a Retry packet. A client could also abort the handshake right after receiving the server's TLS certificate, but at this point, the computationally expensive part of the handshake is already over.
We need to decide if we keep the protocol ID constant (and add fields to the protobufs), or bump the version number of this protocol. As this is quite a significant deviation from what we have so far (in terms of wire encoding, logic and security properties), I'm leaning towards bumping the version number, and doing a phased upgrade. |
Yes, I think we'd need to bump the protocol version. |
Right now it's pretty trivial to lie to an AutoNAT client by reporting incorrect dialback results. We should register a Notifee and track incoming connections when a dialback is requested, so we can correlate an OK result with an actual observed incoming connection. This makes it more difficult for enemies to confuse us.