relay/DCUtR: Add Direct Connection Upgrade through Relay protocol #173
@@ -0,0 +1,148 @@
# Direct Connection Upgrade through Relay

| Lifecycle Stage | Maturity      | Status | Latest Revision   |
|-----------------|---------------|--------|-------------------|
| 1A              | Working Draft | Active | DRAFT, 2019-05-29 |

Authors: [@vyzo]

Interest Group: [@raulk], [@stebalien], [@whyrusleeping]

[@vyzo]: https://github.com/vyzo
[@raulk]: https://github.com/raulk
[@stebalien]: https://github.com/stebalien
[@whyrusleeping]: https://github.com/whyrusleeping

See the [lifecycle document](https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md)
for context about maturity level and spec status.

## Table of Contents

- [Direct Connection Upgrade through Relay](#direct-connection-upgrade-through-relay)
  - [Introduction](#introduction)
  - [The Protocol](#the-protocol)
    - [RPC messages](#rpc-messages)
  - [References](#references)

## Introduction

NAT traversal is a quintessential problem in peer-to-peer networks.

We currently utilize relays, which allow us to traverse NATs by using
a third party as a proxy. Relays are a reliable fallback that can
connect peers behind NAT, albeit with a high-latency, low-bandwidth
connection. Unfortunately, they are expensive to scale and maintain
if they have to carry all the NATed node traffic in the network.

It is often possible for two peers behind NAT to communicate directly
by utilizing a technique called _hole punching_ [1]. The technique
relies on the two peers synchronizing and simultaneously opening
connections to each other's predicted external addresses. It
works well for UDP, with an estimated 80% success rate, and reasonably
well for TCP, with an estimated 60% success rate.

The problem with hole punching, apart from not working all the time, is
the need for rendezvous and synchronization. This is usually
accomplished using dedicated signaling servers [2]. However, this
introduces yet another piece of infrastructure, while still requiring
the use of relays as a fallback for the cases where a direct
connection is not possible.

In this draft, we describe a synchronization protocol for direct
connectivity with hole punching that eschews signaling servers and
utilizes existing relay connections instead. That is, peers start
with a relay connection and synchronize with each other over it,
without the use of a signaling server. If the hole punching attempt
is successful, the peers _upgrade_ their connection to a direct
connection and they can close the relay connection. If the hole
punching attempt fails, they can keep using the relay connection as
they were.

## The Protocol

Consider two peers, `A` and `B`. `A` wants to connect to `B`, which is
behind a NAT and advertises relay addresses. `A` may itself be behind
a NAT or be a public node.

The protocol starts with the completion of a relay connection from `A`
to `B`. Upon observing the new connection, the inbound peer (here `B`)
checks the addresses advertised by `A` via identify. If that set
includes public addresses, then `A` _may_ be reachable by a direct
connection, in which case `B` attempts a unilateral connection upgrade
by initiating a direct connection to `A`.

Isn't it possible that
Yes it is possible, but that would have been dialed directly as the private addresses are still advertised with relay addresses.
I think @albrow has a point. @vyzo: while that should be the case, if we want to be resilient and robust, this protocol should not make assumptions about how any other part of the system behaves. Usually those implicit assumptions make systems brittle. Luckily our spec lifecycle process allows us to add this topic as an active discussion:
from: https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md
Not making this assumption will make us dial private addresses in vain multiple times.
At best, we can consider dialing them in the bidirectional part of the protocol.
Also, if A is public and B is private, we can't possibly be behind the same NAT.
Furthermore, for the bidirectional part of the protocol we could check the public address of the other node. If that doesn't match our own, we can't possibly be behind the same NAT and dialing private addrs is pointless.
It would be nice to avoid dialing private addrs if we can avoid it though. Perhaps we could still exchange them, but in a separate field. Then they can be ignored unless your public address matches the other node and you infer that you're behind the same NAT. Or your implementation may be able to always ignore them, since they would have been dialed previously. Anyway, I agree that we could punt on this for this round and discuss when we promote to candidate rec.

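The public-address check that `B` performs on `A`'s identify addresses could look roughly like the sketch below. It assumes the addresses are available as plain IP strings rather than multiaddrs, and it uses the standard-library `ipaddress.is_global` as a stand-in for whatever public-address test an implementation applies. The names `has_public_address` and `choose_upgrade_path` are hypothetical and do not come from the spec.

```python
import ipaddress


def has_public_address(addrs):
    """Return True if any entry in addrs is a globally routable IP address."""
    for addr in addrs:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # skip anything that is not a literal IP address
        if ip.is_global:
            return True
    return False


def choose_upgrade_path(addrs_of_a):
    """Decide what B tries first once the relay connection is up (hypothetical helper)."""
    if has_public_address(addrs_of_a):
        return "unilateral"  # B dials A directly
    return "hole-punch"      # B starts the coordinated upgrade protocol
```

A failed unilateral attempt would still fall through to the coordinated protocol; this sketch only covers the initial decision.
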
If the unilateral connection upgrade attempt fails, or if `A` is itself a NATed peer that
does not advertise a public address, then `B` initiates the direct connection
upgrade protocol as follows:

<!-- Note the golang implementation is using "/libp2p/holepunch/1.0.0" -->

1. `B` opens a stream to `A` using the `/libp2p/connect` protocol.
2. `B` sends to `A` a `Connect` message containing its observed (and possibly predicted)
   addresses from identify and starts a timer to measure the RTT of the relay connection.
3. Upon receiving the `Connect`, `A` responds with a `Connect` message containing
   its observed (and possibly predicted) addresses.
4. Upon receiving the `Connect`, `B` sends a `Sync` message and starts a timer for
   half the RTT, measured as the time between sending the initial `Connect` and receiving
   the response.
5. Simultaneous Connect
   - Upon receiving the `Sync`, `A` immediately starts a direct dial to `B` using the addresses
     obtained from the `Connect` message.
   - Upon expiry of the timer, `B` starts a direct dial to `A` using the addresses obtained
     from the `Connect` message.

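The timing of steps 2 to 5 can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: processing time at both peers is zero, and the relay path has a fixed one-way delay in each direction. The function `simulate_sync` and its parameters are hypothetical names used for illustration.

```python
def simulate_sync(delay_b_to_a_ms, delay_a_to_b_ms):
    """Return (time A dials, time B dials) in ms, measured from B's first Connect."""
    # Step 2: B sends Connect and starts the RTT timer.
    t_connect_sent = 0.0
    # Step 3: A replies at once, so B sees the reply after both one-way delays.
    t_reply_received = t_connect_sent + delay_b_to_a_ms + delay_a_to_b_ms
    rtt = t_reply_received - t_connect_sent
    # Step 4: B sends Sync and arms a timer for half the measured RTT.
    t_sync_sent = t_reply_received
    # Step 5: A dials when Sync arrives, B dials when its timer expires.
    a_dials_at = t_sync_sent + delay_b_to_a_ms
    b_dials_at = t_sync_sent + rtt / 2
    return a_dials_at, b_dials_at
```

With a symmetric path, for example `simulate_sync(40.0, 40.0)`, both dials start at the same instant. With an asymmetric path the two dials are offset by half the difference between the one-way delays.
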
<!-- TODO: Document retry logic -->

The purpose of the `Sync` message and `B`'s timer is to allow the two peers to synchronize
so that they perform a simultaneous open that allows hole punching to succeed.

If the direct connection is successful, then the peers should migrate
to it by prioritizing it over the existing relay connection. All new
streams should be opened on the direct connection, while the relay
connection should be closed after a grace period. Existing
indefinite-duration streams will have to be recreated on the new
connection once the relay connection is closed. This can be
accomplished by observing network notifications: the new direct
connection will emit a new `Connected` notification, while closing the
relay connection will sever existing streams and emit a `Disconnected`
notification.

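One way to picture the migration is a per-peer connection tracker driven by the two notifications. The sketch below is an assumption about how an implementation might organize this, not a description of any libp2p API: the class `PeerConnections`, its method names, and the `is_relayed` flag are all hypothetical.

```python
class PeerConnections:
    """Tracks the connections to one peer and prefers direct over relayed ones."""

    def __init__(self):
        self.conns = []  # list of (conn_id, is_relayed) tuples

    def on_connected(self, conn_id, is_relayed):
        # Handler for a hypothetical `Connected` notification.
        self.conns.append((conn_id, is_relayed))

    def on_disconnected(self, conn_id):
        # Handler for a hypothetical `Disconnected` notification.
        self.conns = [c for c in self.conns if c[0] != conn_id]

    def conn_for_new_stream(self):
        # Direct connections sort first because False < True.
        if not self.conns:
            return None
        return min(self.conns, key=lambda c: c[1])[0]
```

Once a direct connection has been registered, `conn_for_new_stream` returns it even while the relay connection is still open during the grace period.
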
### RPC messages

All RPC messages sent over a stream are prefixed with the message length in
bytes, encoded as an unsigned variable length integer as defined by the
[multiformats unsigned-varint spec][uvarint-spec].

Implementations SHOULD refuse encoded RPC messages (length prefix excluded)
larger than 4 KiB.

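The length-prefix framing and the size limit described above can be sketched as follows. The helper names (`encode_uvarint`, `decode_uvarint`, `frame`, `unframe`) are illustrative, and the sketch works on complete in-memory buffers instead of a stream. It does not reject non-minimal varint encodings, which a stricter implementation may want to do.

```python
MAX_MESSAGE_SIZE = 4 * 1024  # 4 KiB limit on the encoded message, prefix excluded


def encode_uvarint(n):
    """Encode a non-negative integer as an unsigned varint (7 bits per byte, low bits first)."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf):
    """Decode a uvarint from the start of buf; return (value, bytes consumed)."""
    value = 0
    for i, byte in enumerate(buf[:9]):  # the unsigned-varint spec caps varints at 9 bytes
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated or oversized uvarint")


def frame(message):
    """Prefix an encoded RPC message with its length."""
    if len(message) > MAX_MESSAGE_SIZE:
        raise ValueError("message exceeds 4 KiB")
    return encode_uvarint(len(message)) + message


def unframe(buf):
    """Split buf into (first message, remaining bytes), enforcing the size limit."""
    length, used = decode_uvarint(buf)
    if length > MAX_MESSAGE_SIZE:
        raise ValueError("refusing message larger than 4 KiB")
    if len(buf) < used + length:
        raise ValueError("incomplete message")
    return buf[used:used + length], buf[used + length:]
```

Checking the declared length before reading the body lets a receiver refuse an oversized message without buffering it.
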
RPC messages conform to the following protobuf schema:

```proto
syntax = "proto2";

package holepunch.pb;

message HolePunch {
  enum Type {
    CONNECT = 100;
    SYNC = 300;
  }

  optional Type type = 1;

  // For hole punching, we'll send some additional observed addresses to the remote peer
  // that could have been filtered by the Host address factory (for example: AutoRelay
  // removes all public addresses if peer has private reachability).
  // This is a hack!
  // We plan to have a better address discovery and advertisement mechanism in the future.
  // See https://github.com/libp2p/go-libp2p-autonat/pull/98
  repeated bytes ObsAddrs = 2;
}
```

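To make the wire format concrete, the sketch below serializes a `HolePunch` message by hand using the standard protobuf encoding rules. A real implementation would normally use code generated from the schema; `encode_hole_punch` and `_uvarint` are hypothetical helpers. The example address is assumed to be a binary multiaddr for `/ip4/127.0.0.1/tcp/4001`; the schema itself only says the field holds bytes.

```python
CONNECT = 100
SYNC = 300


def _uvarint(n):
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_hole_punch(msg_type, obs_addrs=()):
    """Serialize a HolePunch message: field 1 is a varint enum, field 2 is repeated bytes."""
    out = bytearray()
    out += b"\x08" + _uvarint(msg_type)  # tag 0x08 = field 1, wire type 0 (varint)
    for addr in obs_addrs:
        out += b"\x12" + _uvarint(len(addr)) + addr  # tag 0x12 = field 2, wire type 2
    return bytes(out)


# Assumed binary multiaddr for /ip4/127.0.0.1/tcp/4001 (illustrative only).
EXAMPLE_ADDR = b"\x04\x7f\x00\x00\x01\x06\x0f\xa1"
```

A `Sync` message carries no addresses, so it serializes to three bytes: the field tag followed by the two-byte varint for 300.
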
## References

1. Peer-to-Peer Communication Across Network Address Translators. B. Ford and P. Srisuresh.
   https://pdos.csail.mit.edu/papers/p2pnat.pdf
2. Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols. IETF RFC 5245.
   https://tools.ietf.org/html/rfc5245

[uvarint-spec]: https://github.com/multiformats/unsigned-varint
Didn't we see much better numbers than this?
I think they have been in the same ballpark, but I might as well be mistaken. Unfortunately I am unable to access the data from project flare phase 1. Either the data or my access seems to be removed.
@vyzo do you know more here?
Discussion continued on #173 (comment).