feat: auto relay #723

vasco-santos · 2020-07-31T10:21:28Z

This PR is part of the #699 initiative to add support for AutoRelay.

It includes the initial points described in the issue for connecting with peers that support the relay protocol. More precisely, it covers these points:

When a node connects to a peer that announces support for the relay protocol, /libp2p/circuit/relay/0.1.0, it may ask that node if it supports HOP.
If the peer supports HOP, libp2p should add it to its list of known relays.
When a known relay is added, libp2p should check to see if it needs a/another relay (to meet some min threshold). If below that threshold, libp2p should bind to that relay and add it to its announced addresses.
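
The three points above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `RelayTracker`, `canHop` and `onPeerProtocols` are hypothetical names, and the HOP query is passed in as a plain function instead of a real protocol exchange.

```javascript
const RELAY_CODEC = '/libp2p/circuit/relay/0.1.0'

// Hypothetical helper for illustration only, not the class added in this PR.
class RelayTracker {
  /**
   * @param {object} props
   * @param {number} props.maxListeners maximum number of relays to bind to
   * @param {(peerId: string) => boolean} props.canHop stand-in for the HOP query
   */
  constructor ({ maxListeners = 1, canHop }) {
    this.maxListeners = maxListeners
    this.canHop = canHop
    this.knownRelays = new Set()
    this.listenRelays = new Set()
  }

  // Called when a peer announces its protocols (for example after identify).
  onPeerProtocols (peerId, protocols) {
    // Point 1: only peers announcing the relay protocol are asked about HOP.
    if (!protocols.includes(RELAY_CODEC)) return
    if (!this.canHop(peerId)) return

    // Point 2: remember the peer as a known relay.
    this.knownRelays.add(peerId)

    // Point 3: bind only while we are below the configured limit.
    if (this.listenRelays.size < this.maxListeners) {
      this.listenRelays.add(peerId)
    }
  }
}
```

With `maxListeners: 1`, a second HOP-capable peer is still recorded in `knownRelays` but is not bound, which leaves it available as a replacement later.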

Follow up PRs:

When libp2p updates its addresses, it should update the network - Identify push on add/remove listen + self peer record update on addr change
Query the network if it needs more relays
New connectionManager integration for disconnect improvement

It is important to mention the following implementation details:

After a connection is established, the identify protocol will kick in. As a result of identify, peers will exchange the protocols they support. AutoRelay relies on the change:protocols event from the PeerStore to decide which peers to ask for HOP support.
When a peer supports HOP, this information is stored in the metadataBook
When a peer disconnects (or does not support relay protocol anymore - this should not be possible at the moment), remaining open connections with peers supporting HOP will be used to replace the disconnected one
- This should be refactored to leverage the new connectionManager as a follow up PR
Already relayed connections will not be used for auto relay
Regarding 3, maxListeners was added (happy to reconsider the name) instead of minThreshold. It limits the number of relay addresses to listen on. Each time a peer connects, we ask it whether it supports HOP (if the peer announces the relay protocol). This should be polished with the connectionManagement improvements to have a general approach for the autoDial
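
A rough sketch of the disconnect handling described above, assuming simplified stand-ins for the PeerStore metadataBook and the connection list. `HOP_KEY`, `pickReplacementRelay` and `onPeerDisconnect` are hypothetical names and the metadata key is made up; addresses are plain strings rather than multiaddr objects.

```javascript
// Assumed metadata key, chosen for this illustration only.
const HOP_KEY = 'hop_relay'

// Find a connected peer that is recorded as HOP-capable, is not already
// bound as a relay, and is not itself reached through a relay.
function pickReplacementRelay ({ connections, metadata, listenRelays }) {
  for (const [peerId, conn] of connections) {
    if (listenRelays.has(peerId)) continue // already bound
    if (conn.remoteAddr.includes('/p2p-circuit')) continue // relayed connection
    const meta = metadata.get(peerId)
    if (!meta || meta.get(HOP_KEY) !== true) continue // HOP support not recorded
    return peerId
  }
  return undefined
}

// When a bound relay disconnects, try to replace it from open connections.
function onPeerDisconnect (state, peerId) {
  state.connections.delete(peerId)
  if (!state.listenRelays.delete(peerId)) return undefined // was not a bound relay
  const replacement = pickReplacementRelay(state)
  if (replacement !== undefined) state.listenRelays.add(replacement)
  return replacement
}
```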

vasco-santos · 2020-07-31T13:00:11Z

@jacobheun your thoughts regarding the current direction of this work are welcome :)
I think this first iteration is almost ready

jacobheun · 2020-08-04T10:59:15Z

test/relay/auto-relay.node.js

+      ;[libp2p, relayLibp2p1, relayLibp2p2, relayLibp2p3] = peerIds.map((peerId, index) => {
+        const opts = baseOptions
+
+        if (index !== 0) {


This is creating 3 hop relays, is that intended?

Yes, this is used to test the maximum number of relays to listen on

vasco-santos · 2020-09-03T10:08:22Z

@jacobheun this is ready for review now. Should we create a base branch for 0.30.x or for the work included in #703?

jacobheun · 2020-09-03T10:11:52Z

Should we create a base branch for 0.30.x or for the work included in #703?

@vasco-santos Yeah let's do that, there are a few things we want to land in conjunction with this so it would be good to gather those outside of master.

vasco-santos · 2020-09-03T10:14:39Z

I changed it to the 0.30.x branch, as we might not include all of #703 in the next release

jacobheun · 2020-09-08T14:56:55Z

src/circuit/auto-relay.js

+    this._listenRelays.add(id)
+
+    // Create relay listen addr
+    const remoteMultiaddr = connection.remoteAddr


We may need to do some better selection for the remoteAddr as this is an observed address and might not actually be dialable. It's possible MDNS could have caused us to receive an incoming connection, or we could have dialed a private address on a VPN. We should probably take from the relay's announced addresses and avoid using private multiaddrs.

That is a good point! I will just get the first addr (that is certified) from the AddressBook. We will already have the connection to the peer, so it will just be reused

We will already have the connection to the peer, so it will just be reused

The problem is our advertised address for this. The address through which we're connected to the peer doesn't matter; the address of the relay that we advertise to other peers does.
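
A minimal sketch of the selection discussed in this thread: take the relay's announced addresses and skip private ones. Addresses are plain strings here instead of multiaddr objects, the entry shape `{ multiaddr, isCertified }` mirrors what the AddressBook returns in this PR, and `isPrivateIp4Multiaddr` and `selectRelayAddr` are hypothetical helpers that only understand /ip4 addresses.

```javascript
// Private-range check for /ip4 multiaddr strings only.
// Assumption: a real implementation would also need to handle IPv6, DNS, etc.
function isPrivateIp4Multiaddr (ma) {
  const match = /^\/ip4\/(\d+)\.(\d+)\.\d+\.\d+(\/|$)/.exec(ma)
  if (!match) return false
  const a = Number(match[1])
  const b = Number(match[2])
  return a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
}

// Return the first certified, non-private announced address, if any.
function selectRelayAddr (announced) {
  const entry = announced.find(e => e.isCertified && !isPrivateIp4Multiaddr(e.multiaddr))
  return entry ? entry.multiaddr : undefined
}
```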

jacobheun · 2020-09-08T15:06:38Z

src/circuit/auto-relay.js

+    if (!remoteMultiaddr.protoNames().includes('p2p')) {
+      listenAddr = `${remoteMultiaddr.toString()}/p2p/${connection.remotePeer.toB58String()}/p2p-circuit/p2p/${this._peerId.toB58String()}`
+    } else {
+      listenAddr = `${remoteMultiaddr.toString()}/p2p-circuit/p2p/${this._peerId.toB58String()}`


We shouldn't need our peer id in the listen address, it should just end with /p2p-circuit. The circuit listener logic will need to be fixed because it expects our id at the end.

The circuit listener logic will need to be fixed because it expects our id at the end.

It does not seem necessary. I just checked and the Circuit listener does not seem to use the id at all. Also removed it and everything seems to work as expected.
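
For illustration, building a listen address that ends in /p2p-circuit without our own peer id could look like the sketch below. It works on plain strings rather than multiaddr objects, `buildRelayListenAddr` is a hypothetical helper, and `QmRelay` is a placeholder rather than a valid peer id.

```javascript
// Build the relay listen address: <relay addr>/p2p/<relay id>/p2p-circuit.
// The relay's peer id is appended only if the address does not carry one.
function buildRelayListenAddr (relayAddr, relayPeerId) {
  const hasPeerId = relayAddr.split('/').includes('p2p')
  const base = hasPeerId ? relayAddr : `${relayAddr}/p2p/${relayPeerId}`
  return `${base}/p2p-circuit`
}

console.log(buildRelayListenAddr('/ip4/1.2.3.4/tcp/4001', 'QmRelay'))
```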

jacobheun · 2020-09-08T15:19:33Z

src/circuit/auto-relay.js

+   * Try to listen on available hop relay connections.
+   * @return {Promise<void>}
+   */
+  async _listenOnAvailableHopRelays () {


This function should change, really what we want to do is find relays if we don't have enough. This would involve:

Check the metadata store for known relays, try them first if we're not connected

If we don't have enough, find relays on the network. (https://github.com/libp2p/go-libp2p/blob/fb3179e617bdf049734f58aa8ced2bb42de7ea5d/p2p/host/relay/autorelay.go#L250)

Autorelay should also start with a network search for relays. Ideally this would happen once we determined ourselves to be private (via autoNat) so we don't bind to relays when we're publicly dialable, but we don't have autonat yet.

If we find multiple relays we should eventually prioritize by lowest latency, but for now we can focus on the use case of users configuring their node to listen to a specific relay, and allow ambient canhop checks to supplement it.

I mentioned in the description that this was the initial PR for the issue. The follow-up PR would include that part to ease the implementation/review. But, if we start with a simple version, I can also include it here:

I will change the logic a bit. It is important to note that we can be connected to a peer that supports hop but that we discarded before, and we should leverage it later if we need to.

Regarding the start part, I was also expecting that as a follow-up. We can start it when the circuit listener starts, but we need to guarantee that the peerStore is loaded first.

Reconsidering this, I think we should try to leverage the connections we already have before dialling other peers. I think we should start by doing what I had:

iterate the available connections, if we are connected to a peer which supports hop but we are not listening on it, we should add it as a listener.

if we still need more peers, go to the PeerStore (MetadataBook)

Finally (if we need more), we find relays on the network

I am hitting an "issue" with this approach, which also happens with the one you suggested. Since we call this when a disconnect happens and a listen relay is removed, iterating the metadataBook means we will just try to reconnect to that same node. This flow is odd and should be avoided. Eventually we would sort peers by score and not dial it again, but for now we cannot do that. Either we "accept" this behaviour for now, or _listenOnAvailableHopRelays should receive an optional list of peers to ignore (to which we would add the disconnected peer when _listenOnAvailableHopRelays is called as a consequence of the disconnect).

Why not prioritize known relays by whether or not they're connected?

Check the Peer Store for relay peers

If we're connected and they're not already a relay, bind to them

If we don't have enough, perform a search

Once the search is done, attempt to bind to known relays (which may include a dial)

If we're already connected to them we should have already checked their HOP status, so prioritizing all known relays by their connectedness makes sense, instead of checking each connected peer, which would be less efficient.

The immediate redial problem should really be solved with backoffs (connection gating, plus maybe a ttl tag in the peer store), especially if they disconnected from us.
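
The ordering suggested here, known relays first and connected ones ahead of the rest, can be sketched as below. Peer ids are plain strings and `orderRelayCandidates` is a hypothetical helper, not part of this PR.

```javascript
// Order known relays so that already-connected peers are tried first.
// Relays we are already listening on are excluded.
function orderRelayCandidates (knownRelays, connectedPeers, listenRelays) {
  const candidates = knownRelays.filter(id => !listenRelays.has(id))
  const connected = candidates.filter(id => connectedPeers.has(id))
  const disconnected = candidates.filter(id => !connectedPeers.has(id))
  return [...connected, ...disconnected]
}
```

Binding to the connected candidates needs no dial, while the remaining ones may require one.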

If we already have connections that we can use to bind, I think they should be used first instead of establishing new connections with other peers, especially as we head towards having more meaningful connections to the peer. Otherwise we will be establishing a new connection that we will want to maintain (auto relay binds will be important connections to the peer), while we could be using a connection that might already be important for the peer (as it already exists). It will also be faster to replace the listen address.

instead of checking each connected peer, which would be less efficient.

Yes, we will need to iterate on the connections and do a metadataBook.get.
But, we can also iterate the metadataBook and check if we have a connection to the peers that support hop. Listen on them if we have one, and follow up afterwards on the ones we were not connected to. We will get some more complex logic and additional gets to the connections map, but isn't this better than establishing new connections if we can use the ones we already have?

The immediate redial problem should really be solved with backoffs (connection gating, plus maybe a ttl tag in the peer store), especially if they disconnected from us.

Yeah, so I will tackle that together with the connection manager stuff.
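
As a rough idea of what a backoff for the immediate redial problem might look like, here is a small exponential backoff keyed by peer id. `DialBackoff` is purely hypothetical: it is not connection gating, it does not use the peer store, and the time is passed in explicitly to keep the sketch deterministic.

```javascript
// Hypothetical per-peer exponential backoff; delays double up to maxMs.
class DialBackoff {
  constructor (baseMs = 1000, maxMs = 60000) {
    this.baseMs = baseMs
    this.maxMs = maxMs
    this.entries = new Map() // peerId -> { attempts, until }
  }

  // Record a disconnect at time `now` and return the delay that was applied.
  recordDisconnect (peerId, now) {
    const entry = this.entries.get(peerId) || { attempts: 0, until: 0 }
    const attempts = entry.attempts + 1
    const delay = Math.min(this.baseMs * 2 ** (attempts - 1), this.maxMs)
    this.entries.set(peerId, { attempts, until: now + delay })
    return delay
  }

  // A peer may be dialed if it has no entry or its backoff has elapsed.
  canDial (peerId, now) {
    const entry = this.entries.get(peerId)
    return !entry || now >= entry.until
  }

  // Forget the peer, for example after a successful long-lived connection.
  reset (peerId) {
    this.entries.delete(peerId)
  }
}
```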

jacobheun · 2020-09-08T15:23:39Z

src/circuit/listener.js

  const listener = new EventEmitter()
  const listeningAddrs = new Map()

+  // Remove listeningAddrs when a peer disconnects
+  libp2p.connectionManager.on('peer:disconnect', (connection) => {
+    listeningAddrs.delete(connection.remotePeer.toB58String())


If this actually causes a change in our addresses we need to perform an identify push.

I will call the identify push there if deleted. But, once we support a dynamic transport manager with runtime transport add and remove, we should probably have the IdentifyService listen for updates

I was planning on doing this in a follow up PR (point 4 in the issue), as I was also not doing the push when the new listen addr is added.
Anyway, I can do both now

Follow up PRs are fine for this, but it would be good to at least denote some todo comments so we don't forget code paths in the subsequent PRs.

I will add the comments and do it as a follow-up after all. I found a bug on connect + disconnect because the addrs were not certified anymore. Then I remembered that we still do not support invalidating the self record on multiaddr change: https://github.com/libp2p/js-libp2p/blob/v0.29.0/src/identify/index.js#L319

I will do that in the follow up PR with the identify push updates
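
The "push only if deleted" idea from this thread relies on `Map.prototype.delete` returning `true` only when an entry was actually removed. A small sketch, where `removeListeningAddr` and the `onAddressesChanged` callback are hypothetical stand-ins for wherever the identify push would eventually be triggered:

```javascript
// Remove a relay listening address and notify only on a real change.
function removeListeningAddr (listeningAddrs, peerIdStr, onAddressesChanged) {
  const deleted = listeningAddrs.delete(peerIdStr)
  if (deleted) {
    // Our announced addresses changed; an identify push would go here.
    onAddressesChanged()
  }
  return deleted
}
```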

vasco-santos · 2020-09-10T08:28:21Z

This is ready for review again @jacobheun. The expected follow-up PRs are described in the main post of the PR:

When libp2p updates its addresses, it should update the network - Identify push on add/remove listen + self peer record update on addr change
Query the network if it needs more relays
New connectionManager integration for disconnect improvement

Note: codecov jobs are not working properly atm. I rebuilt everything, but the gcov command is failing. So some of the annotations in the PR diff are not valid: there are actually tests for those flows, but they were added in the last commit

jacobheun · 2020-09-15T10:58:28Z

src/circuit/auto-relay.js

+   * @constructor
+   * @param {object} props
+   * @param {Libp2p} props.libp2p
+   * @param {number} props.maxListeners maximum number of relays to listen.


Should we just default this to 1?

jacobheun · 2020-09-15T11:05:56Z

src/circuit/auto-relay.js

+
+    try {
+      const remoteAddrs = this._peerStore.addressBook.get(connection.remotePeer)
+      remoteMultiaddr = remoteAddrs.find(a => a.isCertified).multiaddr // Get first announced address certified


This isn't going to be sufficient, we'll need to try and prioritize public addresses for this. It's still pretty common for peers to advertise private addresses. We can make a note to do this in a followup PR though.

HOP relays should really avoid advertising private addresses (we should document this in a "setting up relays" section of the production guides), but we can't rely on this behavior.

Yes, we have a milestone for it, but I will add a comment: #699 (comment)

I am not sure yet on the best approach for this, but it is being tracked

jacobheun · 2020-09-15T11:12:06Z

src/circuit/auto-relay.js

+    }
+
+    // Attempt to listen on relay
+    this._listenRelays.add(id)


Why not just do the add after the listen call in the try? Then the delete isn't required if it fails.

Well, we will block inside the following try ... catch block, waiting for the transportManager.listen.
If we get multiple calls at the same time, none of them will be blocked from reaching the transportManager.listen and we will probably end up with more listenRelays than needed.
Perhaps I can use a p-queue instead? Otherwise, I might also cancel others that fail (we should not expect transportManager.listen to fail, but it can)

I think we can leave it like this for now. It really shouldn't fail, we're already connected. If it does fail that should mean the connection was dropped during the listen attempt, which should trigger us to connect to another known relay if it exists.
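
The concern in this thread can be illustrated with a small sketch: reserving the slot in the set before the await means concurrent callers see the updated size and back off. `tryListen` and the `listen` callback are hypothetical stand-ins for the AutoRelay method and transportManager.listen.

```javascript
// Reserve the relay slot before awaiting, and release it if listen fails.
async function tryListen (state, id, listen) {
  if (state.listenRelays.size >= state.maxListeners || state.listenRelays.has(id)) {
    return false
  }

  state.listenRelays.add(id) // reserve synchronously, before any await
  try {
    await listen(id)
    return true
  } catch (err) {
    state.listenRelays.delete(id) // release the slot on failure
    return false
  }
}
```

If the `add` were moved after the `await`, several concurrent calls could all pass the size check before any of them registered itself.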

jacobheun · 2020-09-15T11:53:15Z

src/circuit/listener.js

@@ -24,7 +34,7 @@ module.exports = (circuit) => {
  listener.listen = async (addr) => {
    const addrString = String(addr).split('/p2p-circuit').find(a => a !== '')

-    const relayConn = await circuit._dialer.connectToPeer(multiaddr(addrString))
+    const relayConn = await libp2p.dial(multiaddr(addrString))
    const relayedAddr = relayConn.remoteAddr.encapsulate('/p2p-circuit')


Something to consider here is that in AutoRelay we are creating a listen addr and then calling transportManager.listen to "connect". We will already have a connection per the AutoRelay logic, so that address is going to get thrown out and replaced with whatever this address is. We need to do some reconciliation here, I would find it odd to call listen with one address, and then have my actual address end up being something different.

I think this likely won't be a huge issue for initial relay connections, but if we reconnect to known relays, this address could change. The provided address should be the address we end up with, but if it fails for some reason we will dial other known addresses for the peer.

We need to do some reconciliation here, I would find it odd to call listen with one address, and then have my actual address end up being something different.

That is true! I changed this to avoid creating multiple connections to the same peer, but I agree with your point.
So you suggest that we go back and use the connectToPeer with a fallback to the dial if we fail?

So you suggest that we go back and use the connectToPeer with a fallback to the dial if we fail?

No. We shouldn't care how we're connected to the peer, really, but we need to be careful about the address we're advertising for this. Something like: Prioritize public addresses, if the connected address matches one of those use it, otherwise pick one of the others.

Again, I think we can do a follow-up PR for this, so we can focus on clear tests for that.
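
One possible reading of the rule above ("prioritize public addresses, if the connected address matches one of those use it, otherwise pick one of the others") is sketched below. `selectAdvertisedAddr` is hypothetical, the `isPublic` predicate is supplied by the caller, and falling back to the full announced list when no public address exists is an assumption of this sketch, not something decided in the thread.

```javascript
// Choose which relay address to advertise.
function selectAdvertisedAddr (announcedAddrs, connectedAddr, isPublic) {
  const publicAddrs = announcedAddrs.filter(isPublic)
  // Assumption: with no public address, fall back to everything announced.
  const pool = publicAddrs.length > 0 ? publicAddrs : announcedAddrs
  // Prefer the address we are actually connected on when it qualifies.
  if (pool.includes(connectedAddr)) return connectedAddr
  return pool[0]
}
```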

jacobheun · 2020-09-15T13:15:26Z

CI is failing, but otherwise I think we can merge once that's fixed and continue work in subsequent PRs

vasco-santos · 2020-09-15T13:51:10Z

Done @jacobheun! CI tests are passing. I don't know what happened before, but I cleared the cache, re-ran, and everything passed again. However, the codecov CI got broken in this PR and it never ran again 🤷‍♂️

* feat: auto relay
* fix: leverage protoBook events to ask relay peers if they support hop
* chore: refactor disconnect
* chore: do not listen on a relayed conn
* chore: tweaks
* chore: improve _listenOnAvailableHopRelays logic
* chore: default value of 1 to maxListeners on auto-relay

vasco-santos force-pushed the feat/auto-relay branch from 22ef975 to 9067e63 Compare July 31, 2020 10:54

vasco-santos mentioned this pull request Jul 31, 2020

[discovery] Add support for AutoRelay #699

Closed

jacobheun reviewed Aug 4, 2020

jacobheun force-pushed the 0.29.x branch from 54bdff6 to 02b6248 Compare August 12, 2020 15:11

jacobheun force-pushed the feat/auto-relay branch from 9067e63 to aa5a596 Compare August 12, 2020 15:23

vasco-santos force-pushed the feat/auto-relay branch from aa5a596 to f1e3bfd Compare August 26, 2020 10:23

Base automatically changed from 0.29.x to master August 27, 2020 13:38

vasco-santos force-pushed the feat/auto-relay branch 2 times, most recently from 0e76d82 to 2c11cca Compare August 28, 2020 16:43

vasco-santos added 4 commits September 1, 2020 11:21

feat: auto relay

d4bee23

fix: leverage protoBook events to ask relay peers if they support hop

6c901b9

chore: refactor disconnect

1dccb7c

chore: do not listen on a relayed conn

02dbdc3

vasco-santos force-pushed the feat/auto-relay branch 4 times, most recently from 5b79b09 to ee3682c Compare September 1, 2020 14:49

chore: tweaks

d8f0849

vasco-santos force-pushed the feat/auto-relay branch from ee3682c to d8f0849 Compare September 3, 2020 09:23

vasco-santos marked this pull request as ready for review September 3, 2020 09:23

vasco-santos requested a review from jacobheun September 3, 2020 10:07

vasco-santos changed the base branch from master to 0.30.x September 3, 2020 10:14

jacobheun reviewed Sep 8, 2020

chore: improve _listenOnAvailableHopRelays logic

bd613cb

vasco-santos force-pushed the feat/auto-relay branch from db39ca7 to bd613cb Compare September 9, 2020 19:34

vasco-santos requested a review from jacobheun September 10, 2020 08:25

This was referenced Sep 10, 2020

chore: auto relay multiaddr update push #748

Merged

feat: auto relay network query #749

Merged

jacobheun reviewed Sep 15, 2020

chore: default value of 1 to maxListeners on auto-relay

ac430a2

jacobheun approved these changes Sep 16, 2020

jacobheun merged commit c3039a0 into 0.30.x Sep 16, 2020

jacobheun deleted the feat/auto-relay branch September 16, 2020 14:43

vasco-santos restored the feat/auto-relay branch September 17, 2020 07:58

vasco-santos deleted the feat/auto-relay branch September 17, 2020 07:59

vasco-santos mentioned this pull request Dec 10, 2020

⚡️ 0.30 RELEASE 🚀 #655

Closed

17 tasks
