feat: auto relay #723

Merged
merged 7 commits into from
Sep 16, 2020

@vasco-santos (Member) commented Jul 31, 2020

This PR is part of the #699 initiative to add support for AutoRelay.

It covers the initial points described in the issue for connecting with peers that support a relay protocol. More precisely, it includes these points:

  1. When a node connects to a peer that announces support for the relay protocol, /libp2p/circuit/relay/0.1.0, it may ask that node if it supports HOP.
  2. If the peer supports HOP, libp2p should add it to its list of known relays.
  3. When a known relay is added, libp2p should check to see if it needs a/another relay (to meet some min threshold). If below that threshold, libp2p should bind to that relay and add it to its announced addresses.
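The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `RelayTracker`, `askCanHop` and `bind` are hypothetical names, the HOP query is stubbed by the caller, and the limit is expressed as `maxListeners`, the name used in the implementation notes.

```javascript
// Hypothetical sketch of steps 1-3; none of these names come from js-libp2p.
const RELAY_PROTOCOL = '/libp2p/circuit/relay/0.1.0'

class RelayTracker {
  constructor ({ maxListeners = 1, askCanHop, bind }) {
    this.maxListeners = maxListeners
    this.askCanHop = askCanHop // async (peerId) => boolean, supplied by the caller
    this.bind = bind // async (peerId) => void, supplied by the caller
    this.knownRelays = new Set()
    this.listenRelays = new Set()
  }

  async onPeerProtocols (peerId, protocols) {
    // Step 1: only peers announcing the relay protocol are asked about HOP.
    if (!protocols.includes(RELAY_PROTOCOL)) return
    if (!(await this.askCanHop(peerId))) return

    // Step 2: remember the peer as a known relay.
    this.knownRelays.add(peerId)

    // Step 3: bind only while below the configured limit.
    if (this.listenRelays.size >= this.maxListeners) return
    if (this.listenRelays.has(peerId)) return
    this.listenRelays.add(peerId)
    await this.bind(peerId)
  }
}
```

With `maxListeners: 1`, a second HOP-capable peer is still recorded as a known relay but is not bound to.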

Follow up PRs:

  1. When libp2p updates its addresses, it should update the network - Identify push on add/remove listen + self peer record update on addr change
  2. Query the network if it needs more relays
  3. New connectionManager integration for disconnect improvement

It is important to mention the following implementation details:

  • After a connection is established, the identify protocol kicks in and the peers exchange the protocols they support. AutoRelay relies on the PeerStore's change:protocols event to learn which peers to ask for HOP support
  • When a peer supports HOP, this information is stored in the metadataBook
  • When a peer disconnects (or no longer supports the relay protocol, which should not be possible at the moment), the remaining open connections to peers supporting HOP will be used to replace the disconnected one
    • This should be refactored to leverage the new connectionManager in a follow-up PR
  • Connections that are already relayed will not be used for auto relay
  • Regarding 3, maxListeners was added (happy to reconsider the name) instead of the minThreshold. It limits the number of relay addresses to listen on. Each time a node connects, it will be asked whether it supports HOP (if the peer has the relay protocol). This should be polished with the connectionManagement improvements to have a general approach for the autoDial
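As a rough illustration of the event-driven part described above, the sketch below reacts to a change:protocols event and tracks which peers should be asked about HOP. The `peerStore` here is a plain EventEmitter standing in for the real PeerStore, the `{ peerId, protocols }` payload shape is an assumption, and `hopCandidates` is a hypothetical name.

```javascript
const { EventEmitter } = require('events')

// Stand-in for the PeerStore; only the event name comes from the PR description.
const peerStore = new EventEmitter()
const RELAY_PROTOCOL = '/libp2p/circuit/relay/0.1.0'

// Hypothetical bookkeeping: peers that should be asked about HOP support.
const hopCandidates = new Set()

peerStore.on('change:protocols', ({ peerId, protocols }) => {
  if (protocols.includes(RELAY_PROTOCOL)) {
    hopCandidates.add(peerId)
  } else {
    // The peer no longer announces the relay protocol.
    hopCandidates.delete(peerId)
  }
})

// Simulate two identify results arriving.
peerStore.emit('change:protocols', { peerId: 'QmRelay', protocols: [RELAY_PROTOCOL] })
peerStore.emit('change:protocols', { peerId: 'QmOther', protocols: ['/ipfs/ping/1.0.0'] })
```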

@vasco-santos
Member Author

@jacobheun your thoughts regarding the current direction of this work are welcome :)
I think this first iteration is almost ready

;[libp2p, relayLibp2p1, relayLibp2p2, relayLibp2p3] = peerIds.map((peerId, index) => {
const opts = baseOptions

if (index !== 0) {
Contributor

This is creating 3 hop relays, is that intended?

Member Author

Yes, this is used to test the maximum number of relays to listen on

Base automatically changed from 0.29.x to master August 27, 2020 13:38
@vasco-santos vasco-santos force-pushed the feat/auto-relay branch 2 times, most recently from 0e76d82 to 2c11cca Compare August 28, 2020 16:43
@vasco-santos vasco-santos force-pushed the feat/auto-relay branch 4 times, most recently from 5b79b09 to ee3682c Compare September 1, 2020 14:49
@vasco-santos vasco-santos marked this pull request as ready for review September 3, 2020 09:23
@vasco-santos
Member Author

@jacobheun this is ready for review now. Should we create a base branch for 0.30.x or for the work included in #703 ?

@jacobheun
Contributor

> Should we create a base branch for 0.30.x or for the work included in #703?

@vasco-santos Yeah let's do that, there are a few things we want to land in conjunction with this so it would be good to gather those outside of master.

@vasco-santos vasco-santos changed the base branch from master to 0.30.x September 3, 2020 10:14
@vasco-santos
Member Author

I changed it to 0.30.x branch as we might not include all of #703 in the next release

this._listenRelays.add(id)

// Create relay listen addr
const remoteMultiaddr = connection.remoteAddr
Contributor

We may need to do some better selection for the remoteAddr as this is an observed address and might not actually be dialable. It's possible MDNS could have caused us to receive an incoming connection, or we could have dialed a private address on a VPN. We should probably take from the relay's announced addresses and avoid using private multiaddrs.

Member Author

@vasco-santos Sep 9, 2020

That is a good point! I will just get the first addr (that is certified) from the AddressBook. We will already have the connection to the peer, so it will just be reused

Contributor

> We will already have the connection to the peer, so it will just be reused

The problem is the address we advertise for this. The address we are actually connected to the peer on doesn't matter; the relay address we advertise to other peers does.
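To make the "avoid private multiaddrs" point concrete, here is a minimal sketch that filters a relay's announced addresses. It works on multiaddr strings rather than multiaddr objects, `isPrivateAddr` is a hypothetical helper, and it only covers IPv4 ranges (IPv6 and DNS addresses are treated as public, which is a simplification).

```javascript
// Rough check for private, loopback and link-local IPv4 multiaddrs (string form).
// Hypothetical helper: anything that is not /ip4/... is treated as public here.
function isPrivateAddr (addr) {
  const match = /^\/ip4\/(\d+)\.(\d+)\.\d+\.\d+(\/|$)/.exec(addr)
  if (!match) return false
  const a = Number(match[1])
  const b = Number(match[2])
  return a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
}

// Example announced addresses (documentation ranges, not real peers).
const announced = [
  '/ip4/192.168.1.10/tcp/4001',
  '/ip4/127.0.0.1/tcp/4001',
  '/ip4/203.0.113.7/tcp/4001'
]
const dialable = announced.filter((addr) => !isPrivateAddr(addr))
```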

if (!remoteMultiaddr.protoNames().includes('p2p')) {
listenAddr = `${remoteMultiaddr.toString()}/p2p/${connection.remotePeer.toB58String()}/p2p-circuit/p2p/${this._peerId.toB58String()}`
} else {
listenAddr = `${remoteMultiaddr.toString()}/p2p-circuit/p2p/${this._peerId.toB58String()}`
Contributor

We shouldn't need our peer id in the listen address, it should just end with /p2p-circuit. The circuit listener logic will need to be fixed because it expects our id at the end.

Member Author

@vasco-santos Sep 9, 2020

> The circuit listener logic will need to be fixed because it expects our id at the end.

It does not seem necessary. I just checked and the Circuit listener does not seem to use the id at all. I also removed it and everything seems to work as expected.
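The address shape agreed on in this thread (ending in /p2p-circuit, without our own peer id) could look roughly like this. The sketch uses plain strings instead of multiaddr objects, and `toRelayListenAddr` is a hypothetical name rather than a function in this PR.

```javascript
// Builds a relay listen address from plain strings. relayAddr and relayId are
// illustrative inputs; the PR itself works with multiaddr objects.
function toRelayListenAddr (relayAddr, relayId) {
  // Append the relay's peer id only if the address does not already carry one.
  const hasPeerId = relayAddr.split('/').includes('p2p')
  const base = hasPeerId ? relayAddr : `${relayAddr}/p2p/${relayId}`
  // The listen address ends with /p2p-circuit, without our own peer id.
  return `${base}/p2p-circuit`
}
```

For example, `toRelayListenAddr('/ip4/203.0.113.7/tcp/4001', 'QmRelay')` yields `/ip4/203.0.113.7/tcp/4001/p2p/QmRelay/p2p-circuit`.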

* Try to listen on available hop relay connections.
* @return {Promise<void>}
*/
async _listenOnAvailableHopRelays () {
Contributor

This function should change, really what we want to do is find relays if we don't have enough. This would involve:

  1. Check the metadata store for known relays, try them first if we're not connected
  2. If we don't have enough, find relays on the network. (https://github.com/libp2p/go-libp2p/blob/fb3179e617bdf049734f58aa8ced2bb42de7ea5d/p2p/host/relay/autorelay.go#L250)

Autorelay should also start with a network search for relays. Ideally this would happen once we determined ourselves to be private (via autoNat) so we don't bind to relays when we're publicly dialable, but we don't have autonat yet.

If we find multiple relays we should eventually prioritize by lowest latency, but for now we can focus on the use case of users configuring their node to listen to a specific relay, and allow ambient canhop checks to supplement it.

Member Author

I mentioned in the description that this was the initial PR for the issue. The follow-up PR would include that part to ease the implementation/review. But if we start with a simple version, I can also include it here:

I will change the logic a bit. It is important to note that we can be connected to a peer that supports hop but that we discarded earlier, and we should leverage it later if we need to.

Regarding the start part, I was also expecting that as a follow-up. We can start it when the circuit listener starts, but we need to guarantee that the peerStore is loaded first.

Member Author

@vasco-santos Sep 9, 2020

Reconsidering this, I think we should try to leverage the connections we already have before dialling other peers. I think we should start by doing what I had:

  1. iterate the available connections, if we are connected to a peer which supports hop but we are not listening on it, we should add it as a listener.
  2. if we still need more peers, go to the PeerStore (MetadataBook)
  3. Finally (if we need more), we find relays on the network

I am hitting an "issue" with this approach, which also happens with what you suggested. Since we call this when a disconnect happens and we remove a listen relay, iterating on the metadataBook means we will just try to reconnect to that node. This flow is odd and should be avoided. Eventually we would sort peers by score and not dial it in the future, but for now we cannot do that. Either we "accept" this behaviour for now, or _listenOnAvailableHopRelays should receive an optional list of peers to ignore (to which we would add the disconnected peer if we are calling _listenOnAvailableHopRelays as a consequence of the disconnect).
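A possible shape for the tiered selection with an optional ignore list is sketched below. `pickRelays` and its parameter names are hypothetical, peers are represented by plain string ids, and the third tier (querying the network) is left out.

```javascript
// Hypothetical helper: orders relay candidates and honours an ignore list.
// connectedHopPeers: peers we are connected to that support HOP
// knownHopPeers: peers recorded as HOP-capable (for example in the MetadataBook)
function pickRelays ({ connectedHopPeers, knownHopPeers, listening, needed, ignore = [] }) {
  const skip = new Set([...listening, ...ignore])
  const picked = []
  // Tier 1: existing connections. Tier 2: known relays we are not connected to.
  for (const id of [...connectedHopPeers, ...knownHopPeers]) {
    if (picked.length >= needed) break
    if (skip.has(id)) continue
    skip.add(id) // avoid picking the same peer twice
    picked.push(id)
  }
  // If this is still shorter than needed, a network query would be the next tier.
  return picked
}
```

Passing the peer that just disconnected in `ignore` keeps it from being redialed immediately.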

Contributor

Why not prioritize known relays by whether or not they're connected?

  • Check the Peer Store for relay peers
  • If we're connected and they're not already a relay, bind to them
  • If we don't have enough, perform a search
  • Once the search is done, attempt to bind to known relays (which may include a dial)

If we're already connected to them we should have already checked their HOP status, so prioritizing all known relays by their connectedness makes sense, instead of checking each connected peer, which would be less efficient.

The immediate redial problem should really be solved with backoffs (connection gating, plus maybe a ttl tag in the peer store), especially if they disconnected from us.

Member Author

If we already have connections that we can use to bind, I think they should be used first instead of establishing new connections with other peers, especially as we head towards having more meaningful connections to the peer. Otherwise we will be establishing a new connection that we will want to maintain (auto relay binds will be important connections to the peer), while we could be using a connection that might already be important for the peer (as it already exists). It will also be faster to replace the listen address.

> instead of checking each connected peer, which would be less efficient.

Yes, we will need to iterate on the connections and do a metadataBook.get.
But we can also iterate the metadataBook and check whether we have a connection to the peers that support hop. We listen on them if we have one, and follow up afterwards on the ones we are not connected to. This makes the logic more complex and adds extra gets on the connections map, but isn't that better than establishing new connections when we can use the ones we already have?

Member Author

> The immediate redial problem should really be solved with backoffs (connection gating, plus maybe a ttl tag in the peer store), especially if they disconnected from us.

Yeah, so I will tackle that together with the connection manager stuff.
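For reference, a backoff of the kind mentioned here might be sketched as below. This is purely illustrative and not part of js-libp2p: `RedialBackoff` is a hypothetical class, the delays are arbitrary, and the exponential policy is just one possible choice.

```javascript
// Minimal redial backoff keyed by peer id. Times are in milliseconds and the
// defaults are arbitrary assumptions.
class RedialBackoff {
  constructor ({ baseMs = 1000, maxMs = 60000 } = {}) {
    this.baseMs = baseMs
    this.maxMs = maxMs
    this.entries = new Map() // peerId -> { failures, retryAt }
  }

  // Record a disconnect and double the wait for each consecutive one.
  recordDisconnect (peerId, now = Date.now()) {
    const previous = this.entries.get(peerId)
    const failures = (previous ? previous.failures : 0) + 1
    const delay = Math.min(this.baseMs * 2 ** (failures - 1), this.maxMs)
    this.entries.set(peerId, { failures, retryAt: now + delay })
  }

  // A peer may be dialed if it has no entry or its wait has elapsed.
  canDial (peerId, now = Date.now()) {
    const entry = this.entries.get(peerId)
    return !entry || now >= entry.retryAt
  }
}
```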

const listener = new EventEmitter()
const listeningAddrs = new Map()

// Remove listeningAddrs when a peer disconnects
libp2p.connectionManager.on('peer:disconnect', (connection) => {
listeningAddrs.delete(connection.remotePeer.toB58String())
Contributor

If this actually causes a change in our addresses we need to perform an identify push.

Member Author

I will call the identify push there if an entry is deleted. But once we support a dynamic transport manager with runtime transport add and remove, we should probably have the IdentifyService listen for updates.

Member Author

I was planning on doing this in a follow up PR (point 4 in the issue), as I was also not doing the push when the new listen addr is added.
Anyway, I can do both now

Contributor

Follow up PRs are fine for this, but it would be good to at least denote some todo comments so we don't forget code paths in the subsequent PRs.

Member Author

I will add the TODO comments and do this as a follow-up after all. I found a bug on connect + disconnect because the addrs were no longer certified. Then I remembered that we do not yet support invalidating the self record on multiaddr change: https://github.com/libp2p/js-libp2p/blob/v0.29.0/src/identify/index.js#L319

I will do that in the follow up PR with the identify push updates
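The "push only if our addresses actually changed" idea from this thread can be sketched as follows. `listeningAddrs` mirrors the Map in the snippet under review, while `onPeerDisconnect` and `pushIdentify` are hypothetical names; the callback stands in for whatever ends up triggering the identify push.

```javascript
// Removes the relay address for a disconnected peer and signals a push only
// when something was actually removed.
function onPeerDisconnect (listeningAddrs, peerIdStr, pushIdentify) {
  // Map.prototype.delete returns true only if an entry existed.
  const changed = listeningAddrs.delete(peerIdStr)
  if (changed) pushIdentify()
  return changed
}
```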

@vasco-santos
Member Author

vasco-santos commented Sep 10, 2020

This is ready for review again @jacobheun. The expected follow-up PRs are described in the main post of the PR:

  1. When libp2p updates its addresses, it should update the network - Identify push on add/remove listen + self peer record update on addr change
  2. Query the network if it needs more relays
  3. New connectionManager integration for disconnect improvement

Note: codecov jobs are not working properly atm. I rebuilt everything, but the gcov command is failing. As a result, some of the annotations in the PR diff are not valid: there are tests for those flows, but they were added in the last commit.

* @constructor
* @param {object} props
* @param {Libp2p} props.libp2p
* @param {number} props.maxListeners maximum number of relays to listen.
Contributor

Should we just default this to 1?


try {
const remoteAddrs = this._peerStore.addressBook.get(connection.remotePeer)
remoteMultiaddr = remoteAddrs.find(a => a.isCertified).multiaddr // Get first announced address certified
Contributor

This isn't going to be sufficient, we'll need to try and prioritize public addresses for this. It's still pretty common for peers to advertise private addresses. We can make a note to do this in a followup PR though.

HOP relays should really avoid advertising private addresses (we should document this in a "setting up relays" section of the production guides), but we can't rely on this behavior.

Member Author

Yes, we have a milestone for it, but I will add a comment: #699 (comment)

I am not sure yet on the best approach for this, but it is being tracked

}

// Attempt to listen on relay
this._listenRelays.add(id)
Contributor

Why not just do the add after the listen call in the try? Then the delete isn't required if it fails.

Member Author

Well, we will block inside the following try ... catch block, waiting for transportManager.listen.
If the add happened after the listen and we got multiple calls at the same time, none of them would be blocked from reaching transportManager.listen, and we would probably end up with more listenRelays than needed.
Perhaps I can use a p-queue instead? Otherwise, I could also cancel the ones that fail (we should not expect transportManager.listen to fail, but it can).

Contributor

I think we can leave it like this for now. It really shouldn't fail, we're already connected. If it does fail that should mean the connection was dropped during the listen attempt, which should trigger us to connect to another known relay if it exists.
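The reasoning for adding to the set before awaiting can be illustrated with a small sketch. `addListenRelay` and `state` are hypothetical, and `listen` is a stub standing in for transportManager.listen; the point is only that the synchronous add reserves the slot before any concurrent caller runs.

```javascript
// Sketch of reserving a slot before awaiting. maxListeners mirrors the option
// in this PR; everything else is an illustrative assumption.
async function addListenRelay (state, id, listen) {
  if (state.listenRelays.size >= state.maxListeners) return false
  if (state.listenRelays.has(id)) return false
  // Reserve synchronously so concurrent callers see the slot as taken.
  state.listenRelays.add(id)
  try {
    await listen(id)
    return true
  } catch (err) {
    // Release the slot if the listen attempt fails.
    state.listenRelays.delete(id)
    return false
  }
}
```

Because an async function runs synchronously up to its first await, two calls started back to back cannot both pass the size check when `maxListeners` is 1.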

@@ -24,7 +34,7 @@ module.exports = (circuit) => {
listener.listen = async (addr) => {
const addrString = String(addr).split('/p2p-circuit').find(a => a !== '')

- const relayConn = await circuit._dialer.connectToPeer(multiaddr(addrString))
+ const relayConn = await libp2p.dial(multiaddr(addrString))
const relayedAddr = relayConn.remoteAddr.encapsulate('/p2p-circuit')
Contributor

Something to consider here is that in AutoRelay we are creating a listen addr and then calling transportManager.listen to "connect". We will already have a connection per the AutoRelay logic, so that address is going to get thrown out and replaced with whatever this address is. We need to do some reconciliation here, I would find it odd to call listen with one address, and then have my actual address end up being something different.

I think this likely won't be a huge issue for initial relay connections, but if we reconnect to known relays, this address could change. The provided address should be the address we end up with, but if it fails for some reason we will dial other known addresses for the peer.

Member Author

> We need to do some reconciliation here, I would find it odd to call listen with one address, and then have my actual address end up being something different.

That is true! I changed this to avoid creating multiple connections to the same peer, but I agree with your point.
So you suggest that we go back and use the connectToPeer with a fallback to the dial if we fail?

Contributor

> So you suggest that we go back and use the connectToPeer with a fallback to the dial if we fail?

No. We shouldn't care how we're connected to the peer, really, but we need to be careful about the address we're advertising for this. Something like: Prioritize public addresses, if the connected address matches one of those use it, otherwise pick one of the others.

Again, I think we can do a follow-up PR for this, so we can focus on clear tests for that.
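One way to read the suggested selection rule (prioritize public addresses, keep the connected address if it is one of them, otherwise pick another) is sketched here. `chooseAdvertisedAddr` is a hypothetical name, addresses are plain strings, and the `isPublic` predicate is supplied by the caller; falling back to all announced addresses when none are public is an assumption of this sketch.

```javascript
// Hypothetical selection of the relay address to advertise.
function chooseAdvertisedAddr (announcedAddrs, connectedAddr, isPublic) {
  const publicAddrs = announcedAddrs.filter(isPublic)
  // Assumption: fall back to every announced address when none are public.
  const pool = publicAddrs.length > 0 ? publicAddrs : announcedAddrs
  // Keep the connected address when it is among the preferred candidates.
  if (pool.includes(connectedAddr)) return connectedAddr
  return pool[0] // undefined when nothing was announced
}
```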

@jacobheun
Contributor

CI is failing, but otherwise I think we can merge once that's fixed and continue work in subsequent PRs

@vasco-santos
Member Author

Done @jacobheun! CI tests are passing. I don't know what happened before, but I removed the cache and re-ran, and everything passed. However, the codecov CI broke in this PR and never ran again 🤷‍♂️

@jacobheun jacobheun merged commit c3039a0 into 0.30.x Sep 16, 2020
@jacobheun jacobheun deleted the feat/auto-relay branch September 16, 2020 14:43
@vasco-santos vasco-santos restored the feat/auto-relay branch September 17, 2020 07:58
@vasco-santos vasco-santos deleted the feat/auto-relay branch September 17, 2020 07:59
vasco-santos added a commit that referenced this pull request Sep 25, 2020
* feat: auto relay

* fix: leverage protoBook events to ask relay peers if they support hop

* chore: refactor disconnect

* chore: do not listen on a relayed conn

* chore: tweaks

* chore: improve _listenOnAvailableHopRelays logic

* chore: default value of 1 to maxListeners on auto-relay
vasco-santos added a commit that referenced this pull request Oct 7, 2020
vasco-santos added a commit that referenced this pull request Oct 26, 2020
vasco-santos added a commit that referenced this pull request Nov 9, 2020
vasco-santos added a commit that referenced this pull request Dec 10, 2020
@vasco-santos vasco-santos mentioned this pull request Dec 10, 2020
17 tasks
vasco-santos added a commit that referenced this pull request Dec 16, 2020