-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: auto relay #723
feat: auto relay #723
Conversation
22ef975
to
9067e63
Compare
@jacobheun your thoughts regarding the current direction of this work are welcome :) |
;[libp2p, relayLibp2p1, relayLibp2p2, relayLibp2p3] = peerIds.map((peerId, index) => { | ||
const opts = baseOptions | ||
|
||
if (index !== 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is creating 3 hop relays, is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is used to test the maximum number of relays to listen on
9067e63
to
aa5a596
Compare
aa5a596
to
f1e3bfd
Compare
0e76d82
to
2c11cca
Compare
5b79b09
to
ee3682c
Compare
ee3682c
to
d8f0849
Compare
@jacobheun this is ready for review now. Should we create a base branch for |
@vasco-santos Yeah let's do that, there are a few things we want to land in conjunction with this so it would be good to gather those outside of master. |
I changed it to 0.30.x branch as we might not include all of #703 in the next release |
src/circuit/auto-relay.js
Outdated
this._listenRelays.add(id) | ||
|
||
// Create relay listen addr | ||
const remoteMultiaddr = connection.remoteAddr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to do some better selection for the remoteAddr as this is an observed address and might not actually be dialable. It's possible MDNS could have caused us to receive an incoming connection, or we could have dialed a private address on a VPN. We should probably take from the relays announced addresses and avoid using private multiaddrs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point! I will just get the first addr (that is certified) from the AddressBook. We will already have the connection to the peer, so it will just be reused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will already have the connection to the peer, so it will just be reused
The problem is our advertised address for this. The actual address we're connected to the peer doesn't matter, the address of the relay we advertise to other peers does.
src/circuit/auto-relay.js
Outdated
if (!remoteMultiaddr.protoNames().includes('p2p')) { | ||
listenAddr = `${remoteMultiaddr.toString()}/p2p/${connection.remotePeer.toB58String()}/p2p-circuit/p2p/${this._peerId.toB58String()}` | ||
} else { | ||
listenAddr = `${remoteMultiaddr.toString()}/p2p-circuit/p2p/${this._peerId.toB58String()}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need our peer id in the listen address, it should just end with /p2p-circuit
. The circuit listener logic will need to be fixed because it expects our id at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The circuit listener logic will need to be fixed because it expects our id at the end.
It does not seem necessary. I just checked and the Circuit listener does not seem to use the id at all. Also removed it and everything seems to work as expected.
src/circuit/auto-relay.js
Outdated
* Try to listen on available hop relay connections. | ||
* @return {Promise<void>} | ||
*/ | ||
async _listenOnAvailableHopRelays () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should change, really what we want to do is find
relays if we don't have enough. This would involve:
- Check the metadata store for known relays, try them first if we're not connected
- If we dont have enough, find relays on the network. (https://github.com/libp2p/go-libp2p/blob/fb3179e617bdf049734f58aa8ced2bb42de7ea5d/p2p/host/relay/autorelay.go#L250)
Autorelay should also start with a network search for relays. Ideally this would happen once we determined ourselves to be private (via autoNat) so we don't bind to relays when we're publicly dialable, but we dont have autonat yet.
If we find multiple relays we should eventually prioritize by lowest latency, but for now we can focus on the use case of users configuring their node to listen to a specific relay, and allow ambient canhop checks to supplement it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned in the description that this was the initial PR for the issue. The follow up PR would include that part to ease the implementation/review. But, if we start by a simple version, I can also include it here:
I will change the logic a bit, it is important to notice that we can be connected to a peer that supports hop, but we discarded it before and we should leverage it after if we need to.
Regarding the start part, I was also expecting as a follow up. We can start it when the circuit listener start, but we need to guarantee that the peerStore is loaded first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reconsidering this, I think we should try to leverage the connections we already have before dialling other peers. I think we should start by doing what I had:
- iterate the available connections, if we are connected to a peer which supports hop but we are not listening on it, we should add it as a listener.
- if we still need more peers, go to the PeerStore (MetadataBook)
- Finally (if we need more), we find relays on the network
I am hitting an "issue" with this approach, which also happens as you suggested. Considering that we call this if a disconnect happens and we remove a listen relay, if we iterate on the metadataBook, we will just try to reconnect to that node. This flow is odd and should be avoided. We would sort peers by score and eventually not dial it in the future, but for now we cannot do that. Or we "accept" this behaviour for now, or _listenOnAvailableHopRelays
should receive an optional list of peers to ignore (where we would add the disconnected peer, if we are call _listenOnAvailableHopRelays
as a consequence of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not prioritize known relays by whether or not they're connected?
- Check the Peer Store for relay peers
- If we're connected and they're not already a relay, bind to them
- If we don't have enough, perform a search
- Once the search is done, attempt to bind to known relays (which may include a dial)
If we're already connected to them we should have already checked their HOP status, so prioritizing all known relays by their connectedness makes sense, instead fo checking each connected peer which would be less efficient.
The immediate redial problem should really be solved with backoffs (connection gating, plus maybe a ttl tag in the peer store), especially if they disconnected from us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have already connections that we can use to bind, I think they should be used first instead of establishing new connections with other peers. Specially as we head towards having more meaningful connections to the peer. We will be establishing a new connection that we will want to maintain (auto relay binds will be important connections to the peer), while we could be using a connection that might be already important for the peer (as it already exists). It will also be faster to replace the listen address.
instead fo checking each connected peer which would be less efficient.
Yes, we will need to iterate on the connections and do a metadataBook.get.
But, we can also iterate the metadataBook and check if we have a connection to the peers that support hop. Listen on them if we have, and follow up on the ones we were not connected after. We will get some more complex logic and additional gets to the connections map, but isn't this worth than establishing new connections if we can use the ones we already have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The immediate redial problem should really be solved with backoffs (connection gating, plus maybe a ttl tag in the peer store), especially if they disconnected from us.
Yeah, so I will tackle that together with the connection manager stuff.
src/circuit/listener.js
Outdated
const listener = new EventEmitter() | ||
const listeningAddrs = new Map() | ||
|
||
// Remove listeningAddrs when a peer disconnects | ||
libp2p.connectionManager.on('peer:disconnect', (connection) => { | ||
listeningAddrs.delete(connection.remotePeer.toB58String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this actually causes a change in our addresses we need to perform an identify push .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will call the identify push there if deleted. But, once we support a dynamic transport manager with runtime transport add and remove, we should probably have the IdentifyService to listen on updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning on doing this in a follow up PR (point 4 in the issue), as I was also not doing the push when the new listen addr is added.
Anyway, I can do both now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up PRs are fine for this, but it would be good to at least denote some todo comments so we don't forget code paths in the subsequent PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have the comments and do it as a follow up after all. I found a bug on connect + disconnect because the addr were not certified anymore. Then I remembered that we still did not support invalidating the self record on multiaddr change: https://github.com/libp2p/js-libp2p/blob/v0.29.0/src/identify/index.js#L319
I will do that in the follow up PR with the identify push updates
db39ca7
to
bd613cb
Compare
This is ready for review again @jacobheun , the expected follow up PRs are described in the main post of the PR:
Note: codecov jobs are not working properly atm. I rebuild everything, but it is failing the |
src/circuit/auto-relay.js
Outdated
* @constructor | ||
* @param {object} props | ||
* @param {Libp2p} props.libp2p | ||
* @param {number} props.maxListeners maximum number of relays to listen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just default this to 1?
|
||
try { | ||
const remoteAddrs = this._peerStore.addressBook.get(connection.remotePeer) | ||
remoteMultiaddr = remoteAddrs.find(a => a.isCertified).multiaddr // Get first announced address certified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't going to be sufficient, we'll need to try and prioritize public addresses for this. It's still pretty common for peers to advertise private addresses. We can make a note to do this in a followup PR though.
HOP relays should really avoid advertising private addresses (we should document this in a "setting up relays" section of the production guides), but we can't rely on this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have a milestone for it, but I will add a comment: #699 (comment)
I am not sure yet on the best approach for this, but it is being tracked
} | ||
|
||
// Attempt to listen on relay | ||
this._listenRelays.add(id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just do the add after the listen call in the try? Then the delete isn't required if it fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we will block inside the following try ... catch
block, waiting for the transportManager.listen
.
If we get multiple calls at the same time, all of them will not be blocked from reaching the transportManager.listen
and we will probably end up with more listenRelays than needed.
Perhaps, I can use a p-queue
instead? Otherwise, I might also cancel others that fail (should not expect transportManager.listen
to fail, but it can)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can leave it like this for now. It really shouldn't fail, we're already connected. If it does fail that should mean the connection was dropped during the listen attempt, which should trigger us to connect to another known relay if it exists.
@@ -24,7 +34,7 @@ module.exports = (circuit) => { | |||
listener.listen = async (addr) => { | |||
const addrString = String(addr).split('/p2p-circuit').find(a => a !== '') | |||
|
|||
const relayConn = await circuit._dialer.connectToPeer(multiaddr(addrString)) | |||
const relayConn = await libp2p.dial(multiaddr(addrString)) | |||
const relayedAddr = relayConn.remoteAddr.encapsulate('/p2p-circuit') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to consider here is that in AutoRelay we are creating a listen addr and then calling transportManager.listen
to "connect". We will already have a connection per the AutoRelay logic, so that address is going to get thrown out and replaced with whatever this address is. We need to do some reconciliation here, I would find it odd to call listen with one address, and then have my actual address end up being something different.
I think this likely won't be a huge issue for initial relay connections, but if we reconnect to known relays, this address could change. The provided address should be the address we end up with, but if it fails for some reason we will dial other known addresses for the peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to do some reconciliation here, I would find it odd to call listen with one address, and then have my actual address end up being something different.
That is true! I changed this to avoid creating multiple connections to the same peer, but I agree with your point.
So you suggest that we go back and use the connectToPeer
with a fallback to the dial if we fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you suggest that we go back and use the connectToPeer with a fallback to the dial if we fail?
No. We shouldn't care how we're connected to the peer, really, but we need to be careful about the address we're advertising for this. Something like: Prioritize public addresses, if the connected address matches one of those use it, otherwise pick one of the others.
Again, I think we can do a follow PR for this, so we can focus on clear tests for that.
CI is failing, but otherwise I think we can merge once that's fixed and continue work in subsequent PRs |
Done @jacobheun ! CI test are good, don't know what happened before, but just removed cache and re-run and everything got good again. However, the codecov CI got broken in this PR and it never ran again 🤷♂️ |
* feat: auto relay * fix: leverage protoBook events to ask relay peers if they support hop * chore: refactor disconnect * chore: do not listen on a relayed conn * chore: tweaks * chore: improve _listenOnAvailableHopRelays logic * chore: default value of 1 to maxListeners on auto-relay
* feat: auto relay * fix: leverage protoBook events to ask relay peers if they support hop * chore: refactor disconnect * chore: do not listen on a relayed conn * chore: tweaks * chore: improve _listenOnAvailableHopRelays logic * chore: default value of 1 to maxListeners on auto-relay
* feat: auto relay * fix: leverage protoBook events to ask relay peers if they support hop * chore: refactor disconnect * chore: do not listen on a relayed conn * chore: tweaks * chore: improve _listenOnAvailableHopRelays logic * chore: default value of 1 to maxListeners on auto-relay
* feat: auto relay * fix: leverage protoBook events to ask relay peers if they support hop * chore: refactor disconnect * chore: do not listen on a relayed conn * chore: tweaks * chore: improve _listenOnAvailableHopRelays logic * chore: default value of 1 to maxListeners on auto-relay
* feat: auto relay * fix: leverage protoBook events to ask relay peers if they support hop * chore: refactor disconnect * chore: do not listen on a relayed conn * chore: tweaks * chore: improve _listenOnAvailableHopRelays logic * chore: default value of 1 to maxListeners on auto-relay
* feat: auto relay * fix: leverage protoBook events to ask relay peers if they support hop * chore: refactor disconnect * chore: do not listen on a relayed conn * chore: tweaks * chore: improve _listenOnAvailableHopRelays logic * chore: default value of 1 to maxListeners on auto-relay
This PR is part of the #699 initiative to add support for AutoRelay.
It includes the initial points described in the issue for connecting with peers with a relay protocol, more precisely, it includes this points:
Follow up PRs:
It is important to mention the following implementation details:
change:protocols
event from the PeerStore, in order to understand what peers to ask for HOP supportmaxListeners
was added (happy to reconsider the name) instead theminThreshold
. It limits the number of relay addresses to listen on. Each time a node is connected, it will ask if it supports HOP (if peer has the relay protocol). This should be polished with the connectionManagement improvements to have a general approach for theautoDial