Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autodial seems to mis-behave with CircuitRelayv2 #1894

Closed
justin0mcateer opened this issue Jul 25, 2023 · 8 comments
Closed

Autodial seems to mis-behave with CircuitRelayv2 #1894

justin0mcateer opened this issue Jul 25, 2023 · 8 comments
Labels
need/triage Needs initial labeling and prioritization

Comments

@justin0mcateer
Copy link

justin0mcateer commented Jul 25, 2023

  • Version:
    0.45.9

  • Platform:
    Linux master 6.3.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 30 May 2023 13:44:01 +0000 x86_64 GNU/Linux
    Brave Browser 114.1.52.117

  • Subsystem:
    AutoDial
    CircuitRelayv2

Severity:

Medium
Generally, it would be 'high', but at least there seems to be a workaround.

Description:

I am trying to get two browsers to communicate using the WebRTC Transport via a CircuitRelayV2 relay. The browsers connect to the Relay via WebSocket. The browsers connect to the relay (via HOP), discover each other, connect to each other via the Relay (via STOP). However, when the AutoDial configuration drives the browser to attempt to dial the other browser directly, multiple addresses are selected for the dial attempt. The 'p2p-circuit/p2p' address is always sorted first and seems to succeed immediately. Upon success of the first dial, remaining open dials (always WebRTC) are given the 'abort' signal.

Screenshot_WebRTCDialAborted_2023-07-25 12-54-48

Providing a ConnectionGater that causes these specific addresses to be excluded from dialing seems to resolve the issue. However, it seems quite hacky and leads me to think either I have found a non-working configuration, or there is something wrong with the logic.

    connectionGater: {
      denyDialMultiaddr: (add: Multiaddr) => add.toString().includes('p2p-circuit/p2p'),
    },

Steps to reproduce the error:

  • Use configure CircuitRelayV2 with an additional tranport (eg WebRTC) configure AutoDial enabled (eg. 'minConnections: 10')
  • Observe that every dial using the WebRTC transport is cancelled immediately
  • Observe that every dial includes a 'p2p-circuit/p2p' address which seems to success immediately
  • Configure 'connectionGater.denyDialMultiaddr' to exclude 'p2p-circuit/p2p' addresses
  • Observe that WebRTC dials begin succeeding
@justin0mcateer justin0mcateer added the need/triage Needs initial labeling and prioritization label Jul 25, 2023
@SgtPooki
Copy link
Member

SgtPooki commented Jul 25, 2023

A slightly better fix could be to sort addresses to prefer the dial methods you want (so it can fallback to others if your preferences fail), it's not documented well, but connectionManager.addressSorter should be able to help.

https://github.com/SgtPooki/helia-service-worker-gateway/blob/ccecb41f0080fb5346c9576c9beb16ff8f82ce3d/src/libp2pConfigs/getDhtLibp2pConfig.ts#L19-L32 contains a code sample I used to try and prioritize browser transports.

@maschad
Copy link
Member

maschad commented Jul 26, 2023

We perhaps should do a better job of documenting the connection manager flow but as @SgtPooki mentioned the Address Sorter is used by the connection manager to sort addresses into order before dialling, the default addressSorter config is to dial public addresses first which in your case would be the public p2p-circuit relay nodes (in order to then dial private nodes behind NATs / firewalls) which is desirable considering dialling private addresses often fails.

Closing this for now as it's intended behaviour but feel free to re-open.

@maschad maschad closed this as completed Jul 26, 2023
@justin0mcateer
Copy link
Author

FWIW, in the CircuitRelayV2 case it makes no sense to 'prefer' public addresses if there is a private/direct path available (eg. via WebRTC).

@maschad
Copy link
Member

maschad commented Jul 26, 2023

In the case of CircuitRelayV2 there are use cases where the public relay node would need to be dialled first in order then dial a private node, regardless the connectionManager is not explicitly designed for this specific use case and as such the addressSorter needs to be flexible for scenarios where a user may want to dial a public address first.

@justin0mcateer
Copy link
Author

justin0mcateer commented Jul 28, 2023

@SgtPooki Thanks for the AddressSorter suggestion, this is my current plan. I can report that trying to use the ConnectionGater to solve this problem did not go well. It created a good bit of flakiness trying to block dials on 'p2p-circuit/p2p'.

In thinking about the AddressSorter approach a bit more, I am thinking the effectiveness of sorting the addresses to solve the Abort issue would be limited to a single dial per peer (eg maxParallelDialsPerPeer: 1). Otherwise, regardless of the 'sorting', the connection attempts are racing and WebRTC is always going to lose due to the SDP+ICE negotiation process. Thoughts?

@maschad
Copy link
Member

maschad commented Jul 28, 2023

In thinking about the AddressSorter approach a bit more, I am thinking the effectiveness of sorting the addresses to solve the Abort issue would be limited to a single dial per peer (eg maxParallelDialsPerPeer: 1). Otherwise, regardless of the 'sorting', the connection attempts are racing and WebRTC is always going to lose due to the SDP+ICE negotiation process. Thoughts?

I think there is a misconception here in achieving the overall goal of trying to get two browsers to connect on WebRTC via CircuitRelayV2.

Circuit Relay outes traffic between two peers over a third-party “relay” peer.

In many cases, peers will be unable to traverse their NAT and/or firewall in a way that makes them publicly accessible. Or they may not share common transport protocols that would allow them to communicate directly.

To enable peer-to-peer architectures in the face of connectivity barriers like NAT we need p2p-circuit. When a peer isn’t able to listen on a public address, it can dial out to a relay peer, which will keep a long-lived connection open. Other peers will be able to dial through the relay peer using a p2p-circuit address, which will forward traffic to its destination.

Once a relay reservation has been established then a browser can dial another browser directly dial that other browser over the relayed address (which the default addressSorter would dial first) once that connectivity is established then the SDP + ICE negotiation process can begin to try an establish a direction connection for two private nodes.

@justin0mcateer
Copy link
Author

I am sorry for not being more clear. I understand that the WebRTC connection does not go 'over' the Circuit Relay connection. When I said dialing WebRTC over the Relay, I thought it would be obvious I was referring to the signaling.

I guess I have also not made the problem I am trying to explain clear. You said, " a browser can dial another browser directly dial that other browser over the relayed address", that is correct, if you are referring to the MultiAddress. HOWEVER (and this is the issue I am having and the point I am trying to make) Auto Dial doesn't dial MultiAddresses, it dials PeerIds. All MultiAddresses for the Peer and the first one to Connect wins. This is specific to Auto Dial, as I mentioned in the title of the issue.

For WebRTC signalled over Circuit Relay, the dial is not complete until WebRTC is fully negotiated. Hence, it is always aborted if there is any other connectable MultiAddress for that peer.

This can be demonstrated by configuring a ConnectionGater, as mentioned above, but that seems to creat different problems.

@maschad
Copy link
Member

maschad commented Jul 31, 2023

HOWEVER (and this is the issue I am having and the point I am trying to make) Auto Dial doesn't dial MultiAddresses, it dials PeerIds. All MultiAddresses for the Peer and the first one to Connect wins. This is specific to Auto Dial, as I mentioned in the title of the issue.

I am not sure the distinction you are trying to make here, a PeerID can have multiple dial-able addresses and so If a peerID is passed to the dial queue (which is the queue used to manage parallel dials in the AutoDialler) all known multiaddrs will be tried. This is not specific to the Autodialler, as the externally exposed dial method has the same signature

For WebRTC signalled over Circuit Relay, the dial is not complete until WebRTC is fully negotiated. Hence, it is always aborted if there is any other connectable MultiAddress for that peer.

Let's say you have browser node A and browser node B, and a Relay R, the scenario you are describing could be a result of:

  1. Browser node A or B is also acting as a relay, and so if either Node was connected to Relay R it still attempts to dial the circuit multiaddr for the other browser node.

  2. There was no connection / the connection has been closed to Relay node R and so there is attempt to connect to it.

Otherwise if there is an existing relayed connection from A to Relay node R, then the hop protocol stream would have sent a CONNECT message to R. The relay R would have verified that it has a reservation and connection for A and then opened a stop protocol stream to A, sending a CONNECT message.

A would then begin to create the RTCPeerConnection and begin the SDP + Ice exchange process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/triage Needs initial labeling and prioritization
Projects
None yet
Development

No branches or pull requests

3 participants