Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix relayed connections, add DCUtR and reduce CLI args #502

Merged
merged 64 commits into from
Aug 22, 2023

Conversation

sandreae
Copy link
Member

@sandreae sandreae commented Aug 17, 2023

I had a few goals with this PR:

  1. Troubleshoot and get relayed connection establishment to be more reliable
  2. Reduce the number of network related CLI args
  3. Integrate and test if DCUtR now works
  4. Choose random QUIC and HTTP ports when they're taken

This all came with a general refactor, especially of the event loop, which is hopefully now a little easier to follow. The answer to 1. seems to have been to introduce a kind of handshake phase to the initiation process if a relay address has been given. This waits on identify messages to/from the relay, and successful listening on the circuit address before starting any other processes. Good news is, with this, relayed connections seem to be very quick and reliable ⭐

I introduced the DCUtR behaviour but I haven't been able to observe if it works yet, this could just be my particular NAT setup not allowing holepunching though. There are upgrade requests passing back and forth, but the upgrade itself fails. This doesn't seem to have any adverse effect on the existing relayed connection though.

Lastly, I removed the dial behaviour which was responsible for redialing peers if they disconnect unexpectedly. This was partly as I thought it might be effecting the DCUtR processes, and partly because I wanted to in general reduce complicating the connection logic at all. It may come back later, lets see (also connections seem pretty reliable anyway).

Closes: #423 and #361

📋 Checklist

  • Add tests that cover your changes
  • Add this PR to the Unreleased section in CHANGELOG.md
  • Link this PR to any issues it closes
  • New files contain a SPDX license header

@adzialocha
Copy link
Member

adzialocha commented Aug 20, 2023

In mDNS I observe that we have multiple (up to 4) connections / replication sessions with the same peer now, I guess that's due to running TCP and QUIC at the same time, right?

@sandreae
Copy link
Member Author

In mDNS I observe that we have multiple (up to 4) connections / replication sessions with the same peer now, I guess that's due to running TCP and QUIC at the same time, right?

Ah yeh, possibly. I can test the successful DCUtR without listening on TCP. If it still works we can remove listening on TCP for the "client" nodes.

@mycognosist
Copy link
Contributor

@sandreae

Thanks for the review request! I'm planning to read through everything and provide feedback tomorrow.

@sandreae
Copy link
Member Author

@sandreae

Thanks for the review request! I'm planning to read through everything and provide feedback tomorrow.

Great, thanks!

Copy link
Contributor

@mycognosist mycognosist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! All looks good to me. Well done!

Great to hear that the relayed connections are now reliable and speedy. The addition of the reservation timeouts is a nice touch and should provide significant stability for long-lived sessions / connections.

Nice to see the CLI args pared-down further; always love to see a codebase getting smaller :)

Give me a shout if you ever want more hands on deck to test DCUtR.

@sandreae
Copy link
Member Author

Sweet! All looks good to me. Well done!

Great to hear that the relayed connections are now reliable and speedy. The addition of the reservation timeouts is a nice touch and should provide significant stability for long-lived sessions / connections.

Nice to see the CLI args pared-down further; always love to see a codebase getting smaller :)

Give me a shout if you ever want more hands on deck to test DCUtR.

Thanks @mycognosist, glad you're happy with the direction. It it started with your good work 😄 !

Yeh, always satisfying to reduce code, we started with "we have all the options, yey!" and ending up at "really we only need these options for now, yey!"

Would defo be good to do some more DCUtR at some point. Happily, when it doesn't succeed, there are no negative effects on the currently relayed connection, so really it's a bonus. But it would be nice to have a rough idea of when it is likely to succeed.

@sandreae
Copy link
Member Author

In mDNS I observe that we have multiple (up to 4) connections / replication sessions with the same peer now, I guess that's due to running TCP and QUIC at the same time, right?

This was actually cos we were missing dial opts with conditions (dialer used to contain these) for dialing peers discovered over mDNS. We only want to dial when the peer is disconnected and not already being dialed. Additionally we dial concurrency to 1. With this i observe max one connection being established as a listener or dialer (we still can't avoid these two connections being established).

@sandreae sandreae merged commit bf77be8 into main Aug 22, 2023
10 checks passed
@adzialocha
Copy link
Member

In mDNS I observe that we have multiple (up to 4) connections / replication sessions with the same peer now, I guess that's due to running TCP and QUIC at the same time, right?

This was actually cos we were missing dial opts with conditions (dialer used to contain these) for dialing peers discovered over mDNS. We only want to dial when the peer is disconnected and not already being dialed. Additionally we dial concurrency to 1. With this i observe max one connection being established as a listener or dialer (we still can't avoid these two connections being established).

Nice one! That looks much more transparent.

@adzialocha adzialocha deleted the network-service-refactor branch August 25, 2023 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add holepunching via libp2p's DCUTR over QUIC libp2p: AutoNAT on QUIC falsely reports public NAT status
3 participants