
fix(@libp2p/tcp): race condition in onSocket #2763

Merged
merged 5 commits on Oct 23, 2024

Conversation

tabcat
Contributor

@tabcat tabcat commented Oct 11, 2024

Title

fix(@libp2p/tcp): race condition in onSocket

Description

This fixes a race condition in the onSocket listener method.
onSocket receives a net.Socket parameter that must be closed
before the tcp server can close.

The net.Socket is used to create a MultiaddrConnection but the
connection is not tracked with this.connections until after it has been
upgraded.

If the tcp listener.close is called while the MultiaddrConnection is waiting
to be upgraded, the MultiaddrConnection socket cannot be closed because
it does not exist in this.connections.

Instead of adding the MultiaddrConnection to this.connections on
creation, I created a separate property named this.maConnections.
I track the non-upgraded connections separately because I was unsure
how this.connections is used elsewhere, for example by metrics.
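The approach above could be sketched roughly like this (all names besides maConnections are illustrative; types are simplified and the upgrade logic is stubbed out, so this is not the PR's actual code):

```typescript
// Illustrative sketch of the fix. Only the maConnections idea comes from
// this PR; everything else here is assumed for the example.
interface MultiaddrConnection {
  abort(err: Error): void
}

class Listener {
  private readonly connections = new Set<object>()         // upgraded
  readonly maConnections = new Set<MultiaddrConnection>()  // awaiting upgrade

  constructor (
    private readonly upgrade: (maConn: MultiaddrConnection) => Promise<object>
  ) {}

  onSocket (maConn: MultiaddrConnection): void {
    // track the connection *before* the async upgrade starts,
    // so close() can always reach it
    this.maConnections.add(maConn)

    this.upgrade(maConn)
      .then((conn) => { this.connections.add(conn) })
      .catch((err) => { maConn.abort(err) })
      .finally(() => { this.maConnections.delete(maConn) })
  }

  close (): void {
    // abort connections still waiting on their upgrade; without this they
    // are invisible to close() and the server can never finish shutting down
    for (const maConn of this.maConnections) {
      maConn.abort(new Error('listener is closing'))
    }
  }
}
```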

Notes & open questions

Closes #2760

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • I have added tests that prove my fix is effective or that my feature works

@tabcat tabcat marked this pull request as ready for review October 11, 2024 12:16
@tabcat tabcat requested a review from a team as a code owner October 11, 2024 12:16
Member

@achingbrain achingbrain left a comment


I'm not sure this is necessary. When we close the transport and call this.pause(true) we close the socket server which should destroy all open sockets.

This is the same as calling maConn.abort(err) - the interesting thing it does is call socket.destroy() - see toMultiaddrConnection for this.

It's possible there's something missing though - could you please add a test to show the problem you're seeing?

packages/transport-tcp/src/listener.ts (review thread, resolved)
@achingbrain
Member

achingbrain commented Oct 11, 2024

Actually server.close says:

Stops the server from accepting new connections and keeps existing connections.

So yes, it is necessary to close un-upgraded connections separately, but we should probably just close them immediately if they fail to upgrade, if we're not doing that already.
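The quoted behaviour is easy to demonstrate in isolation. This is a minimal, self-contained sketch (not code from this PR) showing that a connection established before server.close() keeps working afterwards:

```typescript
import { createServer, connect, type AddressInfo } from 'node:net'

// Demonstration of net.Server.close() semantics: the server stops accepting
// new connections, but an existing socket keeps working until it is ended
// or destroyed.
export async function demo (): Promise<string> {
  return await new Promise((resolve) => {
    const server = createServer((socket) => {
      socket.on('data', (buf) => socket.write(buf)) // echo back
    })

    server.listen(0, () => {
      const { port } = server.address() as AddressInfo
      const client = connect(port, '127.0.0.1', () => {
        server.close() // the already-established connection is unaffected
        client.write('still alive')
      })
      client.on('data', (buf) => {
        client.end() // only now does the server's 'close' event fire
        resolve(buf.toString())
      })
    })
  })
}
```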

@tabcat
Contributor Author

tabcat commented Oct 11, 2024

Yes, they are currently aborted inside the catch block of the upgrade call.

The problem is that maConn is added to this.connections inside the then block of the upgrade. If close is called before the then block has run, the connection cannot be closed and this.server.close never resolves.
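A minimal sketch of that problematic ordering (assumed, simplified names rather than the actual listener code):

```typescript
// The upgraded connection is only recorded inside the then block, so a
// close() that runs before the upgrade resolves cannot see the connection.
interface MaConn {
  abort(err: Error): void
}

class RacyListener {
  readonly connections = new Set<object>()

  constructor (private readonly upgrade: (maConn: MaConn) => Promise<object>) {}

  onSocket (maConn: MaConn): void {
    this.upgrade(maConn)
      .then((conn) => { this.connections.add(conn) }) // tracked too late
      .catch((err) => { maConn.abort(err) })
  }

  closableConnections (): object[] {
    // anything still mid-upgrade is missing from this set, so its socket
    // is never destroyed and this.server.close never resolves
    return [...this.connections]
  }
}
```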

@tabcat
Contributor Author

tabcat commented Oct 11, 2024

It's possible there's something missing though - could you please add a test to show the problem you're seeing?

I can add a test. The test that I have been using exists in https://github.com/tabcat/delay5, which I should be able to replicate in a separate PR.

Contributor Author

@tabcat tabcat left a comment


Looks much cleaner :)

packages/transport-tcp/src/listener.ts (review thread, resolved)
@achingbrain
Member

Sorry for jumping in, I thought I'd just drag this across the line.

This has shown up a slightly different problem: we currently close the socket server but wait for the "close" event via the callback passed to this.server.close. The "close" event is only emitted once all connections have closed, so if the connections don't close the promise will never resolve, which could be what's causing ChainSafe/lodestar#6053
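Given that constraint, a shutdown helper along these lines (hypothetical names, not the PR's actual code) would avoid the hang by destroying tracked sockets before the 'close' event is awaited:

```typescript
import { type Server, type Socket } from 'node:net'

// The callback passed to server.close() only fires on the 'close' event,
// which Node emits once every connection has ended, so any sockets still
// open must be destroyed explicitly or the promise never settles.
export async function closeServerAndSockets (
  server: Server,
  sockets: Set<Socket>
): Promise<void> {
  await new Promise<void>((resolve, reject) => {
    server.close((err) => {
      if (err != null) {
        reject(err)
      } else {
        resolve()
      }
    })

    // without this loop, one open socket keeps the promise pending forever
    for (const socket of sockets) {
      socket.destroy()
    }
  })
}
```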

The callback was added in #2058; it was not in the original PR that added the "pause" functionality, libp2p/js-libp2p-tcp#218

@achingbrain achingbrain merged commit aa8de9f into libp2p:main Oct 23, 2024
29 of 30 checks passed
@achingbrain achingbrain mentioned this pull request Oct 23, 2024
Development

Successfully merging this pull request may close these issues.

race condition causes @libp2p/tcp close method to hang
2 participants