Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add connection monitor #2644

Merged
merged 5 commits into from
Aug 14, 2024
Merged

feat: add connection monitor #2644

merged 5 commits into from
Aug 14, 2024

Conversation

achingbrain
Copy link
Member

@achingbrain achingbrain commented Jul 31, 2024

Adds a connection monitor that periodically ensures remote peers are still online and contactable by trying to send a single byte via the ping protocol, and sets the .rtt property of the connection to how long it took.

If the remote responds that the ping protocol is not supported, it tries to infer the round trip time by how long it took to fail.

If the remote is unresponsive or opening the stream fails for any other reason, the connection is aborted with the thrown error.

It's possible to configure the ping interval, how long we wait before considering a peer to be inactive and whether or not to close the connection on failure.

Closes #2643

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • I have added tests that prove my fix is effective or that my feature works

Adds a connection monitor that periodically ensures remote peers are
still online and contactable by trying to send a single byte via the
ping protocol, and sets the `.rtt` property of the connection to how
long it took.

If the ping protocol is not supported by the remote, it tries to infer
the round trip time by how long it took to fail.

If the remote is unresponsive or opening the stream fails for any
other reason, the connection is aborted with the throw error.

It's possible to configure the ping interval, how long we wait before
considering a peer to be inactive and whether or not to close the
connection on failure.

Closes #2643
@achingbrain achingbrain requested a review from a team as a code owner July 31, 2024 06:52
@achingbrain
Copy link
Member Author

This was discussed on the maintainers call and there was a suggestion to make this a service that the user would have to configure.

I'm having a hard time thinking of when you wouldn't want this so it's implemented as part of libp2p and on by default.

Happy to talk about this further before merge.

@justin0mcateer
Copy link

LGTM. This will eliminate the complexity of managing this at the application layer which is a major improvement.

Copy link
Member

@maschad maschad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this, I left a suggestion and I also have a question.

packages/libp2p/src/connection-monitor.ts Outdated Show resolved Hide resolved
packages/libp2p/src/connection-monitor.ts Show resolved Hide resolved
Copy link
Member

@SgtPooki SgtPooki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.. only comment i have is about being a little more explicit with the assertions in the tests instead of .gte(0) after a delay of 100ms

Copy link
Member

@wemeetagain wemeetagain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see this be disable-able. This is definitely something that most applications will want to use, but users should be able to turn this off and not pay for it.

As an example, in Lodestar, we use an application-specific ping protocol, with a response that means something quite different. We would like to disable this and handle pings / keep-alives in an application-specific manner.

})
])

conn.rtt = Date.now() - start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not divided by 2 too?

Copy link
Member Author

@achingbrain achingbrain Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one times how long it takes to send and receive one byte, (e.g. one round trip) but after the stream has been opened (e.g. after multistream select has finished).

The ERR_UNSUPPORTED_PROTOCOL error is thrown by multistream select, which takes two round trips, so that needs dividing by two but the happy path here doesn't, because we reset the start variable before timing the single byte rtt.

achingbrain and others added 2 commits August 9, 2024 10:43
Co-authored-by: Chad Nehemiah <chad.nehemiah94@gmail.com>
Copy link
Member

@maschad maschad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work

@achingbrain
Copy link
Member Author

I'd like to see this be disable-able

@wemeetagain a flag has been added:

const node = await createLibp2p({
  connectionMonitor: {
    enabled: false
  }
})

@achingbrain achingbrain merged commit 7939dbd into main Aug 14, 2024
24 checks passed
@achingbrain achingbrain deleted the feat/add-connection-monitor branch August 14, 2024 12:14
@achingbrain achingbrain mentioned this pull request Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ping recording connection latency/aliveness
5 participants