Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: subsequent dial abort #763

Closed
wants to merge 1 commit into from
Closed

Conversation

vasco-santos
Copy link
Member

@vasco-santos vasco-santos commented Sep 30, 2020

This PR propagates the AbortController signal to already in progress dials.

When a nodes attempts to dial another node, libp2p checks if there is an in progress dial before adding the dial request to the queue of dials. If there is a dial in progress, the new dial will reuse the same promise already being resolved.

There is one scenario where unexpected behaviours might occur, which this PR aims to solve. Considering the next flow:

libp2p.dial(peerId)
libp2p.dial(peerId, { signal })

The second dial will leverage the previous dial (already in progress), but the abort signal would not be triggable. While this flow would not be common, under the hood it might be. If libp2p has autoDial enabled (which is the default!), once new peer addresses are added to the AddressBook, the peer will be dialed (if connMgr upper bound not reached).

The autoDial will trigger a libp2p.dial(peerId), but then a user might perform in the application level a libp2p.dial(peerId, { signal }) at the same time. If the user tries to use the signal, it will simply not work.

--

In this PR, the timeout signal is leveraged to abort a dial, if subsequent abort signals are called. The ideal solution would be to add new signals on the fly and combine them (as done in the initial dial https://github.com/libp2p/js-libp2p/blob/v0.29.0/src/dialer/index.js#L162 ). However, the transports are already listening for abort on the previous reference and it would need to be replaces, which would mean considerable changes in the transports to support this.

} else {
// track subsequent dial abort
options.signal && options.signal.addEventListener('abort', onAbort)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to look more at the underlying issue.

I don't think this is the right way to solve this problem and it's likely to have some unwanted side effects. If we assume an internal system like the DHT is actively trying to dial a peer, if the application has a shorter timeout tolerance, and dials and aborts before the internal DHT dial is done it will terminate the underlying dial which is not good. We need to be able to cancel the existing context without affecting any dials that are already in progress.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern in this issue is how to guarantee that we do not reach unexpected states.
As an example, if the application layer performs a dial and it is aborted, but then a parallel dial connection was achieved, does this seem correct? I understand the possible side effects for other subsystems, but should a connection exist after an application layer abort? I don't know, but if we do not abort other dials, the abort signal option should be clear on what it does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abort signals should only apply to their current context. If aborting can affect other contexts it would make unexpected states much worse, as you can get a lot of really nasty collisions between the subsystems and application layer. While it might be odd to have a connection established when you aborted a dial, the circumstances of how that parallel dial can vary quite a bit, and aborting the current context is not the right way to manage that kind of behavior if that's what the user needs.

@achingbrain
Copy link
Member

Fixed in #1678

@achingbrain achingbrain deleted the fix/subsequent-dial-abort branch April 18, 2023 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants