-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(libp2p): add autodial retry threshold config option #1943
Changes from 1 commit
3626644
9b59ccc
dea67af
bcc23a3
295544a
8285e9e
e65724f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -53,6 +53,11 @@ export const AUTO_DIAL_PRIORITY = 0 | |||||
*/ | ||||||
export const AUTO_DIAL_MAX_QUEUE_LENGTH = 100 | ||||||
|
||||||
/** | ||||||
* @see https://libp2p.github.io/js-libp2p/interfaces/libp2p.index.unknown.ConnectionManagerInit.html#autoDialPeerRetryThreshold | ||||||
*/ | ||||||
export const AUTO_DIAL_PEER_RETRY_THRESHOLD = 1000 * 60 * 10 | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we could safely lower this and still achieve the goals of not spamming likely-to-fail multiaddrs, while also not blocking users with a 10 minute wait (or forcing a refresh in the browser) 1-2 minutes would be a significant improvement and minor inconvenience in the case of internet outages. A good approach could be determining how long it takes(D) to dial typical size of peers(N) and setting the value to some ratio of that number. If we expect users to typically find 100-200 peers per session (in the browser that's likely), and it takes 5 minutes to dial them all, then 5 minutes should be default.. but I doubt it needs to be that high. |
||||||
|
||||||
/** | ||||||
* @see https://libp2p.github.io/js-libp2p/interfaces/index._internal_.ConnectionManagerConfig.html#inboundConnectionThreshold | ||||||
*/ | ||||||
|
@@ -62,3 +67,13 @@ export const INBOUND_CONNECTION_THRESHOLD = 5 | |||||
* @see https://libp2p.github.io/js-libp2p/interfaces/index._internal_.ConnectionManagerConfig.html#maxIncomingPendingConnections | ||||||
*/ | ||||||
export const MAX_INCOMING_PENDING_CONNECTIONS = 10 | ||||||
|
||||||
/** | ||||||
* Store as part of the peer store metadata for a given peer, the value for this | ||||||
* key is a stringified number representing the timestamp of the last time a | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the existing comment is wordier than needs to be |
||||||
* dial attempted failed with the relevant peer. | ||||||
* | ||||||
* Used to insure we do not endlessly try to auto dial peers we have recently | ||||||
* failed to dial. | ||||||
SgtPooki marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
*/ | ||||||
export const LAST_DIAL_FAILURE_KEY = 'last-dial-failure' |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -7,13 +7,15 @@ import { dnsaddrResolver } from '@multiformats/multiaddr/resolvers' | |||||
import { type ClearableSignal, anySignal } from 'any-signal' | ||||||
import pDefer from 'p-defer' | ||||||
import PQueue from 'p-queue' | ||||||
import { fromString as uint8ArrayFromString } from 'uint8arrays/from-string' | ||||||
import { codes } from '../errors.js' | ||||||
import { getPeerAddress } from '../get-peer.js' | ||||||
import { | ||||||
DIAL_TIMEOUT, | ||||||
MAX_PARALLEL_DIALS_PER_PEER, | ||||||
MAX_PARALLEL_DIALS, | ||||||
MAX_PEER_ADDRS_TO_DIAL | ||||||
MAX_PEER_ADDRS_TO_DIAL, | ||||||
LAST_DIAL_FAILURE_KEY | ||||||
} from './constants.js' | ||||||
import { combineSignals, resolveMultiaddrs } from './utils.js' | ||||||
import type { AddressSorter, AbortOptions, PendingDial } from '@libp2p/interface' | ||||||
|
@@ -230,9 +232,22 @@ export class DialQueue { | |||||
// clean up abort signals/controllers | ||||||
signal.clear() | ||||||
}) | ||||||
.catch(err => { | ||||||
.catch(async err => { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if i'm not mistaken, this dial-queue is also used for direct dials, should we discern between an auto-dial triggered dial vs a manually/user requested dial? |
||||||
log.error('dial failed to %s', pendingDial.multiaddrs.map(ma => ma.toString()).join(', '), err) | ||||||
|
||||||
if (peerId != null) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
to address manual dials? or do we want a specific dial option to flag whether we will update the LAST_DIAL_FAILURE_KEY for a peer that we only use with auto-dialer? |
||||||
// record the last failed dial | ||||||
try { | ||||||
await this.peerStore.patch(peerId, { | ||||||
maschad marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
metadata: { | ||||||
[LAST_DIAL_FAILURE_KEY]: uint8ArrayFromString(Date.now().toString()) | ||||||
} | ||||||
}) | ||||||
} catch (err: any) { | ||||||
log.error('could not update last dial failure key for %p', peerId, err) | ||||||
} | ||||||
} | ||||||
|
||||||
// Error is a timeout | ||||||
if (signal.aborted) { | ||||||
const error = new CodeError(err.message, codes.ERR_TIMEOUT) | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -66,6 +66,12 @@ export interface ConnectionManagerInit { | |
*/ | ||
autoDialMaxQueueLength?: number | ||
|
||
/** | ||
* When we've failed to dial a peer, do not autodial them again within this | ||
* number of ms. (default: 10 minutes) | ||
*/ | ||
autoDialPeerRetryThreshold?: number | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. honestly, even 1 minute would significantly help from my testing experience. Once we add in a badPeer/bad multiaddr, it will free up time to check others instead of constantly spamming the same addresses. |
||
|
||
/** | ||
* Sort the known addresses of a peer before trying to dial, By default public | ||
* addresses will be dialled before private (e.g. loopback or LAN) addresses. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,7 +11,9 @@ import delay from 'delay' | |
import pWaitFor from 'p-wait-for' | ||
import Sinon from 'sinon' | ||
import { stubInterface } from 'sinon-ts' | ||
import { fromString as uint8ArrayFromString } from 'uint8arrays/from-string' | ||
import { AutoDial } from '../../src/connection-manager/auto-dial.js' | ||
import { LAST_DIAL_FAILURE_KEY } from '../../src/connection-manager/constants.js' | ||
import { matchPeerId } from '../fixtures/match-peer-id.js' | ||
import type { Libp2pEvents } from '@libp2p/interface' | ||
import type { Connection } from '@libp2p/interface/connection' | ||
|
@@ -224,4 +226,56 @@ describe('auto-dial', () => { | |
// should only have queried peer store once | ||
expect(peerStoreAllSpy.callCount).to.equal(1) | ||
}) | ||
|
||
it('should not re-dial peers we have recently failed to dial', async () => { | ||
const peerWithAddress: Peer = { | ||
id: await createEd25519PeerId(), | ||
protocols: [], | ||
addresses: [{ | ||
multiaddr: multiaddr('/ip4/127.0.0.1/tcp/4001'), | ||
isCertified: true | ||
}], | ||
metadata: new Map(), | ||
tags: new Map() | ||
} | ||
const undialablePeer: Peer = { | ||
id: await createEd25519PeerId(), | ||
protocols: [], | ||
addresses: [{ | ||
multiaddr: multiaddr('/ip4/127.0.0.1/tcp/4002'), | ||
isCertified: true | ||
}], | ||
// we failed to dial them recently | ||
metadata: new Map([[LAST_DIAL_FAILURE_KEY, uint8ArrayFromString(`${Date.now() - 10}`)]]), | ||
tags: new Map() | ||
} | ||
|
||
await peerStore.save(peerWithAddress.id, peerWithAddress) | ||
await peerStore.save(undialablePeer.id, undialablePeer) | ||
|
||
const connectionManager = stubInterface<ConnectionManager>({ | ||
getConnectionsMap: new PeerMap(), | ||
getDialQueue: [] | ||
}) | ||
|
||
autoDialler = new AutoDial({ | ||
peerStore, | ||
connectionManager, | ||
events | ||
}, { | ||
minConnections: 10, | ||
autoDialInterval: 10000 | ||
}) | ||
autoDialler.start() | ||
void autoDialler.autoDial() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See my other comment as well, but do we need all the test scaffolding for auto-dialer or can we just attempt to dial the peer directly? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. regardless of the answer here, we may want another test regarding direct-dials covering the expected cases (dial them again even if failed auto dial previously, because it's a direct dial request) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The last-failed-dial key is only used by the auto-dialer to exclude those peers from autodialing, it's not used by explicit dials. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see that now, thanks |
||
|
||
await pWaitFor(() => { | ||
return connectionManager.openConnection.callCount === 1 | ||
}) | ||
await delay(1000) | ||
maschad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
expect(connectionManager.openConnection.callCount).to.equal(1) | ||
expect(connectionManager.openConnection.calledWith(matchPeerId(peerWithAddress.id))).to.be.true() | ||
expect(connectionManager.openConnection.calledWith(matchPeerId(undialablePeer.id))).to.be.false() | ||
}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should add another test to ensure that the undialable peer is eventually dialled once the threshold has expired. |
||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a more intentional variable name here would be nice