Specification
When stopping the `Discovery` domain, you need to wait for the current task `T1` to finish (i.e. one iteration of the discovery queue, where we discover a node/identity and its linked nodes/identities). This is because we cannot currently abort asynchronous side-effectful tasks; that ability is scheduled in #297. The task itself involves establishing a node connection to the remote agent `N2`. However, an edge case that we have not fully considered is one where `N2` has shut down and is no longer running. In such a situation, the connection timeout that is passed from `NodeConnectionManager` to `NodeConnection` to `GRPCClientAgent` to `GRPCClient` determines how long we wait for connection readiness (and thus how long until we can catch an error and exit the discovery process). This timeout is set to 20s in `NodeConnectionManager`, and that value is propagated to all connection timeouts.

In instances of this behaviour, you'll see retried attempts to connect through the proxy. Then `ErrorGRPCClientTimeout` should be thrown, which is then rethrown as `ErrorNodeConnectionTimeout`. You should get this exception on `withConnF`, which is used by `requestChainData` in `NodeManager`, which is called by `Discovery`.

We need to ensure that this is indeed the sequence of events in practice, and that errors are correctly caught and logged.
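The expected error chain can be sketched roughly as follows. This is only an illustration under assumptions: the two error classes are minimal stand-ins that reuse the names from this issue, and `waitForReady`, `connectToNode`, `discoverVertex`, and the `original` field are hypothetical helpers, not the actual code.

```typescript
// Hypothetical stand-ins for the error classes named in this issue;
// the real classes in the codebase may look different.
class ErrorGRPCClientTimeout extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'ErrorGRPCClientTimeout';
  }
}

class ErrorNodeConnectionTimeout extends Error {
  // The lower-level error that triggered this one (assumed field name).
  readonly original: unknown;
  constructor(message: string, original: unknown) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'ErrorNodeConnectionTimeout';
    this.original = original;
  }
}

// Client layer (assumed): reject if the connection is not ready in time.
function waitForReady<T>(ready: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new ErrorGRPCClientTimeout(`not ready after ${timeoutMs}ms`));
    }, timeoutMs);
    ready.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (e) => {
        clearTimeout(timer);
        reject(e);
      },
    );
  });
}

// Node connection layer (assumed): rethrow the client timeout as a
// node connection timeout, keeping the original error for logging.
async function connectToNode<T>(
  ready: Promise<T>,
  timeoutMs: number,
): Promise<T> {
  try {
    return await waitForReady(ready, timeoutMs);
  } catch (e) {
    if (e instanceof ErrorGRPCClientTimeout) {
      throw new ErrorNodeConnectionTimeout('node connection timed out', e);
    }
    throw e;
  }
}

// Discovery layer (assumed): catch the timeout, log it, and let the
// current iteration finish instead of propagating the error.
async function discoverVertex(
  ready: Promise<unknown>,
  timeoutMs: number,
  log: (msg: string) => void,
): Promise<boolean> {
  try {
    await connectToNode(ready, timeoutMs);
    return true;
  } catch (e) {
    if (e instanceof ErrorNodeConnectionTimeout) {
      log(`skipping unreachable node: ${e.message}`);
      return false;
    }
    throw e;
  }
}
```

With a promise that never settles standing in for a remote agent that has shut down, `discoverVertex` resolves to `false` once the timeout elapses, which is the point at which a stop of the discovery process could proceed.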
Additional context
Tasks
1. In our `Discovery`, the default timeout shouldn't be 20s; that's too long. The `withConnF` method should be able to override the default timeout set in `NodeConnectionManager`, for example by providing a value as a parameter.
2. `T1` when we stop the discovery instead of waiting for it to finish. In this case, if `T1` finishes even after stopping, ensure that `T1` is removed from the DB, so you don't redo the work.
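One possible shape for the timeout override is an optional parameter on `withConnF` that falls back to the manager-wide default. The sketch below is an assumption, not the existing API: `NodeConnectionManagerSketch`, the `Connection` and `Dial` types, and the parameter order are all hypothetical, and only the 20s default comes from this issue.

```typescript
// Hypothetical minimal connection type for the sketch.
type Connection = { nodeId: string };

// Hypothetical dial function: establishes a connection within timeoutMs.
type Dial = (nodeId: string, timeoutMs: number) => Promise<Connection>;

class NodeConnectionManagerSketch {
  private readonly dial: Dial;
  private readonly defaultTimeoutMs: number;

  // 20000ms mirrors the current 20s default described in this issue.
  constructor(dial: Dial, defaultTimeoutMs: number = 20000) {
    this.dial = dial;
    this.defaultTimeoutMs = defaultTimeoutMs;
  }

  // Callers may pass timeoutMs to override the default for one call.
  public async withConnF<T>(
    nodeId: string,
    f: (conn: Connection) => Promise<T>,
    timeoutMs?: number,
  ): Promise<T> {
    const effectiveTimeoutMs = timeoutMs ?? this.defaultTimeoutMs;
    const conn = await this.dial(nodeId, effectiveTimeoutMs);
    return await f(conn);
  }
}
```

Under this sketch, discovery could call `withConnF(nodeId, f, 2000)` (the 2000ms value is purely illustrative) while other callers keep the 20s default by omitting the argument.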