Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Test] Add unit test for validator Router connections #3198

Merged
merged 4 commits into from
Apr 3, 2024

Conversation

vicsn
Copy link
Contributor

@vicsn vicsn commented Apr 1, 2024

Motivation

Due to the recently introduced --allow-external-peers flag, validator Routers do not by default accept connections from other validator Routers. We noticed on tests with only validators that the snarkos_router_connected_total metric returned 0.

To make this behavior clear, added a unit test test_validator_connection. In order for validator's Routers to connect to one another, their address has to be explicitly passed via --peers.

Note that if we want to change that behaviour, we need to either:

  • add Gateway features to Router (e.g. a Resolver to track whether we are not connected to the same address twice)
  • do a bigger refactor to merge Gateway and Router so the validator has just one TCP stack

Related PRs

Closes https://github.com/AleoHQ/snarkOS/issues/3196
Corrects yet another regression from https://github.com/AleoHQ/snarkOS/pull/3163
Improves on https://github.com/AleoHQ/snarkOS/pull/3192

@vicsn vicsn requested review from joske, niklaslong and kpandl April 1, 2024 18:06
@vicsn vicsn marked this pull request as ready for review April 1, 2024 18:41
@@ -251,7 +252,7 @@ impl<N: Network> Router<N> {
}

/// Ensure the peer is allowed to connect.
fn ensure_peer_is_allowed(&self, peer_ip: SocketAddr) -> Result<()> {
fn ensure_peer_is_allowed(&self, peer_ip: SocketAddr, node_type: NodeType) -> Result<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we have to trust the node type as it's given to us by the peer, I don't think we previously used it for anything so we're introducing a new assumption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. The easiest immediate resolution is to pass in validator routers explicitly via --peers. I added a unit test for it and updated the PR description. 2974612

// Connect node0 to node1.
node0.connect(node1.local_ip());
// Sleep briefly.
tokio::time::sleep(Duration::from_millis(200)).await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deadline could be used here to avoid the sleep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will keep it in mind for future tests, for now I'll keep the logic similar to the rest of the tests in the file.

Copy link
Collaborator

@ljedrz ljedrz Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC connect returns a JoinHandle that can be .awaited (and once it's complete, it means the connection is either ready or that it has failed), so there is no need for a sleep here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried it. Just awaiting node0's connect is not enough, because node1 might not be connected yet. I suggest we leave refactoring this file for later.

@vicsn vicsn changed the title Ensure validator Router accepts connections from validators Add unit test validating validator Router connections Apr 2, 2024
@vicsn vicsn changed the title Add unit test validating validator Router connections Add unit test for validator Router connections Apr 2, 2024
@howardwu howardwu requested a review from ljedrz April 2, 2024 16:54
// Connect node1 to node0.
node1.connect(node0.local_ip());
// Sleep briefly.
tokio::time::sleep(Duration::from_millis(200)).await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto comment above


{
// Connect node0 to node1.
node0.connect(node1.local_ip());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't we racing against the Heartbeat (that might cause node1 to attempt to connect to node0) here? then again, that direction should fail, so it's probably fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heartbeat is not turned on, only if we call initialize_routing.

Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits addressed.

@howardwu howardwu merged commit bbd5aaa into mainnet-staging Apr 3, 2024
0 of 20 checks passed
@howardwu howardwu deleted the ensure_validator_connects_to_validators branch April 3, 2024 21:49
@howardwu howardwu changed the title Add unit test for validator Router connections [Test] Add unit test for validator Router connections Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Peers are not sufficiently connecting to each other
4 participants