Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mute denied nodes #322

Merged
merged 14 commits into from
Mar 29, 2021
Merged

Conversation

dvdplm
Copy link
Contributor

@dvdplm dvdplm commented Mar 26, 2021

When a node is not allowed to connect, send a Mute message to NodeConnector.

A node can be disallowed to connect if the chain is over-quota or if the whole chain is banned, i.e. on the denylist.

Also dials down the logging of latest finalized block from info to debug.

Send a `Mute` message to `NodeConnector` when a node is from a chain on the denylist OR if the chain is overquota.
(Also: dial down logging of finalized blocks a bit)
@dvdplm dvdplm requested a review from maciejhirsz March 26, 2021 12:28
@dvdplm dvdplm self-assigned this Mar 26, 2021
type Result = ();
fn handle(&mut self, _msg: Mute, ctx: &mut Self::Context) {
log::trace!(target: "NodeConnector::Mute", "Muting a node");
ctx.stop();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I test this locally the nodes keep trying to reconnect every 5-ish seconds so there's plenty of churn here on this. Is there a way to disconnect the socket and trigger an error on the node side? I think (but not sure) that nodes keep trying on error but not forever, so if we could do that it might be a solution to the bandwidth consumption?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about adding Muted state to ConnMultiplex (might need a rename there) to then stop parsing messages in StreamHandler::handle, but still keeping the connection open. There is no way to keep it open without reading messages into buffers in actix-web AFAIK, but not doing the deserialization should help a lot (not if we are limited on the bandwidth though).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're saying, instead of stopping the NodeConnector actor when muting a node, we'd change the state to ConnMultiplex::Muted? We'd still need the conn_id/msg.id() to be able to look it up in the self.multiplex, but maybe you're thinking of using something non-serde to look up that key and avoid deserializing the full NodeMessage?

Tell me though, is the gain you're thinking about that it's cheaper to keep the node connected but not deserialize any of their messages than closing the actor and create a new one ever n seconds when they reconnect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the following hack: I added a muted: bool to the NodeConnector actor and instead of muting a node with ctx.close(None); ctx.stop(); I did self.muted = true and then changed the StreahHandler::handle to check this boolean as the first thing.
That would keep 1 NodeConnector actor around for every node but not process any of their messages. It feels pretty dirty and I'm not convinced it's better to keep the node connected rather than recreating them, but maybe you're right and that this is cheaper.

It feels to me like the right way to do this is for telemetry to close the connection with a proper reason and then change substrate to behave properly:

  • When a chain is denied, telemetry closes the connection with a CloseCode::Policy (1008) (or possibly CloseCode::Abnormal (1006)); substrate does not try to reconnect.
  • When a chain is over quota, telemetry closes the connection with a CloseCode::Again (1013) and the node is free to try to reconnect later (maybe with exponential backoff if we want to get fancy).

That seems like the only way we have to reduce both processing resources and bandwidth. Wdyt?

Copy link
Contributor

@maciejhirsz maciejhirsz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grumbles and a suggestion how to "mute" the connector.

backend/src/node/connector.rs Show resolved Hide resolved
type Result = ();
fn handle(&mut self, _msg: Mute, ctx: &mut Self::Context) {
log::trace!(target: "NodeConnector::Mute", "Muting a node");
ctx.stop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about adding Muted state to ConnMultiplex (might need a rename there) to then stop parsing messages in StreamHandler::handle, but still keeping the connection open. There is no way to keep it open without reading messages into buffers in actix-web AFAIK, but not doing the deserialization should help a lot (not if we are limited on the bandwidth though).

backend/src/aggregator.rs Outdated Show resolved Hide resolved
backend/src/aggregator.rs Outdated Show resolved Hide resolved
backend/src/aggregator.rs Outdated Show resolved Hide resolved
backend/src/node/connector.rs Outdated Show resolved Hide resolved
dvdplm and others added 8 commits March 27, 2021 22:08
Co-authored-by: Maciej Hirsz <1096222+maciejhirsz@users.noreply.github.com>
Co-authored-by: Maciej Hirsz <1096222+maciejhirsz@users.noreply.github.com>
Co-authored-by: Maciej Hirsz <1096222+maciejhirsz@users.noreply.github.com>
Co-authored-by: Maciej Hirsz <1096222+maciejhirsz@users.noreply.github.com>
Addr's version of `do_send` does not return `Result`
Send Close frame with reason when the remote missed a heartbeat
@dvdplm dvdplm merged commit 3da4ac2 into master Mar 29, 2021
@dvdplm dvdplm deleted the dp-mute-node-connector-when-denied-or-overquota branch March 29, 2021 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants