Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use tokio::time::timeout to timeout fetch state and send encrypted #889

Merged
merged 10 commits into from
Oct 17, 2024

Conversation

ppca
Copy link
Contributor

@ppca ppca commented Oct 13, 2024

The previous implementation caused troubles: 1) 1s timeout have already stuck the protocol; 2) 500ms timeout still see many timeouts on /state endpoint.
I am guessing the .timeout() that came with reqwest package cannot successfully time out an async call, the counting of time may not have considered task/thread switching.
So I switched to using tokio::time::timeout and now problem solved.
Dev success increased from 70% to 100%.

This change sets the timeout to 1000ms for both fetch /state and for send encrypted. Both can be tuned as an environment variable.

@ppca ppca changed the title Xiangyi/tune refresh intervals fix timeout code with tokio Oct 13, 2024
url
);
continue;
let work = self.http.get(url.clone()).send();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have issues with node stability, should we add this info to metrics? Each node can report the number of active participants. We can even add an alert on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good! I'll do it in another PR since this PR will already be big

@ppca ppca changed the title fix timeout code with tokio use tokio::time::timeout to timeout fetch state and send encrypte Oct 15, 2024
@ppca ppca changed the title use tokio::time::timeout to timeout fetch state and send encrypte use tokio::time::timeout to timeout fetch state and send encrypted Oct 15, 2024
@ppca ppca marked this pull request as ready for review October 15, 2024 21:17
@volovyks volovyks linked an issue Oct 16, 2024 that may be closed by this pull request
@ppca ppca force-pushed the xiangyi/tune_refresh_intervals branch from 6201a06 to 58d7056 Compare October 16, 2024 18:59
@ppca
Copy link
Contributor Author

ppca commented Oct 16, 2024

as discussed with @volovyks , since we might replace the client later with sockets etc, we will not refactor the timeouts into contract.

@ChaoticTempest
Copy link
Member

as discussed with @volovyks , since we might replace the client later with sockets etc, we will not refactor the timeouts into contract.

I see, then let's not bake these sort of configuration into terraform at all and just keep with the defaults in the CLI itself for now

Copy link
Collaborator

@volovyks volovyks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@@ -63,6 +63,10 @@ pub enum Cli {
/// referer header for mainnet whitelist
#[arg(long, env("MPC_CLIENT_HEADER_REFERER"), default_value(None))]
client_header_referer: Option<String>,
#[clap(flatten)]
mesh_options: mesh::Options,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do partners need to update their terraform setup?

@ppca ppca merged commit ecc6fd3 into develop Oct 17, 2024
3 checks passed
@ppca ppca deleted the xiangyi/tune_refresh_intervals branch October 17, 2024 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix 12 node system
3 participants