Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Monitoring Report Interval for low allowable_throughput_failure_interval_seconds #451

Closed
wants to merge 5 commits into from

Conversation

waahm7
Copy link
Contributor

@waahm7 waahm7 commented Aug 23, 2023

Description of changes:
The report interval was 1 second by default, which can be slow if allowable_throughput_failure_interval_seconds is also low like 1 second in the IMDS Client. It took around 3 seconds to figure out if the connection was stuck. This change improves that time to around 1.5 seconds.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@waahm7 waahm7 changed the title Fix Monitoring Report Interval Improve Monitoring Report Interval for Low allowable_throughput_failure_interval_seconds Aug 23, 2023
@waahm7 waahm7 changed the title Improve Monitoring Report Interval for Low allowable_throughput_failure_interval_seconds Improve Monitoring Report Interval for low allowable_throughput_failure_interval_seconds Aug 23, 2023
Copy link
Contributor

@graebm graebm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked offline and you gave the stats of:

/3 = ~1.6 seconds
/5 = ~1.4 seconds
/10 = ~1.2 seconds

maybe we should go for /10? I mean ... being 20% off still isn't fantastic, and if you're setting the timeout to 1sec, you obviously want that timeout to be fast

Copy link
Contributor

@TingDaoK TingDaoK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we are using minimum_throughput_bytes_per_second.

So that if we are sampling the data less than 1 secs, we will have a inaccurate rate per second :(

The failing test has

    struct aws_http_connection_monitoring_options monitor_opt = {
        .allowable_throughput_failure_interval_seconds = 1,
        .minimum_throughput_bytes_per_second = 1000,
    };

The test failing because is doing a 900 Byte transfer after a second, so as we sampling the data every 300 ms, there will be two sampling result in 0 B/s, but one sampling results in 3000 B/s.

And the test allows 1 secs to failure. So, in 1 secs, there always be one sampling is larger than the minimal, so that it never shutdown the connection as expected.

:(

@waahm7 waahm7 closed this Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants