-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix panic when failing to create Duration
for exponential backoff from a large float
#3621
Conversation
A new generated diff is ready to view.
A new doc preview is ready to view. |
tracing::warn!( | ||
"could not create `Duration` for exponential backoff: {e}" | ||
); | ||
Err(ShouldAttempt::No) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems logical to me but just to be sure we've done our due diligence I'm going to poke at this.
- Why does it make sense to not attempt instead of just capping the backoff to
retry_cfg.max_backoff()
? - I glanced through the SEP I don't see any guidance on this scenario, have we asked other SDKs what they do to ensure there is consistency? It may be worth updating the SEP if this is a grey area.
Separate question, if we are setting 100 attempts what does the token bucket look like here? I kind of would have thought we'd be at least getting close to exceeding our quotas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this would be better:
Duration::try_from_secs_f64(backoff).unwrap_or(Duration::MAX).min(retry_cfg.max_backoff())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After internal discussion, we learned it's still debatable whether we should continue making attempts. So we're not going to deliberately make that decision in this PR (and not making that decision doesn't make anythings worse than it is today). We can still make an improvement in that area once we've reached a consensus.
This PR allows customers to perform retries even when overflow occurs, which is updated in 54ad1f0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duration::try_from_secs_f64(backoff).unwrap_or(Duration::MAX).min(retry_cfg.max_backoff())
This one-liner is tempting, but there was some clarification in the design that jitter also needs to be applied to max_backoff
. To make everything contained in one function (worrying about overflow, whether it should fallback to max_backoff
, and for the ease of testing), I put things the way I did in the above commit. Let me know if you can think of a different way.
A new generated diff is ready to view.
A new doc preview is ready to view. |
A new generated diff is ready to view.
A new doc preview is ready to view. |
A new generated diff is ready to view.
A new doc preview is ready to view. |
@@ -21,6 +21,9 @@ use std::{ | |||
process::Command, | |||
}; | |||
|
|||
const SMITHY_RS_RUNTIME_VERSIONER_AUDIT_PREVIOUS_RELEASE_TAG: &str = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why do we need an env variable when the arguments include previous_release_tag.
Also can we not make use of the env attribute so that clap
sources this for us into the Audit
command line arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During development, we usually run runtime-versioner
as part of pre-commit hooks. Do you know if there is a way to specify --previous-release-tag <some release tag>
to the above invocation done by pre-commit hooks?
Also can we not make use of the env attribute
Haven't used this feature before. Let me try that. Updated in 3b76d4b. Thanks for the suggestion.
A new generated diff is ready to view.
A new doc preview is ready to view. |
Motivation and Context
Avoids panic when Duration for exponential backoff could not be created from a large float.
Description
Duration::from_secs_f64 may panic. This PR switches to use a fallible sibling
Duration::try_from_secs_f64
to avoid panics. IfDuration::try_from_secs_f64
returns anErr
we fallback tomax_backoff
for subsequent retries. Furthermore, we learned from internal discussion that jitter also needs to be applied tomax_backoff
. This PR updatescalculate_exponential_backoff
to handle all the said business logic in one place.Testing
should_not_panic_when_exponential_backoff_duration_could_not_be_created
More details
Without changes in this PR:
With changes in this PR:
Checklist
CHANGELOG.next.toml
if I made changes to the smithy-rs codegen or runtime cratesCHANGELOG.next.toml
if I made changes to the AWS SDK, generated SDK code, or SDK runtime cratesAppendix
runtime-versioner bug fix
This PR also fixes a limitation in
runtime-versioner audit
. I included the fix in the PR because the issue occurred with special conditions, and we don't get to reproduce it every day. The way the issue manifests is this.release-2024-04-30
smithy-rs
release has been maderelease-2024-05-08
git merge main
, pre-commit hooks run, and we then getaudit
failures fromruntime-versioner
:This happens because when the latest
main
is being merged to our branch,runtime-versioner audit
should useprevious_release_tag
release-2024-05-08
to perform audit but pre-commit hooks run the tool using the latest previous release tag that can be traced back fromHEAD
of our branch, which isrelease-2024-04-30
. Hence the error.The fix adds an environment variable
SMITHY_RS_RUNTIME_VERSIONER_AUDIT_PREVIOUS_RELEASE_TAG
to specify a previous release tag override, in addition to a--previous-release-tag
command-line argument.Furthermore, the fix has relaxed certain checks in
audit
. Taking our example for instance, whenHEAD
is now behindrelease-2024-05-08
, it's OK to fail even ifrelease-2024-05-08
is not the ancestor ofHEAD
(as stated above,git merge-base --is-ancestor
does not know that while main is being merged) as long asrelease-2024-04-28
(the latest release seen fromHEAD
) is the ancestor ofrelease-2024-05-08
.To use the fix, set the environment variable with the new release tag and perform
git merge main
:By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.