-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add static stability support to IMDS credentials provider #2191
Conversation
This commit removes logic of provider timeout from `LazyCredentialsCache`. Prior to the commit, it raced provider's `provide_credentials` method against a timeout future; if the timeout future won, it always yielded a `ProviderTimeOut`. However, this did not work well with the timeout behavior of `ImdsCredentialsProvider` described in the static stability design, which says it should use expired credentials even in the face of a credentials read timeout. The logic in question has been moved to the `ProvideCredentials` trait with the aim of each trait implementer defining what to do in the case of read timeout.
This commit enhances the reliability of `ImdsCredentialsProvider`. It allows requests to be dispatched even with expired credentials. This in turn allows the target service to makes the ultimate decision as to whether requests sent are valid or not rather than the client SDK determining the their validity. The basic idea is that `ImdsCredentialsProvider` now stores a last retrieved credentials and can provide it if it cannot reach the IMDS endpoint.
This commit adds integration tests exercising use cases concerned with static stability. They use S3 as an example service for which credentials retrieved from IMDS are used, but it can be any service.
This commit reverts the attribute `#[cfg(feature = "test-util")]` given to `aws_credential_types::lazy_caching::Builder::time_source` to make it `#[doc(hidden)]` as the method is used in `aws_config::ConfigLoader` where `#[cfg(feature = "test-util")]` cannot be specified.
This commit is in response to failures in CI found at https://github.com/awslabs/smithy-rs/actions/runs/3879881661/jobs/6617584317
A new generated diff is ready to view.
A new doc preview is ready to view. |
timeout: Duration, | ||
) -> provider::Result { | ||
for (name, provider) in &self.providers { | ||
let timeout_per_provider = timeout.div_f64(self.providers.len() as f64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot be a simple division as each provider has a different idea of how long timeout should be. This implies that we should consider what it means for a credentials provider to time out in a broader scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least preserved today's behavior with 4592ca5, but a set of APIs provide_credentials
and provide_credentials_with_timeout
needs to be reconsidered.
This commit is in response to a failure in CI found at https://github.com/awslabs/smithy-rs/actions/runs/3885601988/jobs/6629970662
This commit moves tests in aws/sdk/integration-tests/imds to aws/sdk/integration-tests/s3, as there is no IMDS integration tests of code generation.
This commit addresses #2191 (comment). It attempts to preserve the same read timeout behavior for `CredentialsProviderChain` but certainly is not the best way to implement the read timeout aware credentials provider API.
A new generated diff is ready to view.
A new doc preview is ready to view. |
Ok(creds) => creds, | ||
_ => match &*self.last_retrieved_credentials.read().await { | ||
Some(creds) => Ok(creds.clone()), | ||
_ => Err(CredentialsError::provider_timed_out(timeout)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should return the original error rather than a timeout error since this error case can also happen in non-timeout cases (such as the IMDS calls failing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error case can also happen in non-timeout cases
Hmm, I could be wrong but isn't the original error (one coming out of non-timeout cases) captured in creds
whose type is Result<Credentials, CredentialsError>
?
// does not respect provider-specific read timeout behavior, e.g. the IMDS credentials provider | ||
// wants to provide expired credentials, if any, in the case of read timeout. | ||
let sleep_future = sleeper.sleep(timeout); | ||
let timeout_future = Timeout::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this top-level timeout cause a race condition with the IMDS provider's timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I realized that after pushing the commit. One possible workaround is to update the signature of provide_credentials_with_timeout
to take something like aws_smithy_async::rt::Sleep
but sharable.
CredentialsProviderChain::provide_credentials_with_timeout
passes it through the chain like "passing the torch" one provider to the next.
Regardless of whether the above idea is implemented as part of the PR, we will work on an RFC because the design around timeout and credentials providers affects not just ImdsCredentialsProvider
.
Will close this draft for now. Need to rework on what method to add to the |
Motivation and Context
Implements #2117
Description
This PR adds static stability support to the IMDS credentials provider. The basic idea is that
ImdsCredentialsProvider
now holds on to a last retrieved credentials and if it cannot fetch the latest credentials from IMDS possibly due to the service being unavailable, it will serve the last retrieved credentials instead. That way, the Rust SDK will not fail fast in sending a request with expired credentials and allows the target service to make the ultimate decision as to whether the request sent is valid.Even though IMDS is available, it may return stale credentials. Static stability support specifies that
ImdsCredentialsProvider
should extend the credentials expiry by the calculated amount. The implementation of that is defined inImdsCredentialsProvider::extend_expiration
and references that in botocore.The code changes in the PR that accomplish the above are mostly in
aws-config
. However, we found out that those changes alone are not enough to meet requirements for static stability support. Specifically, the timeout logic inLazyCredentialsCache
prior to this PR did not work well with the timeout logic required by the updatedImdsCredentialsProvider
(see cfeab3b for more details). The bottom lines is that the additional changes have been made inaws-credential-types
and theProvideCredentials
trait now has a new methodprovide_credentials_with_timeout
to address the said impedance mismatch.Testing
test_request_should_be_sent_when_first_call_to_imds_returns_expired_credentials
).Refresh Failures
).test_successive_requests_should_be_sent_with_expired_credentials_and_imds_being_called_only_once
).read_timeout_during_credentials_refresh_should_yield_last_retrieved_credentials
).test_request_should_be_sent_with_expired_credentials_after_imds_returns_500_during_credentials_refresh
).log_message_informing_expired_credentials_are_used
).Checklist
CHANGELOG.next.toml
if I made changes to the AWS SDK, generated SDK code, or SDK runtime cratesBy submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.