Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement clock skew interceptor #972

Merged
merged 54 commits into from
Oct 17, 2023
Merged

feat: implement clock skew interceptor #972

merged 54 commits into from
Oct 17, 2023

Conversation

lauzadis
Copy link
Contributor

@lauzadis lauzadis commented Oct 6, 2023

Implements a clock skew interceptor which detects clock skews and applies corrections. The clock skew is stored as part of the interceptor's state so it can be applied to all future requests.

Issue #

Closes awslabs/aws-sdk-kotlin#213

Description of changes

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@lauzadis lauzadis marked this pull request as ready for review October 6, 2023 18:32
@lauzadis lauzadis requested a review from a team as a code owner October 6, 2023 18:32
}

// Clock skew to be applied to all requests
private var currentSkew: Duration = Duration.ZERO
Copy link
Contributor Author

@lauzadis lauzadis Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is this subject to race conditions, and should this be an Atomic?

(can a single client invoke multiple operations at once?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is, a client can execute many operations concurrently.

Copy link
Contributor

@aajtodd aajtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, I'm going to hold off on the sibling PR until we get some things cleaned up here.

@@ -153,6 +154,15 @@ public class AwsHttpSigner(private val config: Config) : HttpSigner {
}
}

// Apply clock skew if necessary
signingConfig = if (attributes.contains(HttpOperationContext.ClockSkew)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can just move this up to inside where we create the signing config and set all the other parameters negating the need to call tobuilder(). We might also want to clarify how this interacts AwsSigningAttributes.SigningDate when it's set in the context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to my other comment about signingDate defaulting to Instant.now(). We currently depend on that codepath in the Builder, I could refactor to default Instant.now() in this builder and then move this clock skew inside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yeah I think we can move this into the builder still and just use Instant.now() + skew. We need to figure out how this behaves w.r.t pre-existing signing dates in the context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have no code that sets the signing date via context. I think if someone creates a custom interceptor to set signing date, we should just use that and not apply skew. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine as long as we document it. IIRC it's reason for existing is to facilitate testing.

// Apply clock skew if necessary
signingConfig = if (attributes.contains(HttpOperationContext.ClockSkew)) {
val builder = signingConfig.toBuilder()
builder.signingDate = builder.signingDate?.plus(attributes[HttpOperationContext.ClockSkew])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: This is wrong, this will only apply the skew if builder.signingDate isn't null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually never null by this point, because when building the config we either take the configured value or Instant.now().

}

// Clock skew to be applied to all requests
private var currentSkew: Duration = Duration.ZERO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is, a client can execute many operations concurrently.

val logger = coroutineContext.logger<ClockSkewInterceptor>()

if (currentSkew != Duration.ZERO) {
logger.info { "applying clock skew $currentSkew to client" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should be at debug level.

/**
* An error indicating that the client and server clock are skewed. A retry may be attempted after correction.
*/
Skew,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we need a different retry error type for this? I'm thinking this is a ClientSide error most likely since your clock is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this new error type was created during my early iterating, I think ClientSide makes sense

@@ -43,6 +43,8 @@ public expect class Instant : Comparable<Instant> {
*/
public operator fun minus(duration: Duration): Instant

public fun until(other: Instant): Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix Missing tests and docs. Also could this be an extension function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's gained by making this an extension function, just more idiomatic? It's already used as an extension function myInstant.until(otherInstant)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah just more idiomatic

public override suspend fun modifyBeforeDeserialization(context: ProtocolResponseInterceptorContext<Any, HttpRequest, HttpResponse>): HttpResponse {
val logger = coroutineContext.logger<ClockSkewInterceptor>()

val clientTime = context.protocolRequest.headers["x-amz-date"]?.let {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Should we also be looking at Date header on the request and take first one that we find out of (x-amz-date, Date)?

Also should this check come after looking for server time and fallback to Instant.now()? That could be risky if the delta between when the server set the response header and when we get to here is large but in practice it should be way less than the skew threshold so it seems reasonable safe as fallback. This appears to be what Java does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my tests, we don't send a Date header, but sure, I can check both headers and then fallback to Instant.now()

* It may be negative if the serverTime is in the past.
* @param serverTime the server's time
*/
public fun Instant.getSkew(serverTime: Instant): Duration = this.until(serverTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem like they could be internal

/**
* After receiving a response, check if the client clock is skewed and apply a correction if necessary.
*/
public override suspend fun modifyBeforeDeserialization(context: ProtocolResponseInterceptorContext<Any, HttpRequest, HttpResponse>): HttpResponse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Should this only apply clock skew if there was an error? I suppose if we have breached the threshold it shouldn't hurt anything in theory to start compensating before it becomes an issue 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was leaning towards applying skew even if there hasn't been an error yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different from what Java v2 does. They only begin applying clock skew correction when they receive an exception indicating the error (see RetryableStageHelper and ClockSkewAdjuster).

This hardcodes us into believing that anything beyond 4 minutes of skew is intolerable no matter what the service thinks. Maybe some services choose to handle more. Maybe some services don't care about clock skew. I believe we should only handle skew updates during clock skew exceptions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should only handle skew updates during clock skew exceptions.

I can live with this, but devil's advocate...whats the harm in correcting for it proactively?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We spoke offline and agreed we could ship this as-is and follow up if we see issues with it.

The main concern is if a service starts enforcing sub-minute skew, the current implementation would fail to comply because the threshold is set at 4 minutes.

Copy link
Contributor

@aajtodd aajtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good but some correctness issues and question about where this should live to address.

ex?.sdkErrorMetadata?.attributes?.set(ErrorMetadata.Retryable, true)
ex?.let { return Result.failure(it) }
} else {
logger.trace { "client clock ($clientTime) is not skewed from the server ($serverTime)" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We probably don't need this log statement to say we didn't do anything.

Instant.fromIso8601(it)
} ?: context.executionContext[HttpOperationContext.ClockSkewApproximateSigningTime]

if (clientTime.isSkewed(serverTime, context.protocolResponse?.status?.description)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: This is using the HttpStatusCode description which is not the same as the the AWS error code that the isSkewed implementation is looking for. The change to predicate off exceptions probably means we need to move this interceptor into aws-sdk-kotlin/aws-runtime/aws-http package instead of smithy-kotlin since this is now AWS specific.

Comment on lines 55 to 61
// Errors possibly caused by clock skew
private val POSSIBLE_CLOCK_SKEW_ERROR_CODES = listOf(
"PriorRequestNotComplete",
"RequestTimeout",
"RequestTimeoutException",
"InternalError",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: These don't appear to be related to clock skew. Looks like Java considers InvalidSignatureException, SignatureDoesNotMatch, and AuthFailure as possible errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 Good catch. I took the wrong list of exceptions from Java and then convinced myself they could be caused by clock skew.

* @param responseCodeDescription the server's response code description
* @param serverTime the server's time
*/
internal fun Instant.isSkewed(serverTime: Instant, responseCodeDescription: String?): Boolean =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: responseCodeDescription -> errorCode

Also responseCodeDescription should probably be non-nullable since this always returns false otherwise which means it's really a required parameter for determining if a clock is skewed.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@lauzadis lauzadis merged commit 4a0dda6 into main Oct 17, 2023
13 of 14 checks passed
@lauzadis lauzadis deleted the feat-clock-skew branch October 17, 2023 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle Clock Skew
3 participants