refactor: decrease generated artifact size #1057

Merged
merged 4 commits into from
Apr 2, 2024

@aajtodd aajtodd commented Mar 20, 2024

Issue #

see awslabs/aws-sdk-kotlin#411

Description of changes

This PR attempts to decrease the generated artifact size of service clients by doing the following:

  • Inline several higher order function calls that are used fairly heavily in generated serialization code
  • Remove suspend from most HTTP operation serializers and deserializers

The changes and results are detailed in the sections below for each of these.

Inline higher order functions

You might consider this a bug, since it was introduced with a refactor, but in any case we have a lot of generated code in serializers and deserializers that looks something like:

internal class PutBucketLifecycleConfigurationOperationSerializer: HttpSerialize<PutBucketLifecycleConfigurationRequest> {
    override suspend fun serialize(context: ExecutionContext, input: PutBucketLifecycleConfigurationRequest): HttpRequestBuilder {
        val builder = HttpRequestBuilder()
        builder.method = HttpMethod.PUT

        builder.url {
            path.trailingSlash = true
            parameters.decodedParameters {
                add("lifecycle", "")
            }
        }

        builder.headers {
            if (input.checksumAlgorithm != null) append("x-amz-sdk-checksum-algorithm", input.checksumAlgorithm.value)
            if (input.expectedBucketOwner?.isNotEmpty() == true) append("x-amz-expected-bucket-owner", input.expectedBucketOwner)
        }

        if (input.lifecycleConfiguration != null) {
            val payload = serializeBucketLifecycleConfigurationPayloadWithXmlNameLifecycleConfiguration(input.lifecycleConfiguration)
            builder.body = HttpBody.fromBytes(payload)
        }
        if (builder.body !is HttpBody.Empty) {
            builder.headers.setMissing("Content-Type", "application/xml")
        }
        return builder
    }
}

All of the invocations like builder.url {...}, builder.headers {...}, parameters.decodedParameters {...}, etc. take a lambda argument. Because these functions are not inlined, each lambda compiles to a backing class that holds the state captured from the outer context (e.g. input), and there are a lot of them.
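To make the difference concrete, here is a minimal sketch of the same builder-style call written once as a regular higher-order function and once as an inline one. RequestBuilder, HeadersBuilder, headersNotInlined, and headersInlined are hypothetical stand-ins invented for this illustration, not the real smithy-kotlin types.

```kotlin
// Hypothetical stand-ins for the runtime builders; not the real smithy-kotlin API.
class HeadersBuilder {
    val entries = mutableListOf<Pair<String, String>>()
    fun append(name: String, value: String) {
        entries += name to value
    }
}

class RequestBuilder {
    val headers = HeadersBuilder()

    // Regular higher-order function: a capturing lambda passed here has to be
    // materialized as a function object that carries the captured variables.
    fun headersNotInlined(block: HeadersBuilder.() -> Unit) {
        headers.block()
    }

    // Inline higher-order function: the compiler copies the lambda body into
    // the call site, so no separate function object is needed for it.
    inline fun headersInlined(block: HeadersBuilder.() -> Unit) {
        headers.block()
    }
}

fun buildRequest(expectedBucketOwner: String?): RequestBuilder {
    val builder = RequestBuilder()
    // Both lambdas capture `expectedBucketOwner` from the enclosing function.
    builder.headersNotInlined {
        if (expectedBucketOwner != null) append("x-amz-expected-bucket-owner", expectedBucketOwner)
    }
    builder.headersInlined {
        if (expectedBucketOwner != null) append("x-amz-sdk-checksum-algorithm", "CRC32")
    }
    return builder
}
```

Both calls behave identically at runtime; the difference is only in what the compiler has to emit for the call site, which is what matters for artifact size.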

main

> ls -lsa services/*/build/libs/*-jvm*.jar
4196 -rw-r--r-- 1 todaaron staff 3652000 Mar 20 09:06 services/dynamodb/build/libs/dynamodb-jvm-1.1.1-SNAPSHOT.jar
5768 -rw-r--r-- 1 todaaron staff 5083203 Mar 20 09:06 services/s3/build/libs/s3-jvm-1.1.1-SNAPSHOT.jar

> ls -lsa aws-runtime/aws-config/build/libs/*-jvm*.jar
1080 -rw-r--r-- 1 todaaron staff 1101995 Mar 20 09:05 aws-runtime/aws-config/build/libs/aws-config-jvm-1.1.1-SNAPSHOT.jar

with inlining

> ls -lsa services/*/build/libs/*-jvm*.jar
4448 -rw-r--r-- 1 todaaron staff 3601011 Mar 20 09:12 services/dynamodb/build/libs/dynamodb-jvm-1.1.1-SNAPSHOT.jar
4860 -rw-r--r-- 1 todaaron staff 4794421 Mar 20 09:13 services/s3/build/libs/s3-jvm-1.1.1-SNAPSHOT.jar

> ls -lsa aws-runtime/aws-config/build/libs/*-jvm*.jar
1072 -rw-r--r-- 1 todaaron staff 1096939 Mar 20 09:12 aws-runtime/aws-config/build/libs/aws-config-jvm-1.1.1-SNAPSHOT.jar

DELTA AFTER INLINING

Artifact      Delta %
DynamoDB      -1.39%
S3            -5.68%
aws-config    -0.46%

Remove most suspend points for generated HttpSerde

The only serializers and deserializers that suspend are the ones that deal with streaming types, but we generate all operation serializers and deserializers as if they will suspend. Deserializers that just read the payload only suspend to pull the payload into memory so that the format deserializer (e.g. JSON, XML) can be invoked on it. This suspension point can be lifted into the runtime by providing separate interfaces for suspending and non-suspending serializers and deserializers.
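A minimal sketch of what lifting the suspension point into the runtime could look like on the deserializer side. Every name here (FakeResponse, Deserializer, runDeserializer, TableNameDeserializer, runSync) is hypothetical and heavily simplified; the sketch only illustrates the shape of the idea, assuming the runtime reads the body before handing it to non-streaming deserializers.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical response type; `readAll` stands in for a suspending body read.
class FakeResponse(private val bytes: ByteArray) {
    suspend fun readAll(): ByteArray = bytes
}

sealed interface Deserializer<T> {
    // Streaming operations keep full control over the body and may suspend.
    interface Streaming<T> : Deserializer<T> {
        suspend fun deserialize(response: FakeResponse): T
    }

    // Non-streaming operations receive the payload already in memory.
    interface NonStreaming<T> : Deserializer<T> {
        fun deserialize(payload: ByteArray): T
    }
}

// Runtime side: the only suspension for non-streaming operations happens here,
// once, instead of inside every generated deserializer.
suspend fun <T> runDeserializer(deserializer: Deserializer<T>, response: FakeResponse): T =
    when (deserializer) {
        is Deserializer.Streaming -> deserializer.deserialize(response)
        is Deserializer.NonStreaming -> deserializer.deserialize(response.readAll())
    }

// Stand-in for generated code: a plain, non-suspending function.
object TableNameDeserializer : Deserializer.NonStreaming<String> {
    override fun deserialize(payload: ByteArray): String = payload.decodeToString()
}

// Tiny stdlib-only runner for this demo. It only works because nothing in the
// sketch actually suspends; real code would use a coroutine scope instead.
fun <T> runSync(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return outcome!!.getOrThrow()
}
```

With this split, generated non-streaming deserializers compile to ordinary methods, and only the single runtime function pays for being a suspend function.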

> ls -lsa services/*/build/libs/*-jvm*.jar
3284 -rw-r--r-- 1 todaaron staff 3359574 Mar 20 11:53 services/dynamodb/build/libs/dynamodb-jvm-1.1.1-SNAPSHOT.jar
4740 -rw-r--r-- 1 todaaron staff 4490532 Mar 20 11:54 services/s3/build/libs/s3-jvm-1.1.1-SNAPSHOT.jar

> ls -lsa aws-runtime/aws-config/build/libs/*-jvm*.jar
1024 -rw-r--r-- 1 todaaron staff 1046552 Mar 20 11:52 aws-runtime/aws-config/build/libs/aws-config-jvm-1.1.1-SNAPSHOT.jar

DELTA FROM INLINING (relative to the inlining-only build)

Artifact      Delta %
DynamoDB      -6.70%
S3            -6.34%
aws-config    -4.59%
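For reference, the percentages above are the relative change in jar size between the inlining-only build and the build with the HTTP serde changes. A small sketch of that arithmetic, using the byte counts from the two ls listings (percentDelta, serdeVsInlining, and describeDeltas are names made up for this example):

```kotlin
import java.util.Locale

// Relative size change in percent; negative means the artifact shrank.
fun percentDelta(before: Long, after: Long): Double =
    (after - before) * 100.0 / before

// (artifact, inlining-only bytes, inlining + HTTP serde bytes) from the listings.
val serdeVsInlining = listOf(
    Triple("dynamodb", 3_601_011L, 3_359_574L),
    Triple("s3", 4_794_421L, 4_490_532L),
    Triple("aws-config", 1_096_939L, 1_046_552L),
)

// Formats each delta with two decimals, e.g. "dynamodb: -6.70%".
fun describeDeltas(): List<String> =
    serdeVsInlining.map { (name, before, after) ->
        "$name: ${"%.2f".format(Locale.ROOT, percentDelta(before, after))}%"
    }
```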

Totals after inlining + http serde changes

Total delta with both inlining and HTTP serde changes compared to original (JVM) artifact sizes

Artifact      Original Size (Bytes)    After Size (Bytes)    Delta %
DynamoDB      3652000                  3359574               -8.01%
S3            5083203                  4490532               -11.66%
aws-config    1101995                  1046552               -5.03%

Appendix

The extracted artifacts before and after changes:

Latest S3 JVM jar:

> du -h
476K    ./endpoints/internal
596K    ./endpoints
 80K    ./paginators
140K    ./express
 48K    ./auth
 52K    ./internal
 60K    ./waiters
 52K    ./presigners
8.5M    ./model
6.7M    ./serde
 17M    .

After inlining + HTTP serde

> du -h
476K    ./endpoints/internal
596K    ./endpoints
 80K    ./paginators
140K    ./express
 48K    ./auth
 52K    ./internal
 60K    ./waiters
 36K    ./presigners
8.5M    ./model
4.9M    ./serde
 15M    .

For comparison with Java v2 SDK:

Java S3 latest:

s3-2.25.9.jar                                     2024-03-13 22:15   3572387      

Java DDB latest:

dynamodb-2.25.9.jar                               2024-03-13 22:17   2744634  

Next Steps


SdkSerializable

As noted in awslabs/aws-sdk-kotlin#411 (comment), the way we generate nested struct/union serialization causes backing classes to be generated to hold the required state. I looked for ways to remove this, but none are easy or clean. The best solution here is to revisit serialization and make it format specific, like we did for XML deserialization. I'd imagine this would remove quite a bit of size from artifacts, as we have a lot of these in practice.

/**
 * Payload serializer for WebsiteConfiguration with a different XML name trait (WebsiteConfiguration)
 */
internal fun serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration(input: WebsiteConfiguration): ByteArray {
    val serializer = XmlSerializer()
    val ERRORDOCUMENT_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Struct, XmlSerialName("ErrorDocument"))
    val INDEXDOCUMENT_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Struct, XmlSerialName("IndexDocument"))
    val REDIRECTALLREQUESTSTO_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Struct, XmlSerialName("RedirectAllRequestsTo"))
    val ROUTINGRULES_DESCRIPTOR = SdkFieldDescriptor(SerialKind.List, XmlSerialName("RoutingRules"), XmlCollectionName("RoutingRule"))
    val OBJ_DESCRIPTOR = SdkObjectDescriptor.build {
        trait(XmlSerialName("WebsiteConfiguration"))
        trait(XmlNamespace("http://s3.amazonaws.com/doc/2006-03-01/"))
        field(ERRORDOCUMENT_DESCRIPTOR)
        field(INDEXDOCUMENT_DESCRIPTOR)
        field(REDIRECTALLREQUESTSTO_DESCRIPTOR)
        field(ROUTINGRULES_DESCRIPTOR)
    }

    serializer.serializeStruct(OBJ_DESCRIPTOR) {
        input.errorDocument?.let { field(ERRORDOCUMENT_DESCRIPTOR, it, ::serializeErrorDocumentDocument) }
        input.indexDocument?.let { field(INDEXDOCUMENT_DESCRIPTOR, it, ::serializeIndexDocumentDocument) }
        input.redirectAllRequestsTo?.let { field(REDIRECTALLREQUESTSTO_DESCRIPTOR, it, ::serializeRedirectAllRequestsToDocument) }
        if (input.routingRules != null) {
            listField(ROUTINGRULES_DESCRIPTOR) {
                for (el0 in input.routingRules) {
                    serializeSdkSerializable(asSdkSerializable(el0, ::serializeRoutingRuleDocument))
                }
            }
        }
    }
    return serializer.toByteArray()
}

All of the field(<DESCRIPTOR>, T, ::serializeFoo) calls and serializeSdkSerializable(...) calls generate an additional backing class.

> javap WebsiteConfigurationPayloadSerializerKt*

Compiled from "WebsiteConfigurationPayloadSerializer.kt"
final class aws.sdk.kotlin.services.s3.serde.WebsiteConfigurationPayloadSerializerKt$serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration$1$4$1 extends kotlin.jvm.internal.FunctionReferenceImpl implements kotlin.jvm.functions.Function2<aws.smithy.kotlin.runtime.serde.Serializer, aws.sdk.kotlin.services.s3.model.RoutingRule, kotlin.Unit> {
  public static final aws.sdk.kotlin.services.s3.serde.WebsiteConfigurationPayloadSerializerKt$serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration$1$4$1 INSTANCE;
  aws.sdk.kotlin.services.s3.serde.WebsiteConfigurationPayloadSerializerKt$serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration$1$4$1();
  public final void invoke(aws.smithy.kotlin.runtime.serde.Serializer, aws.sdk.kotlin.services.s3.model.RoutingRule);
  public java.lang.Object invoke(java.lang.Object, java.lang.Object);
  static {};
}
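The javap output above is the class backing one `::serializeRoutingRuleDocument` reference. The following sketch contrasts passing a function reference as a value with calling the function directly, which is roughly what a format-specific serializer could do instead. All names (Serializer, RoutingRule, serializeRoutingRule, field, viaReference, direct) are hypothetical miniatures for illustration, not the real smithy-kotlin API, and the direct form is an assumption about what such a redesign might look like.

```kotlin
// Hypothetical miniature of the generated pattern.
class Serializer {
    val out = StringBuilder()
}

data class RoutingRule(val prefix: String)

fun serializeRoutingRule(serializer: Serializer, rule: RoutingRule) {
    serializer.out.append("<RoutingRule>").append(rule.prefix).append("</RoutingRule>")
}

// Accepts the element serializer as a value, so every caller has to
// materialize a function object to pass in.
fun <T> Serializer.field(value: T, serializeFn: (Serializer, T) -> Unit) {
    serializeFn(this, value)
}

fun viaReference(rule: RoutingRule): String {
    val serializer = Serializer()
    // The callable reference is turned into a function object for this call site.
    serializer.field(rule, ::serializeRoutingRule)
    return serializer.out.toString()
}

fun direct(rule: RoutingRule): String {
    val serializer = Serializer()
    // A plain call: no function object is involved.
    serializeRoutingRule(serializer, rule)
    return serializer.out.toString()
}
```

Both functions produce the same output; only the first requires the compiler to emit something extra per call site.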

Reduce operation error handling overhead

throwFooOperationError is a top-level function that gets generated into a separate .class file. Class files have overhead, though, so it may be smaller to encode this into the operation error deserializer interface so that they share the same class file. Alternatively, for AWS protocols at least, we could combine all operation handlers into a single function like throwS3Error(...). This should work because AWS protocols all carry the type of the error in the response, so having lots of separate functions is unnecessary; they would behave the same if combined into one.

> javap PutBucketLifecycleConfigurationOperationDeserializerKt.class
Compiled from "PutBucketLifecycleConfigurationOperationDeserializer.kt"
public final class aws.sdk.kotlin.services.s3.serde.PutBucketLifecycleConfigurationOperationDeserializerKt {
  public static final java.lang.Void access$throwPutBucketLifecycleConfigurationError(aws.smithy.kotlin.runtime.operation.ExecutionContext, aws.smithy.kotlin.runtime.http.HttpCall, byte[]);
}
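As a rough sketch of the combined-handler idea, a single function can dispatch on the error code carried in the response instead of each operation generating its own throwFooOperationError. ErrorDetails, ServiceException, and throwServiceError are hypothetical names for this illustration; the real generated code works with ExecutionContext, HttpCall, and protocol-specific error parsing, none of which is modeled here.

```kotlin
// Hypothetical, simplified types for illustration only.
data class ErrorDetails(val code: String?, val message: String)

open class ServiceException(message: String) : RuntimeException(message)
class NoSuchBucket(message: String) : ServiceException(message)
class NoSuchKey(message: String) : ServiceException(message)

// One shared handler: the error type comes from the response itself, so the
// operation that was called does not need its own dispatch function.
fun throwServiceError(details: ErrorDetails): Nothing {
    throw when (details.code) {
        "NoSuchBucket" -> NoSuchBucket(details.message)
        "NoSuchKey" -> NoSuchKey(details.message)
        else -> ServiceException("${details.code ?: "Unknown"}: ${details.message}")
    }
}
```

Whether this is actually smaller than per-operation functions would still need to be measured the same way as the changes above.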

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@aajtodd aajtodd marked this pull request as ready for review March 28, 2024 18:44
@aajtodd aajtodd requested a review from a team as a code owner March 28, 2024 18:44
Comment on lines 16 to +17
@InternalApi
public sealed interface HttpSerializer<T> {

Contributor

Question: What benefit does nesting the variants of this interface provide? There are no common members and no way they could be used interchangeably...

Contributor Author

namespacing, HttpSerializer.Streaming vs HttpStreamingSerializer (or HttpSerializerStreaming).

Contributor

If it's solely namespacing, it seems clearer to use object. Seeing a common parent HttpSerializer<T> interface in a few places tripped me up when first reviewing this PR.

Contributor Author

"There are no common members and no way they could be used interchangeably..."

I'm not sure what this has to do with whether to nest variants inside the body of the sealed interface or not. It's not solely namespacing as you still need the parent sealed interface.

Comment on lines +19 to +33
/**
 * Serializer for streaming operations that need full control over serialization of the body
 */
@InternalApi
public interface Streaming<T> : HttpSerializer<T> {
    public suspend fun serialize(context: ExecutionContext, input: T): HttpRequestBuilder
}

/**
 * Serializer for non-streaming (simple) operations that don't need to ever suspend.
 */
@InternalApi
public interface NonStreaming<T> : HttpSerializer<T> {
    public fun serialize(context: ExecutionContext, input: T): HttpRequestBuilder
}

Contributor

Question: Could these be made fun interface?

Contributor Author

They could be, but I think it was perhaps a mistake to do so originally: we don't know whether we'll ever want to expand the interface in the future, which we cannot do with a functional interface.
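A short illustration of the trade-off, using made-up interfaces (Transform and Codec are not part of this PR). A `fun interface` must have exactly one abstract member, which is what enables lambda (SAM) construction; a regular interface carries no such constraint.

```kotlin
// Functional (SAM) interface: exactly one abstract member, so callers can
// implement it with a lambda.
fun interface Transform {
    fun apply(input: String): String
}

// Regular interface: free to have more than one abstract member, now or later,
// at the cost of requiring an explicit implementation.
interface Codec {
    fun encode(input: String): String
    fun decode(input: String): String
}

// SAM conversion: the lambda becomes the body of `apply`.
val upper = Transform { it.uppercase() }

// Regular interfaces need an object expression or a class.
val reversing = object : Codec {
    override fun encode(input: String) = input.reversed()
    override fun decode(input: String) = input.reversed()
}
```

Adding a second abstract member to Transform would not compile while it remains a `fun interface`, and dropping the `fun` modifier later would break every caller that constructs it from a lambda.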

Comment on lines +40 to +48
@Suppress("DEPRECATION")
internal constructor(
    execution: SdkOperationExecution<I, O>,
    context: ExecutionContext,
    serializer: HttpSerialize<I>,
    deserializer: HttpDeserialize<O>,
    typeInfo: OperationTypeInfo,
    telemetry: SdkOperationTelemetry,
) : this(execution, context, serializer.intoSerializer(), deserializer.intoDeserializer(), typeInfo, telemetry)

Contributor

Nit: Mark as @Deprecated

Contributor Author

I don't think it's needed, since HttpSerialize<T> and HttpDeserialize<T> are deprecated, which will trigger a deprecation warning anyway.

Comment on lines +44 to 46
@Suppress("DEPRECATION")
private fun buildOperation(
requestBody: String = "{\"TableName\": \"foo\"}",

Contributor

Question: In general, this PR suppresses a lot of now-deprecated patterns in existing tests. Shouldn't we update those tests to use the modern methods?

Contributor Author

I updated many of them, but some of them use UnitSerializer or IdentitySerializer and I didn't feel like rewriting them. The changes we made here are an optimization; there isn't anything wrong with the old interfaces, other than that the way they are used in codegen introduces more suspend paths than is actually necessary.

@@ -77,7 +77,7 @@ abstract class HttpBindingProtocolGenerator : ProtocolGenerator {
  * The function should have the following signature:
  *
  * ```
- * suspend fun throwFooOperationError(context: ExecutionContext, call: HttpCall): Nothing {
+ * fun throwFooOperationError(context: ExecutionContext, call: HttpCall, payload: HttpByteArray?): Nothing {

Contributor

nit: HttpByteArray -> ByteArray

Comment on lines +246 to +249
val requestBuilder = when (serializer) {
    is HttpSerializer.NonStreaming -> serializer.serialize(modified.context, modified.subject)
    is HttpSerializer.Streaming -> serializer.serialize(modified.context, modified.subject)
}

Contributor

These are doing the same thing, is the when block necessary?

Contributor Author

Yes, there is no common method between the two; you have to "unwrap" the actual type by matching on it with when.

@aajtodd aajtodd merged commit 5acf7ef into main Apr 2, 2024
13 checks passed
@aajtodd aajtodd deleted the ft-shrink branch April 2, 2024 12:16