Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Flexible checksums #1482

Closed
wants to merge 11 commits into from
Closed

Feature: Flexible checksums #1482

wants to merge 11 commits into from

Conversation

Velfi
Copy link
Contributor

@Velfi Velfi commented Jun 21, 2022

EDIT (2022/06/27)

At @rcoh 's suggestion, I've broken this PR up into smaller PRs to make review easier:

I'll open more PRs for remaining functionality once those get merged.

Motivation and Context

Supports the flexible checksums feature. Without this feature, we can't update our S3 models.

Description

See the flexible checksums RFC for details on what this PR accomplishes.

Testing

I wrote several tests to test the new functionality on multiple different levels

Checklist

  • I have updated CHANGELOG.next.toml if I made changes to the smithy-rs codegen or runtime crates
  • I have updated CHANGELOG.next.toml if I made changes to the AWS SDK, generated SDK code, or SDK runtime crates

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

add: base64::encoded_length
add: ChecksumBody
add: ChecksumValidatedBody
update: signer to support streaming payloads with trailers
add: checksum inlineables
add: AwsChunkedEncodingBody + options struct
update: make append_merge_header_maps pub
add: test for response checksum validation
add: ByteStream map method
add: get_parts_mut method to operation::Output
update: support formatting the string type in RustCodegenDecorator.kt
update: S3 models
@Velfi Velfi requested a review from a team as a code owner June 21, 2022 20:03
Copy link
Collaborator

@rcoh rcoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to review this all in one sitting, but will revisit tomorrow. Incredible work on this!

aws/rust-runtime/aws-http/Cargo.toml Outdated Show resolved Hide resolved
.chunk_length
.or(this.options.stream_length)
.unwrap_or_default();
Poll::Ready(Some(Ok(prefix_with_chunk_size(data, total_chunk_size))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this adds a significant amount of copying—how hard would it be to refactor it to send the chunk size and the chunk separately?

I'm also a little confused about how this works—we never switch back into WritingChunkSize AFAICT

aws/rust-runtime/aws-http/src/content_encoding.rs Outdated Show resolved Hide resolved
aws/rust-runtime/aws-http/src/content_encoding.rs Outdated Show resolved Hide resolved
aws/rust-runtime/aws-inlineable/src/body_with_checksum.rs Outdated Show resolved Hide resolved
aws/rust-runtime/aws-inlineable/src/body_with_checksum.rs Outdated Show resolved Hide resolved
aws/rust-runtime/aws-inlineable/src/body_with_checksum.rs Outdated Show resolved Hide resolved
update: ChecksumValidatedBody::poll_inner to log at the appropriate level
rename: checksum_header_name_to_checksum_algorithm to header_name_to_algorithm
rename: checksum_algorithm_to_checksum_header_name to algorithm_to_header_name
fix: use new ClientCodegenContext instead of CodegenContext
fix: prefer static over const declaration for `HeaderName`s
fix: add missing param to TestProtocolTraitImplGenerator.generateTraitImpls
add: tests for wrapped body retryability
refactor: swap pin-project for pin-project-lite
refactor: swap tempfile for temp-file in integration tests
@Velfi Velfi requested a review from a team as a code owner June 23, 2022 18:48
@github-actions
Copy link

A new generated diff is ready to view.

A new doc preview is ready to view.

@github-actions
Copy link

A new generated diff is ready to view.

A new doc preview is ready to view.

Comment on lines +232 to +238
*this.state = AwsChunkedBodyState::WritingChunk;
let total_chunk_size = this
.options
.chunk_length
.or(this.options.stream_length)
.unwrap_or_default();
Poll::Ready(Some(Ok(prefix_with_chunk_size(data, total_chunk_size))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code seems to straddle two lines between supporting multiple chunks in some ways but not in others–it makes it pretty confusing to follow. I think we should maybe remove chunk_size as a parameter entirely?

Comment on lines +31 to +33
/// Test connection for the movies IT
/// headers are signed with actual creds, at some point we could replace them with verifiable test
/// credentials, but there are plenty of other tests that target signing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of date comment

Comment on lines +22 to +27
// static INIT_LOGGER: std::sync::Once = std::sync::Once::new();
// fn init_logger() {
// INIT_LOGGER.call_once(|| {
// tracing_subscriber::fmt::init();
// });
// }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this useful?

Comment on lines +306 to +324
async fn collect_body_into_string(mut body: aws_smithy_http::body::SdkBody) -> String {
use bytes::Buf;
use bytes_utils::SegmentedBuf;
use http_body::Body;
use std::io::Read;

let mut output = SegmentedBuf::new();
while let Some(buf) = body.data().await {
output.push(buf.unwrap());
}

let mut output_text = String::new();
output
.reader()
.read_to_string(&mut output_text)
.expect("Doesn't cause IO errors");

output_text
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use str::from_utf8(&body.collect().into_bytes())

Comment on lines +106 to +132
fn poll_trailers(
self: Pin<&mut Self>,
cx: &mut Context<'_>,
) -> Poll<Result<Option<HeaderMap<HeaderValue>>, Self::Error>> {
let this = self.project();
match (
this.checksum.headers(),
http_body::Body::poll_trailers(this.inner, cx),
) {
// If everything is ready, return trailers, merging them if we have more than one map
(Ok(outer_trailers), Poll::Ready(Ok(inner_trailers))) => {
let trailers = match (outer_trailers, inner_trailers) {
// Values from the inner trailer map take precedent over values from the outer map
(Some(outer), Some(inner)) => Some(append_merge_header_maps(inner, outer)),
// If only one or neither produced trailers, just combine the `Option`s with `or`
(outer, inner) => outer.or(inner),
};
Poll::Ready(Ok(trailers))
}
// If the inner poll is Ok but the outer body's checksum callback encountered an error,
// return the error
(Err(e), Poll::Ready(Ok(_))) => Poll::Ready(Err(e)),
// Otherwise return the result of the inner poll.
// It may be pending or it may be ready with an error.
(_, inner_poll) => inner_poll,
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really cool! (how checksum body uses structured trailers then AWS chunked writes the trailers into the body)

Comment on lines +250 to +252
pub fn checksum_mismatch(expected: Bytes, actual: Bytes) -> Self {
Self::ChecksumMismatch { expected, actual }
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the Display impl, I wonder if we should have a max length on these fields

Comment on lines +79 to +101
pub trait Checksum: Send + Sync {
/// Given a slice of bytes, update this checksum's internal state.
fn update(&mut self, bytes: &[u8]) -> Result<(), BoxError>;
/// Either return this checksum as a `HeaderMap` containing one HTTP header, or return an error
/// describing why checksum calculation failed.
fn headers(&self) -> Result<Option<HeaderMap<HeaderValue>>, BoxError>;
/// Return the `HeaderName` used to represent this checksum algorithm
fn header_name(&self) -> HeaderName;
/// "Finalize" this checksum, returning the calculated value as `Bytes` or an error that
/// occurred during checksum calculation. To print this value in a human-readable hexadecimal
/// format, you can print it using Rust's builtin [formatter].
///
/// _**NOTE:** typically, "finalizing" a checksum in Rust will take ownership of the checksum
/// struct. In this method, we clone the checksum's state before finalizing because checksums
/// may be used in a situation where taking ownership is not possible._
///
/// [formatter]: https://doc.rust-lang.org/std/fmt/trait.UpperHex.html
fn finalize(&self) -> Result<Bytes, BoxError>;
/// Return the size of this checksum algorithms resulting checksum, in bytes. For example, the
/// CRC32 checksum algorithm calculates a 32 bit checksum, so a CRC32 checksum struct
/// implementing this trait method would return 4.
fn size(&self) -> u64;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, I like this design! Have we already gone back and forth on this? I might consider splitting out HttpChecksum as `CheckSum + headers/header_name

@@ -6,7 +6,7 @@
use bytes::Bytes;
use http::{HeaderMap, HeaderValue};
use http_body::{Body, SizeHint};
use pin_project::pin_project;
use pin_project_lite::pin_project;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for doing this!!

Comment on lines +145 to +163
pin_project! {
#[doc = "Stream of binary data"]
#[doc = ""]
#[doc = "`ByteStream` wraps a stream of binary data for ease of use."]
#[doc = ""]
#[doc = "## Getting data out of a `ByteStream`"]
#[doc = ""]
#[doc = "`ByteStream` provides two primary mechanisms for accessing the data:"]
#[doc = "1. With `.collect()`:"]
#[doc = ""]
#[doc = " [`.collect()`](crate::byte_stream::ByteStream::collect) reads the complete ByteStream into memory and stores it in `AggregatedBytes`,"]
#[doc = " a non-contiguous ByteBuffer."]
#[doc = " ```no_run"]
#[doc = " use aws_smithy_http::byte_stream::{ByteStream, AggregatedBytes};"]
#[doc = " use aws_smithy_http::body::SdkBody;"]
#[doc = " use bytes::Buf;"]
#[doc = " async fn example() {"]
#[doc = " let stream = ByteStream::new(SdkBody::from(\"hello! This is some data\"));"]
#[doc = " // Load data from the stream into memory:"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, are you sure this is required here? There are other places where we use pin_project_lite and doc comments:

https://github.com/awslabs/smithy-rs/blob/a0539e20b069a7de021c84521d8f3c7ba098ad6d/rust-runtime/aws-smithy-async/src/future/fn_stream.rs#L16

@Velfi Velfi marked this pull request as draft June 27, 2022 17:33
@Velfi
Copy link
Contributor Author

Velfi commented Jun 27, 2022

I converted this into a draft and will split it into several smaller PRs.

@Velfi Velfi mentioned this pull request Jul 20, 2022
2 tasks
@Velfi
Copy link
Contributor Author

Velfi commented Jul 21, 2022

Closing this in favor of #1561

@Velfi Velfi closed this Jul 21, 2022
@Velfi Velfi deleted the feature/flexible-checksums branch July 21, 2022 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants