-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-43535: [C++] Support the AWS S3 SSE-C encryption #43601
Conversation
|
7e1f200
to
1134b07
Compare
@pitrou @mapleFU @felipecrv can you help to review the PR, Thanks |
I took a quick look and here are some assorted thoughts:
|
Also, for the other readers wondering, I've found a useful (though possibly outdated?) comparison of S3 encryption options here: |
@pitrou Thanks for your review, for #1, i would try to add ut for the change, for #2, yes, actually ONLY the Key is needed, that's why i only expose one SetSSECKey function, and three Get function, and private the new added datamember. |
@pitrou I've submitted another commit to only expose the ssec-key in S3Options, and then calculate the MD5 every time on the fly, can you help to review whether there are other comments? I would try to add the ut if there is no interface change required, Thanks |
bb2e024
to
539fa7c
Compare
@pitrou Can you help to review the PR again?Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a careful around this weekend
/// \param expect_input_key_size, default 32 | ||
/// \return true if the decode and calculate MD5 success, otherwise return false | ||
ARROW_EXPORT | ||
bool CalculateSSECKeyMD5(const std::string& base64_encoded_key, std::string& md5_result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind change to Result<std::string>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I would try to refine the return value for the function.
// Decode the Base64-encoded key to get the raw binary key | ||
Aws::Utils::ByteBuffer rawKey = | ||
Aws::Utils::HashingUtils::Base64Decode(base64_encoded_key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do this handles error if base64_encoded_key
is not valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://github.com/aws/aws-sdk-cpp/blob/ac2da09e6930e3988d1289717e2df5d4b7408f17/src/aws-cpp-sdk-core/source/utils/base64/Base64.cpp#L91-L121, seems this aws util API didn't validate the input properly,but directly calculate the output size and allocate the buffer, so i add the check to ensure every character is the valid character here.
Meanwhile, I also add the related Unit test for the CalculateSSECKeyMD5 to test the different input value.
|
||
// the key needs to be // 256 bits(32 bytes)according to | ||
// https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerSideEncryptionCustomerKeys.html#specifying-s3-c-encryption | ||
if (rawKey.GetLength() != expect_input_key_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this not equal to 256?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the expect_input_key_size is exposed via S3Options, so it's possible for this value to be set as any length.
de05328
to
8649388
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you fix style by pre-commit run --show-diff-on-failure --color=always --all
?
ASSERT_FALSE(CalculateSSECKeyMD5("").ok()); // invalid base64 | ||
ASSERT_FALSE(CalculateSSECKeyMD5("%^H").ok()); // invalid base64 | ||
ASSERT_FALSE(CalculateSSECKeyMD5("INVALID").ok()); // invalid base64 | ||
ASSERT_FALSE(CalculateSSECKeyMD5("MTIzNDU2Nzg5").ok()); // invalid, the input key size not match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use ASSERT_RAISES_WITH_MESSAGES()
instead?
@@ -443,6 +443,7 @@ class TestS3FS : public S3TestMixin { | |||
// Most tests will create buckets | |||
options_.allow_bucket_creation = true; | |||
options_.allow_bucket_deletion = true; | |||
options_.sse_customer_key = "1WH9aTJ0+Tn0NLbTMHZn9aCW3Li3ViAdBsoIldPCREw="; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to cover the workflow with sse_customer_key nonempty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we always set sse_custom_key
, we can't test no sse_custom_key
case.
Can we add a test that uses sse_custom_key
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, and ideally the SSE-C encryption test would write a file with the encryption key, and check that trying to read it with another key (or with no key at all) fails.
I'll merge this in the next week if nobody objects it. |
…face to set the ca for windows
…add the SSEC related function after 1.9.201
… in linux as the minio with mingw could NOT work well
…the host could success during the TLS handshake
…r http mionio server, rollback to use the random host ip
c18343f
to
452185c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge if CI is green, thanks for the updates @ripplehang !
@github-actions crossbow submit -g cpp |
This comment was marked as outdated.
This comment was marked as outdated.
@github-actions crossbow submit -g cpp |
Revision: ce67a27 Submitted crossbow builds: ursacomputing/crossbow @ actions-7d5b001eaa |
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 00e7c65. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
server-side encryption with customer-provided keys is an important security feature for aws s3, it's useful when user want to manager the encryption key themselves, say, they don't want the data to be exposed to the aws system admin, and ensure the object is safe even the ACCESS_KEY and SECRET_KEY is somehow leaked.
Some comparison of S3 encryption options :
https://www.linkedin.com/pulse/delusion-s3-encryption-benefits-ravi-ivaturi/
What changes are included in this PR?
Add the sse_customer_key member for S3Options to support server-side encryption with customer-provided keys (SSE-C keys).
Add the tls_ca_file_path, tls_ca_dir_path and tls_verify_certificates members for S3Options.
Refine the unit test to start the minio server with self-signed certificate on linux platform, so the unit test could cover the https case on linux, and http case on non-linux platform.
Are these changes tested?
Yes
Are there any user-facing changes?
Only additional members to S3Options.