Set length of avro and rc input file after memory input file is created #23667

Merged
1 commit merged on Oct 25, 2024


@mwong77 (Contributor) commented Oct 3, 2024

Description

In AWS Athena, we currently support encrypting S3 objects using client-side encryption with KMS (CSE-KMS). An encrypted object is larger than its unencrypted content because encryption adds padding. Reading the object with input.readAllBytes() returns the decrypted data without that padding, so the resulting byte array can be shorter than the original file length. When we create a new in-memory input file from those bytes, we therefore have to update the length variable as well.
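To make the size mismatch concrete, here is a small standalone illustration. It uses the JDK's AES/CBC/PKCS5Padding cipher purely as an example of a padding scheme; that choice is an assumption for illustration and is not necessarily what Athena's CSE-KMS uses. PKCS#5 padding always rounds the payload up to the next full 16-byte block, so a 100-byte payload is stored as 112 bytes and decrypts back to 100 bytes.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class PaddingSizeDemo {
    // Illustration only: not necessarily the cipher used by Athena's CSE-KMS.
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    // Returns {storedLength, decryptedLength} for a plaintext of the given size.
    static int[] storedAndDecryptedLengths(int plaintextLength) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey key = generator.generateKey();
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            IvParameterSpec ivSpec = new IvParameterSpec(iv);

            // Encrypt: PKCS#5 padding adds 1 to 16 bytes to fill the last block.
            Cipher encryptor = Cipher.getInstance(TRANSFORMATION);
            encryptor.init(Cipher.ENCRYPT_MODE, key, ivSpec);
            byte[] stored = encryptor.doFinal(new byte[plaintextLength]);

            // Decrypt: the padding is removed, so fewer bytes come back.
            Cipher decryptor = Cipher.getInstance(TRANSFORMATION);
            decryptor.init(Cipher.DECRYPT_MODE, key, ivSpec);
            byte[] decrypted = decryptor.doFinal(stored);

            return new int[] {stored.length, decrypted.length};
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] lengths = storedAndDecryptedLengths(100);
        System.out.println("stored length    = " + lengths[0]); // prints 112
        System.out.println("decrypted length = " + lengths[1]); // prints 100
    }
}
```

The stored length is what a file's length() would report, while the decrypted length is what readAllBytes() hands back, which is why the two can disagree.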

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@nineinchnick @anusudarsan


cla-bot bot commented Oct 3, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

github-actions bot added the hive (Hive connector) label Oct 3, 2024
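
    if (estimatedFileSize < BUFFER_SIZE.toBytes()) {
        try (TrinoInputStream input = inputFile.newStream()) {
            byte[] data = input.readAllBytes();
            inputFile = new MemoryInputFile(path, Slices.wrappedBuffer(data));
        }
    }
    // The buffered bytes can be fewer than the original file length
    // (e.g. after client-side decryption), so clamp the split length.
    length = min(inputFile.length() - start, length);
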
A member commented:

Can we check whether a similar change is needed in LinePageSourceFactory and RcFilePageSourceFactory as well?

A member commented:

Is it possible to add a test for this?
For example, we test the file system with AWS encryption in io.trino.filesystem.s3.TestS3FileSystemAwsS3WithEncryption.

@mwong77 (Contributor, PR author) commented Oct 16, 2024

After talking with @pettyjamesm, I will need to make a similar change for RcFilePageSourceFactory and LinePageSourceFactory as well.

@mwong77 (Contributor, PR author) commented:

@raunaqmorarka After talking with @pettyjamesm, I don't think we can add tests for this right now, since there is no test class that performs client-side encryption and decryption of input files. I think TestHiveFileFormat would be the most appropriate place for these tests as well. We can probably add a test once client-side encryption is supported in the native S3 file system. Separately, I am not sure whether other file systems support client-side encryption.


@ebyhr (Member) commented Oct 23, 2024

@cla-bot check

cla-bot bot added the cla-signed label Oct 23, 2024

cla-bot bot commented Oct 23, 2024

The cla-bot has been summoned, and re-checked this pull request!

@mwong77 changed the title from "Set length of avro input file after memory input file is created" to "Set length of avro/rc input file after memory input file is created" Oct 25, 2024
@mwong77 changed the title from "Set length of avro/rc input file after memory input file is created" to "Set length of avro and rc input file after memory input file is created" Oct 25, 2024
@mwong77 force-pushed the fix-avro-reader-file-length branch 2 times, most recently from 0efa935 to 19995ec (October 25, 2024 17:59)
When a memory input file is created for the avro, rc, and line readers, we
need to update the length passed to the reader, since the length of the
memory input file can be less than the original input file length.
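The buffering-and-clamping pattern from the review snippet can be sketched without Trino dependencies. InputFile, MemoryFile, and PaddedFile below are hypothetical stand-ins for TrinoInputFile, MemoryInputFile, and a client-side-encrypted S3 object, and BUFFER_SIZE becomes a plain bufferSize parameter. PaddedFile reports the stored (padded) size from length() but yields only the decrypted bytes from its stream, which is the situation the fix guards against.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClampSplitLengthSketch {
    // Hypothetical stand-in for TrinoInputFile (only the methods the fix uses).
    interface InputFile {
        long length() throws IOException;

        InputStream newStream() throws IOException;
    }

    // Hypothetical stand-in for MemoryInputFile: its length is the buffered byte count.
    record MemoryFile(byte[] data) implements InputFile {
        @Override
        public long length() {
            return data.length;
        }

        @Override
        public InputStream newStream() {
            return new ByteArrayInputStream(data);
        }
    }

    // Hypothetical encrypted object: length() reports the stored (padded) size,
    // while the stream yields only the decrypted bytes.
    record PaddedFile(byte[] decrypted, long storedLength) implements InputFile {
        @Override
        public long length() {
            return storedLength;
        }

        @Override
        public InputStream newStream() {
            return new ByteArrayInputStream(decrypted);
        }
    }

    // Buffers small files in memory, then clamps the split length to the
    // number of bytes the (possibly replaced) file actually holds.
    static long effectiveLength(InputFile file, long estimatedFileSize, long start, long length, long bufferSize) {
        try {
            if (estimatedFileSize < bufferSize) {
                try (InputStream input = file.newStream()) {
                    file = new MemoryFile(input.readAllBytes());
                }
            }
            return Math.min(file.length() - start, length);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputFile encrypted = new PaddedFile(new byte[100], 112);
        // Whole-file split planned from the stored size: start 0, length 112.
        System.out.println(effectiveLength(encrypted, 112, 0, 112, 1024)); // prints 100
        // Split starting at offset 40 covering the rest of the stored object.
        System.out.println(effectiveLength(encrypted, 112, 40, 72, 1024)); // prints 60
    }
}
```

Without the final Math.min, the first call would hand the reader a length of 112 even though only 100 bytes are buffered in memory.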
@pettyjamesm (Member) left a comment

Approved, thanks @mwong77.

Note: this change is not user-visible at the moment (and therefore should require no release notes) because it depends on client-side encryption support, which Trino does not (yet) have.

@pettyjamesm merged commit 93fb674 into trinodb:master Oct 25, 2024
56 checks passed
@github-actions github-actions bot added this to the 464 milestone Oct 25, 2024
Labels: cla-signed, hive (Hive connector)

5 participants