Set length of avro and rc input file after memory input file is created #23667
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
if (estimatedFileSize < BUFFER_SIZE.toBytes()) {
    try (TrinoInputStream input = inputFile.newStream()) {
        byte[] data = input.readAllBytes();
        inputFile = new MemoryInputFile(path, Slices.wrappedBuffer(data));
    }
}
// The in-memory copy can be shorter than the original file, so clamp
// the length passed to the reader to the bytes actually available.
length = min(inputFile.length() - start, length);
Can we check whether a similar change is needed in LinePageSourceFactory and RcFilePageSourceFactory as well?
Is it possible to add a test for this?
For example, we test the file system with AWS encryption in io.trino.filesystem.s3.TestS3FileSystemAwsS3WithEncryption.
After talking with @pettyjamesm, I will need to make a similar change for RcFilePageSourceFactory and LinePageSourceFactory as well.
@raunaqmorarka After talking with @pettyjamesm, I don't think we can add tests for this right now, since there is no test class that performs client-side encryption/decryption of input files. I think the most appropriate place for these tests would be the TestHiveFileFormat test class. We can probably add a test once client-side encryption is supported in the native S3 file system. On another note, I am not sure whether other file systems support client-side encryption.
When a memory input file is created for the Avro, RC, and line readers, we need to update the length passed to the reader, since the memory input file can be shorter than the original input file.
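As a minimal sketch of the clamping rule, assuming a split is described by a start offset and a length (the class, record, and method names below are hypothetical, not Trino APIs), the reader length becomes min(bufferedLength - start, length). The file sizes are made-up numbers for illustration only.

```java
public class SplitLengthClamp {
    // Hypothetical stand-in for a split: the byte range [start, start + length).
    record Split(long start, long length) {}

    // Mirrors length = min(inputFile.length() - start, length): never let the
    // reader ask for more bytes than the in-memory copy actually holds.
    static long clampLength(long inMemoryFileLength, Split split) {
        return Math.min(inMemoryFileLength - split.start(), split.length());
    }

    public static void main(String[] args) {
        long storedObjectSize = 1040;      // size reported by the object store (hypothetical)
        long bufferedLength = 1024;        // bytes returned by readAllBytes() (hypothetical)

        // A split covering the whole stored object is cut back to the buffered size.
        System.out.println(clampLength(bufferedLength, new Split(0, storedObjectSize)));   // 1024

        // A split starting mid-file only gets the bytes remaining after its start.
        System.out.println(clampLength(bufferedLength, new Split(512, 528)));              // 512
    }
}
```

When the buffered copy is at least as long as the requested range, min() leaves the original length unchanged, so the clamp is harmless for unencrypted files.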
Approved, thanks @mwong77.
Note: this change is not user-visible at the moment (and therefore requires no release notes) because it depends on client-side encryption support, which Trino does not (yet) have.
Description
In AWS Athena, we currently support encrypting S3 objects using CSE-KMS. These encrypted objects are larger than their unencrypted contents because padding is added. The input.readAllBytes() call strips the padding from an encrypted object, so if we create a new in-memory input file we also have to update the length variable.
Additional context and related issues
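To illustrate why the stored size and the decrypted size differ, the sketch below uses the JDK's AES/CBC/PKCS5Padding cipher purely as an example of a padding cipher. This is an assumption for illustration: the exact cipher and overhead used by CSE-KMS depend on the encryption client and are not modeled here. The class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class PaddingLengthDemo {
    static final int BLOCK_SIZE = 16; // AES block size in bytes

    static byte[] encrypt(byte[] plain, SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return cipher.doFinal(plain); // pads up to the next multiple of 16 bytes
    }

    static byte[] decrypt(byte[] cipherText, SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return cipher.doFinal(cipherText); // removes the padding again
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
        keyGenerator.init(128);
        SecretKey key = keyGenerator.generateKey();
        byte[] iv = new byte[BLOCK_SIZE];
        new SecureRandom().nextBytes(iv);

        byte[] stored = encrypt(new byte[1000], key, iv); // stands in for the stored object size
        byte[] read = decrypt(stored, key, iv);           // stands in for readAllBytes()

        long start = 0;
        long length = stored.length;                      // split length derived from the stored size
        long clamped = Math.min(read.length - start, length);
        System.out.printf("stored=%d decrypted=%d clamped=%d%n", stored.length, read.length, clamped);
        // prints "stored=1008 decrypted=1000 clamped=1000"
    }
}
```

Without the clamp, a reader given length=1008 would expect 8 more bytes than the in-memory file holds.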
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
@nineinchnick @anusudarsan