File metadata is lost during multipart S3 copy #367

Closed
gribbet opened this issue Feb 27, 2015 · 5 comments
Labels
service-api This issue is due to a problem in a service API, not the SDK implementation.

Comments

@gribbet

gribbet commented Feb 27, 2015

Original metadata is always dropped when copying files larger than 5 GB (where a multipart copy is required). For smaller files the behavior is correct.
CopyCallable.initiateMultipartUpload always sets NewObjectMetadata on the CopyObjectRequest, so the original metadata is discarded.
In my case, a Content-Disposition header does not get copied.

@manikandanrs
Contributor

@gribbet

For all objects under the threshold, we use a single call (the PUT Object Copy API), which by default copies the metadata from source to destination, ignoring a few specific headers.

However, for multipart copies the metadata needs to be set in the InitiateMultipartUpload request, and the SDK cannot determine which metadata should be copied from the source. Some encryption-related headers cannot be copied and need to be explicitly specified by the user in the request.

Is it feasible for you to explicitly set the metadata in the request? If not, can you describe your use case?
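The size-dependent behavior discussed above can be modeled in a few lines. This is a simplified sketch, not the SDK's actual CopyCallable code: the class and method names are hypothetical, the 5 GB threshold is taken from the original report, and S3's service-side filtering of special headers is ignored. It shows why, when the caller supplies no new metadata, a small object keeps its Content-Disposition while a large one ends up with none.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the two copy paths; not the SDK's CopyCallable. */
final class CopyPathModel {
    // Threshold taken from the report: larger objects use a multipart copy.
    static final long MULTIPART_COPY_THRESHOLD = 5L * 1024 * 1024 * 1024;

    /**
     * Returns the metadata the destination object ends up with.
     *
     * @param objectSize     size of the source object in bytes
     * @param sourceMetadata metadata stored on the source object
     * @param newMetadata    metadata supplied by the caller, or null if none
     */
    static Map<String, String> destinationMetadata(long objectSize,
                                                   Map<String, String> sourceMetadata,
                                                   Map<String, String> newMetadata) {
        if (newMetadata != null) {
            // Explicit metadata is used on both paths.
            return new HashMap<>(newMetadata);
        }
        if (objectSize <= MULTIPART_COPY_THRESHOLD) {
            // Single PUT Object Copy: S3 itself copies the source metadata
            // (its filtering of special headers is ignored in this model).
            return new HashMap<>(sourceMetadata);
        }
        // Multipart copy: metadata comes only from the initiate request,
        // which here carries an empty set, so nothing is preserved.
        return new HashMap<>();
    }

    public static void main(String[] args) {
        Map<String, String> source =
                Map.of("Content-Disposition", "attachment; filename=\"big.bin\"");
        long sixGb = 6L * 1024 * 1024 * 1024;
        System.out.println(destinationMetadata(1024L, source, null)); // keeps Content-Disposition
        System.out.println(destinationMetadata(sixGb, source, null)); // prints {}
    }
}
```

Passing the source metadata explicitly as `newMetadata` makes both paths agree, which is the workaround discussed in this thread.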

@gribbet
Author

gribbet commented Feb 27, 2015

The API should probably not handle metadata differently depending on file size. It is confusing behavior. This issue was quite difficult to track down.
Note that AWS SDKs in other languages (e.g. Python) don't have this issue.
Perhaps the newObjectMetadata on line 255 should be set to the existing metadata rather than creating a new one? I don't understand why "the SDK cannot determine what metadata needs to be copied from the source". How about all of it except the encryption headers?

Yes, a reasonable workaround that we have already implemented is to query and explicitly set the existing metadata for large files.
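A sketch of that workaround (copying the source metadata explicitly), assuming the source headers are available as a plain map. In the Java SDK v1 they would come from the ObjectMetadata returned by `AmazonS3.getObjectMetadata(bucket, key)`, and the filtered result would be applied with `CopyObjectRequest.setNewObjectMetadata(...)` before passing the request to `TransferManager.copy(...)`; the conversion between the map and ObjectMetadata is left out. The exclusion rules (server-side-encryption headers plus a few values S3 computes itself) are an illustrative assumption, not S3's authoritative list, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: which source headers to re-send on a large copy. */
final class MetadataCarryOver {
    // Illustrative only: values S3 recomputes for the new object.
    private static final Set<String> COMPUTED_BY_S3 =
            Set.of("content-length", "etag", "last-modified", "x-amz-version-id");
    // Encryption headers must be specified explicitly, not copied.
    private static final String SSE_PREFIX = "x-amz-server-side-encryption";

    static Map<String, String> metadataToCarryOver(Map<String, String> sourceHeaders) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sourceHeaders.entrySet()) {
            // HTTP header names are case-insensitive, so compare in lower case.
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (COMPUTED_BY_S3.contains(name) || name.startsWith(SSE_PREFIX)) {
                continue;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("Content-Type", "application/octet-stream");
        source.put("Content-Disposition", "attachment; filename=\"big.bin\"");
        source.put("ETag", "\"abc-123\"");
        source.put("x-amz-server-side-encryption", "aws:kms");
        System.out.println(metadataToCarryOver(source));
    }
}
```

Running `main` prints only the Content-Type and Content-Disposition entries; any encryption settings would then be added to the request explicitly.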

@kiiadi kiiadi self-assigned this Aug 9, 2016
@kiiadi
Contributor

kiiadi commented Aug 11, 2016

@gribbet apologies for the extended delay in getting back to you on this issue. Unfortunately, as @manikandanrs said, this is an issue with the S3 service rather than the Java SDK. The Python SDK actually has a similar issue (see aws/aws-cli#1145).

The handling of metadata on a single copy request is actually done by S3 itself (via the x-amz-metadata-directive header); see the S3 Copy docs for more info. S3 handles this because certain metadata is intended not to be persisted across a copy (e.g. storage class or server-side encryption). This "blacklist" of metadata is maintained by S3 and is subject to change, so it doesn't really make sense for us to do this filtering in the SDK itself.

Unfortunately, S3 does not support the x-amz-metadata-directive header on InitiateMultipartUpload or UploadPartCopy (copy part) requests. I've raised this with the service team and will update this issue when I hear back from them.
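To illustrate the directive described above, here is a hypothetical helper that builds the relevant headers for a single-request PUT Object Copy. `COPY` (also S3's default when the header is absent) lets S3 carry over the source metadata under its own rules, while `REPLACE` makes the metadata in the request authoritative. URL-encoding of the copy source, `x-amz-meta-` prefixing of user metadata, and request signing are omitted, so treat this as a sketch of the protocol idea rather than a working client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: headers for a single-request PUT Object Copy. */
final class SingleCopyHeaders {
    static Map<String, String> build(String srcBucket, String srcKey,
                                     Map<String, String> newMetadata) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Source object; URL-encoding of the key is omitted in this sketch.
        headers.put("x-amz-copy-source", "/" + srcBucket + "/" + srcKey);
        if (newMetadata == null) {
            // COPY: S3 copies the source metadata itself, applying its own
            // rules about what is not persisted across a copy.
            headers.put("x-amz-metadata-directive", "COPY");
        } else {
            // REPLACE: the destination gets only the metadata sent here.
            headers.put("x-amz-metadata-directive", "REPLACE");
            headers.putAll(newMetadata);
        }
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(build("src-bucket", "big.bin", null));
        System.out.println(build("src-bucket", "big.bin",
                Map.of("Content-Disposition", "attachment")));
    }
}
```

Because the multipart requests accept no such directive, a large copy has no `COPY` option to fall back on, which is the inconsistency raised with the service team.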

@kiiadi
Contributor

kiiadi commented Aug 31, 2016

@gribbet I contacted the S3 service team and they are aware of the inconsistency; it's possible that they'll fix it in a future version of the service. However, since there is a workaround, there are higher-priority issues to resolve. Given that this is not a Java SDK-specific problem, I'm going to close this issue. I will report back when the service team resolves this inconsistency in multipart copy.

@akotranza

Since this seems as though it will never get fixed, why not write your own S3 sync function that preserves metadata? 🙄

Here's a really ugly one in Node 8.x; hopefully it helps someone:
https://gist.github.com/akotranza/51f452f975469e1fa78c2748dd115c87


5 participants