api: Support CopyObject for all sizes #617
Comments
Why should this be supported? What benefit does this provide a user? Also, at what juncture does a user really know that they need to copy only a certain range of the object? Since we cannot append on the destination, I don't see how this API behavior benefits anyone.
HashBackup could put a multipart ranged copy to good use. HB packs files into arc files during the backup. These default to 100MB but can be larger, like 4GB. Over time, arc files get "holes" poked in them as files are deleted due to retention policies. For example, you back up a 75MB file and a 25MB file into 1 arc file and store it on S3. The first file is marked deleted. To actually recover the space, the 100MB arc file has to be downloaded, packed, and uploaded. The download is where the high costs are incurred. By using a series of multipart copy requests, this packing operation could be done remotely without requiring a download. I think the only cost would be the request cost: I couldn't see where Amazon charges fees for copy based on the size of the data. (Just realized this is for the Go binding, and I'm using the Python binding.)
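For illustration, here is a minimal sketch of that remote repack, written against the range-copy API that eventually landed in minio-go (shown with the v7 signatures; the endpoint, credentials, bucket, and object names are made up):

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("s3.amazonaws.com", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS-KEY", "SECRET-KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatalln(err)
	}

	// The first 75MB of arc.0001 was deleted; keep only the surviving
	// 25MB tail (bytes 78643200..104857599) by copying that range
	// server-side into a new, packed arc file. No download required.
	src := minio.CopySrcOptions{
		Bucket:     "hb-backup",
		Object:     "arc.0001",
		MatchRange: true,
		Start:      78643200,
		End:        104857599,
	}
	dst := minio.CopyDestOptions{
		Bucket: "hb-backup",
		Object: "arc.0001.packed",
	}
	if _, err := client.ComposeObject(context.Background(), dst, src); err != nil {
		log.Fatalln(err)
	}
}
```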
Yes, but minio libraries are not meant to expose lower-level multipart operations. For that you should use the AWS SDKs or copy the minio library source into your repo. I don't see why we should explore range APIs while not exposing the multipart APIs underneath.
A range list of start-end offsets could be added to copy_object without exposing multipart. |
@harshavardhana Here is my reasoning about this:
When discussing this with @balamurugana, he suggested an even more general API that accepts multiple source objects, with one or more start-end offset pairs per source object, and creates a single object on the server side using only copy-object. He believed this is a useful operation for working with related objects that are created separately and finally need to be stitched together (e.g. large video production/rendering applications, and @hashbackup's application above). This was going to be my next proposal.
A negative aspect of exposing ranges is that it might not actually work as expected. After reading about copy object with ranges on S3, it seems that each range must be at least 5MiB, because it uses the multipart API. So if a user says to copy bytes 0-5 and bytes 20-30, what should happen? You could get very general and do a download, create a temp file with only the bytes needed, then upload it as a new file, but that seems way out of scope for minio, and whether/how to do it would be very dependent on the storage service's capabilities.
Marking this as blocked to discuss with @abperiasamy
BTW this is not blocked anymore @deekoder |
CopyObject now supports objects of all sizes, copy-conditions, source object ranges, server-side encryption with decryption of the source and encryption of the destination, and copying/setting user metadata on the destination. In addition, the ComposeObject function is added, which enables creating objects from multiple source objects by providing a concatenation specification. These changes are available from version 3.0.0 onwards.
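As a rough usage sketch of the ComposeObject call (shown here with the current v7 signatures; bucket and object names are hypothetical), stitching three separately uploaded segments into one object entirely server-side:

```go
package main

import (
	"context"

	"github.com/minio/minio-go/v7"
)

// stitchSegments concatenates three separately uploaded segments into
// a single destination object using only server-side copies. Each
// source could also set MatchRange/Start/End to copy a sub-range,
// as in the repack sketch earlier in this thread.
func stitchSegments(ctx context.Context, client *minio.Client) error {
	srcs := []minio.CopySrcOptions{
		{Bucket: "video", Object: "render-part-1"},
		{Bucket: "video", Object: "render-part-2"},
		{Bucket: "video", Object: "render-part-3"},
	}
	dst := minio.CopyDestOptions{Bucket: "video", Object: "final-render"}
	_, err := client.ComposeObject(ctx, dst, srcs...)
	return err
}
```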
High-level CopyObject requirements:
Currently, the library supports copying only objects <= 5GiB in size. Larger objects can be copied via a multipart copy-object strategy.
The multipart copy-object operation consists of starting a new multipart upload, followed by one or more copy-object-part requests, and finally a complete-multipart request (sketched below).
Note that copying an object via a single PUT request does not support range headers, but copy-object-part does.
This feature was recently implemented in the Haskell SDK.
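For reference, a minimal sketch of that three-step flow, written here with the AWS SDK for Go (aws-sdk-go v1) rather than minio-go, copying a hypothetical 10GiB source as two 5GiB parts; all bucket and object names are made up:

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	svc := s3.New(session.Must(session.NewSession(
		aws.NewConfig().WithRegion("us-east-1"))))

	// Step 1: start a new multipart upload on the destination.
	mpu, err := svc.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
		Bucket: aws.String("dst-bucket"),
		Key:    aws.String("big-copy"),
	})
	if err != nil {
		log.Fatalln(err)
	}

	// Step 2: one copy-object-part (UploadPartCopy) per part. Unlike a
	// single-PUT copy, each part accepts a byte range of the source.
	const partSize = int64(5 * 1024 * 1024 * 1024) // 5GiB per part
	var parts []*s3.CompletedPart
	for i := int64(0); i < 2; i++ {
		out, err := svc.UploadPartCopy(&s3.UploadPartCopyInput{
			Bucket:          mpu.Bucket,
			Key:             mpu.Key,
			UploadId:        mpu.UploadId,
			PartNumber:      aws.Int64(i + 1),
			CopySource:      aws.String("src-bucket/big-object"),
			CopySourceRange: aws.String(fmt.Sprintf("bytes=%d-%d", i*partSize, (i+1)*partSize-1)),
		})
		if err != nil {
			log.Fatalln(err)
		}
		parts = append(parts, &s3.CompletedPart{
			ETag:       out.CopyPartResult.ETag,
			PartNumber: aws.Int64(i + 1),
		})
	}

	// Step 3: complete the multipart upload to materialize the copy.
	_, err = svc.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
		Bucket:          mpu.Bucket,
		Key:             mpu.Key,
		UploadId:        mpu.UploadId,
		MultipartUpload: &s3.CompletedMultipartUpload{Parts: parts},
	})
	if err != nil {
		log.Fatalln(err)
	}
}
```

This is the same flow the higher-level CopyObject/ComposeObject calls would drive internally; the SDK names here are just one concrete way to exercise it.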