
api: Support CopyObject for all sizes #617

Closed · 3 tasks done

donatello opened this issue Feb 23, 2017 · 9 comments

Comments

@donatello
Member

donatello commented Feb 23, 2017

High-level CopyObject requirements:

  • Support copying objects of all sizes.
  • Support source objects with an arbitrary range header (i.e. any valid start and end offset of a source object) via multipart copy object.
  • Support all copy conditions (this is already supported).

Currently, the library only supports copying objects up to 5 GiB in size. Larger objects can be copied via a multipart copy-object strategy.

The multipart copy object operation consists of starting a new multipart upload, followed by 1 or more copy-object-part requests, and finally a complete-multipart request.

Note that copy object via a single PUT request does not support range headers, but copy-object-part does support this.
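The part-splitting step of the sequence above can be sketched as pure planning logic. This is a minimal illustration under the usual S3 limits (5 GiB maximum per copy-object-part), not the library's actual implementation; the name splitCopyRanges is made up here:

```go
package main

import "fmt"

const (
	minPartSize = 5 * 1024 * 1024        // 5 MiB: S3 minimum part size (all parts except the last)
	maxPartSize = 5 * 1024 * 1024 * 1024 // 5 GiB: S3 maximum size for one copy-object-part
)

// partRange is an inclusive byte range, as used in the
// x-amz-copy-source-range header: "bytes=Start-End".
type partRange struct {
	Start, End int64
}

// splitCopyRanges splits a source object of totalSize bytes into
// consecutive ranges of at most maxPartSize each, one per
// copy-object-part request.
func splitCopyRanges(totalSize int64) []partRange {
	var ranges []partRange
	for start := int64(0); start < totalSize; start += maxPartSize {
		end := start + maxPartSize - 1
		if end > totalSize-1 {
			end = totalSize - 1
		}
		ranges = append(ranges, partRange{start, end})
	}
	return ranges
}

func main() {
	// A 12 GiB source object needs three parts: 5 GiB + 5 GiB + 2 GiB.
	const gib = int64(1024 * 1024 * 1024)
	for i, r := range splitCopyRanges(12 * gib) {
		fmt.Printf("part %d: bytes=%d-%d\n", i+1, r.Start, r.End)
	}
}
```

The driver would then call initiate-multipart-upload, issue one copy-object-part per range, and finish with complete-multipart-upload.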

This feature was recently implemented in the Haskell SDK.

@harshavardhana
Member

harshavardhana commented Feb 23, 2017

Support source objects with an arbitrary range header (i.e. any valid start and end offset of a source object) via multipart copy object.

Why should this be supported? What benefit does it provide to a user? And at what point does a user actually know that they need to copy only a certain range of an object?

Since we cannot append to the destination, I don't see how this API behavior benefits anyone.

@hashbackup

hashbackup commented Feb 23, 2017

HashBackup could put a multipart ranged copy to good use. HB packs files into arc files during the backup. These default to 100MB but can be larger, like 4GB. Over time, arc files get "holes" poked in them as files are deleted due to retention policies.

For example, you backup a 75MB file and a 25MB file into 1 arc file and store it on S3. The first file is marked deleted. To actually recover space, the 100MB arc file has to be downloaded, packed, and uploaded. The download is where high costs are incurred.

By using a series of multipart copy requests, this packing operation could be done remotely, without requiring a download. I think the only cost would be the request cost: I couldn't see where Amazon charges fees for copy based on the size of the data.

(Just realized this is for the Go binding, and I'm using the Python binding)

@harshavardhana
Member

harshavardhana commented Feb 23, 2017

Yes, but Minio libraries are not meant to expose lower-level multipart operations. For that you should use the AWS SDKs, or copy the Minio library source into your repo.

I don't see why we should explore range APIs while not exposing the underlying multipart APIs.

@hashbackup

A range list of start-end offsets could be added to copy_object without exposing multipart.

@donatello
Member Author

Why should this be supported? What benefit does it provide to a user? And at what point does a user actually know that they need to copy only a certain range of an object?

Since we cannot append to the destination, I don't see how this API behavior benefits anyone.

@harshavardhana Here is my reasoning about this:

  • Adding a range header is a simple extension to copy-object, in the same spirit as put-object, which handles all sizes transparently.
  • API-wise, adding a single range is similar to get-object-partial, which lets a user download part of an object. For copy-object, it just lets a user copy a part (i.e. a single contiguous segment) of an object into a new object.
  • The range header just adds another possible source for copy-object, and it is a simple extension - the logic to create a large object (>5GiB) already needs to use range headers in the lower-level copy-object-part API. It takes very little code to allow the caller of the (high-level) copy-object to specify start-end offsets.
  • The input to this high-level API is the same as for the (low-level) copy-object-part API, i.e. source object, optional range offsets (only start and end), and optional copy-conditions.

When discussing this with @balamurugana, he suggested an even more general API that accepts multiple source objects, with one or more start-end offset pairs for each, which can be used to create a single object on the server side using only copy operations. He believed this is a useful operation for working with related objects that are created separately and finally need to be stitched together (e.g. large video production/rendering applications, and @hashbackup's application above). This was going to be my next proposal.
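The stitching idea can be made concrete with a small planning sketch: given several sources, each with a byte range, emit the ordered copy-object-part requests that would compose them, splitting any segment larger than S3's 5 GiB per-part copy limit. All names here (srcSpec, planCompose, the bucket/object names) are illustrative, not a proposed API:

```go
package main

import "fmt"

const maxPartSize = 5 * 1024 * 1024 * 1024 // 5 GiB per copy-object-part

// srcSpec names a source object and an inclusive byte range within it.
type srcSpec struct {
	Bucket, Object string
	Start, End     int64
}

// copyPart is one planned copy-object-part request.
type copyPart struct {
	Bucket, Object string
	Start, End     int64
	PartNumber     int
}

// planCompose flattens the source specs into an ordered list of
// copy-object-part requests, splitting any segment larger than
// maxPartSize across several parts.
func planCompose(srcs []srcSpec) []copyPart {
	var parts []copyPart
	n := 1
	for _, s := range srcs {
		for start := s.Start; start <= s.End; start += maxPartSize {
			end := start + maxPartSize - 1
			if end > s.End {
				end = s.End
			}
			parts = append(parts, copyPart{s.Bucket, s.Object, start, end, n})
			n++
		}
	}
	return parts
}

func main() {
	// Stitch the first 75 MiB of arc1 and all 25 MiB of arc2 into one object.
	const mib = int64(1024 * 1024)
	plan := planCompose([]srcSpec{
		{"backups", "arc1", 0, 75*mib - 1},
		{"backups", "arc2", 0, 25*mib - 1},
	})
	for _, p := range plan {
		fmt.Printf("part %d: %s/%s bytes=%d-%d\n", p.PartNumber, p.Bucket, p.Object, p.Start, p.End)
	}
}
```

Note that this sketch ignores the other S3 multipart constraint discussed below: every part except the last must be at least 5 MiB, so a real implementation would have to reject or merge too-small segments.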

@hashbackup

A negative aspect of exposing ranges is that it might not actually work as expected. After reading about copy object with ranges on S3, it seems that each range must be at least 5 MiB, because it uses the multipart API. So if a user says to copy bytes 0-5 and bytes 20-30, what should happen? You could get very general and do a download, create a temp file with only the bytes needed, then upload it as a new file, but that seems way out of scope for Minio, and whether/how to do it would depend heavily on the storage service's capabilities.

@harshavardhana harshavardhana changed the title Support CopyObject for all sizes api: Support CopyObject for all sizes Mar 10, 2017
@deekoder deekoder added this to the Future milestone Apr 6, 2017
@harshavardhana
Member

Marking this as blocked, to discuss with @abperiasamy

@deekoder deekoder assigned donatello and unassigned donatello Jun 1, 2017
@harshavardhana
Member

BTW this is not blocked anymore @deekoder

@deekoder deekoder removed the blocked label Jun 27, 2017
@donatello
Member Author

CopyObject now supports objects of all sizes, copy-conditions, source object ranges, server-side-encryption with decryption of source and encryption of destination, and copying/setting user-metadata on the destination.

In addition, the ComposeObject function is added, which enables creating objects from multiple source objects by providing a concatenation specification.

These changes are available in version 3.0.0 onwards.
