Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement R2 multipart upload bindings #486

Merged
merged 2 commits into from
Feb 14, 2023
Merged

Implement R2 multipart upload bindings #486

merged 2 commits into from
Feb 14, 2023

Conversation

mrbbot
Copy link
Contributor

@mrbbot mrbbot commented Jan 30, 2023

This PR implements support for writing data to R2 buckets in multiple parts. See https://developers.cloudflare.com/r2/data-access/workers-api/workers-multipart-usage/. Miniflare's implementation is briefly described at the top of packages/r2/src/multipart.ts.

Unfortunately, Miniflare 2's storage abstraction is not very good at handling large data (doesn't support streaming reads/writes), or complex read/write operations (doesn't support transactions). This limits the reliability of this multipart implementation, but it should still be useful for testing. We should aim to improve this in Miniflare 3.

Whilst implementing this feature, I noticed a few quirks with R2:

  • httpMetadata and customMetadata are always {} in the R2Object returned from R2MultipartUpload#complete()
  • etags for completed multipart uploads with the same part contents are not the same. I'd expect these to be deterministic.

@changeset-bot
Copy link

changeset-bot bot commented Jan 30, 2023

⚠️ No Changeset found

Latest commit: 4976419

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@Frederik-Baetens
Copy link
Contributor

httpMetadata and customMetadata are always {} in the R2Object returned from R2MultipartUpload#complete()

Ah I know why that is 😅 That's not intentional and I'll get on fixing that

etags for completed multipart uploads with the same part contents are not the same. I'd expect these to be deterministic.

We're aware of this, and we're considering changing how the etag for multipart uploads is calculated to be compatible with how S3 does it, which is deterministic.

Comment on lines +116 to +120
// TODO: R2's multipart ETags don't seem to be deterministic, should ours be?
// https://stackoverflow.com/a/19896823
const hash = crypto.createHash("md5");
for (const md5Hex of md5Hexes) hash.update(Buffer.from(md5Hex, "hex"));
return `${hash.digest("hex")}-${md5Hexes.length}`;
Copy link
Contributor

@Frederik-Baetens Frederik-Baetens Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that this is how S3 calculates their etags, we might move to this way of calculating them ourselves in this manner too in the future.

Copy link
Contributor

@Frederik-Baetens Frederik-Baetens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impressive work :) I mostly took a look at the r2/src changes I hope that covers most of the user facing behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants