s3: fix for corrupted cache on multipart uploads #1867
I think a test would be appropriate for this one, but I guess we would need a test S3 instance. Am I right about this, @efiop?
@pared Indeed, we've agreed that I'll add a test for it, since our s3 testing is not transparent and requires access to our s3 account. I'll send one ASAP.
The ETag of a multipart object depends on the number of parts, so when copying to the cache we should use the same number of parts that the original object was moved/uploaded in. Fixes part of #1410
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
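To illustrate why the part layout matters, here is a minimal sketch of the commonly observed S3 ETag scheme for objects without KMS encryption: a single-part upload gets the hex MD5 of the body, while a multipart upload gets the MD5 of the concatenated per-part MD5 digests followed by `-<number of parts>`. The function name `s3_style_etag` is hypothetical and is not part of DVC or boto3; it is only meant to show the effect, not to reproduce S3 exactly.

```python
import hashlib


def s3_style_etag(data, chunk_size=None):
    """Compute an ETag the way S3 is commonly observed to (assumption).

    chunk_size=None models a single-part upload; otherwise the non-empty
    body is split into parts of chunk_size bytes (the last may be shorter).
    """
    if chunk_size is None:
        # Single-part upload: plain hex MD5 of the whole body.
        return hashlib.md5(data).hexdigest()

    # Multipart upload: MD5 each part, then MD5 the concatenated digests.
    digests = [
        hashlib.md5(data[start:start + chunk_size]).digest()
        for start in range(0, len(data), chunk_size)
    ]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return "{}-{}".format(combined, len(digests))


body = b"a" * 10
same_count_a = s3_style_etag(body, chunk_size=5)  # parts of 5 and 5 bytes
same_count_b = s3_style_etag(body, chunk_size=6)  # parts of 6 and 4 bytes
```

Both `same_count_a` and `same_count_b` end in `-2`, yet they differ, so under this scheme matching the part count alone is not enough: the part boundaries have to match as well.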
Specifically because of this getmoto/moto#1941 Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
Turned out to be quite buggy. Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
@olveirap Adjusted your patch to support multipart uploads with variable chunk sizes and added a test for that to our test suite. We will release this later this week. Thank you so much! 🙂
* master: (45 commits)
  s3: fix for corrupted cache on multipart uploads (iterative#1867)
  Update PULL_REQUEST_TEMPLATE.md
  Create PULL_REQUEST_TEMPLATE.md
  Delete pull_request_template.md
  github: add PR template with a small checklist
  ignore: introduce .dvcignore, apply when searching for stage files
  add: dont use 'file' as var name during iteration
  test: move corrupted hardlink cache test test_repro -> test_run
  test_repro: make cache use hardlink
  cache: make reflink, copy default links
  diff: cli: add documentation link
  Update README.rst
  tests: be paranoic with caplog
  dvc: bump to 0.35.7
  dvc: deprecate --remove-outs
  logger: use exception instead of error
  version: show when the release version is dirty
  'till'=>'until' in `b_ref` arg. help text
  Adds space after period. (See previous commit.)
  Changes ',' for '.' in `-t` help text.
  ...
@olveirap Just a heads up: 0.40.0 with this fix is out. 🙂
Added a check for whether the external S3 output of a script is a multipart object and, if so, set the multipart chunk size to the original one so that the copy reproduces the same ETag.
Also set the multipart threshold to the object's ContentLength + 1 when the object is single-part, so that the copy is not split into multiple parts.
Added an exception that is raised when the ETag of the output object doesn't match the ETag of its copy in the cache.
Patch for the S3 part of #1410. Multipart uploads with variable part sizes are not supported.
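The logic described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this patch: the names `transfer_settings`, `check_etag` and `ETagMismatchError` are hypothetical, and since the description does not say how the original chunk size is determined, the sketch assumes the caller already knows it (for example from the size of the first part). The returned keys are named after the `multipart_threshold` and `multipart_chunksize` arguments of boto3's `TransferConfig`.

```python
class ETagMismatchError(Exception):
    """Hypothetical error: the cached copy's ETag differs from the source."""


def is_multipart(etag):
    # Multipart ETags are commonly observed to end in "-<number of parts>".
    return "-" in etag.strip('"')


def transfer_settings(etag, content_length, part_size=None):
    """Pick copy settings that should reproduce the source object's ETag."""
    if is_multipart(etag):
        if part_size is None:
            raise ValueError("part_size is required for multipart objects")
        # Reuse the original (uniform) part size for the copy.
        return {"multipart_chunksize": part_size}
    # Single-part object: push the threshold just above the object size
    # so the copy is not split into parts.
    return {"multipart_threshold": content_length + 1}


def check_etag(source_etag, cached_etag):
    # Compare ETags with the surrounding quotes stripped.
    if source_etag.strip('"') != cached_etag.strip('"'):
        raise ETagMismatchError(
            "ETag mismatch: {} != {}".format(source_etag, cached_etag)
        )
```

With boto3 available, the resulting dictionary could be passed as `TransferConfig(**settings)` to the `Config` argument of the client's `copy` method, followed by `check_etag` on the ETags of the source and the cached object.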