Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add configurable checksumming for blob uploads and downloads #246

Merged
merged 14 commits into from
Aug 26, 2020

Conversation

andrewsg
Copy link
Contributor

No description provided.

@andrewsg andrewsg requested review from crwilcox and frankyn August 14, 2020 19:27
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 14, 2020
response = upload.transmit_next_chunk(transport, timeout=timeout)
except resumable_media.DataCorruption:
# Attempt to delete the corrupted object.
self.delete()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails for any reason, both exceptions will be reported, using the Python exception pattern "During handling of the above exception, another exception occurred"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What occurs here if a valid previous generation of the blob exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the default case, if versioning is off, there is no way to recover the older version. Unfortunately this can only be resolved by backend changes to make server-side checksumming possible for resumable uploads.

In the case where versioning is on, the previous generation still exists, but is not live. This is consistent with behavior in nodejs, which I modeled this code after. We could potentially try to roll back and make the previous version live, but we should also modify nodejs and other languages to be consistent across languages. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more of a documentation issue until there's a better fix in the backend to support checksums better.

I'd prefer consistency here with clear acknowledgment that these cases can occur when using checksumming. It's not done automatically so it doesn't change behavior for uploads.

:type checksum: str
:param checksum:
(Optional) The type of checksum to compute to verify
the integrity of the object. If the upload is completed in a single
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "if the upload is completed in a single request" / "if the upload is too large" is undeniably ugly and hopefully we can implement server-side handling of resumable upload checksums to resolve that soon

@andrewsg andrewsg changed the title Add configurable checksumming for blob uploads and downloads feat: add configurable checksumming for blob uploads and downloads Aug 14, 2020
@andrewsg
Copy link
Contributor Author

Recommend hold off on merging until google-resumable-media releases to 1.0 (and bump up the dependency version number in setup.py here)

tests/system/test_system.py Outdated Show resolved Hide resolved
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/python-storage API. label Aug 21, 2020
tests/system/test_system.py Show resolved Hide resolved
response = upload.transmit_next_chunk(transport, timeout=timeout)
except resumable_media.DataCorruption:
# Attempt to delete the corrupted object.
self.delete()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more of a documentation issue until there's a better fix in the backend to support checksums better.

I'd prefer consistency here with clear acknowledgment that these cases can occur when using checksumming. It's not done automatically so it doesn't change behavior for uploads.

@tseaver
Copy link
Contributor

tseaver commented Aug 24, 2020

googleapis/google-resumable-media-python#142 is the PR for bumping GRMP to 1.0.

@andrewsg
Copy link
Contributor Author

@tseaver @frankyn PTAL

@tseaver Has the PR you linked made it to release yet? If it's released I'll use this PR to also update setup.py to use the new version.

@andrewsg
Copy link
Contributor Author

Bumped version number. This should be good to merge as soon as final review comes through. Thanks all

Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @andrewsg!

@andrewsg
Copy link
Contributor Author

@tseaver PTAL soon if you are available, I've been asked to release. Thanks!

@tseaver
Copy link
Contributor

tseaver commented Aug 26, 2020

@tseaver tseaver merged commit 23b7d1c into master Aug 26, 2020
@tseaver tseaver deleted the crc32c branch August 26, 2020 19:20
cojenco pushed a commit to cojenco/python-storage that referenced this pull request Oct 13, 2021
…oogleapis#246)

Co-authored-by: Tres Seaver <tseaver@palladion.com>
Co-authored-by: Frank Natividad <frankyn@users.noreply.github.com>
cojenco pushed a commit to cojenco/python-storage that referenced this pull request Oct 13, 2021
…oogleapis#246)

Co-authored-by: Tres Seaver <tseaver@palladion.com>
Co-authored-by: Frank Natividad <frankyn@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/python-storage API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants