Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage: Update Blob.update_storage_class to support rewrite tokens #6527

Merged

Conversation

erikwebb
Copy link
Contributor

@erikwebb erikwebb commented Nov 15, 2018

Fixes #6524

Test Plan

Test code (test_6524.py) -

from google.cloud import storage

client = storage.Client()

# https://console.cloud.google.com/storage/browser/[bucket-id]/
bucket = client.get_bucket('test_cloud-python_6524')

# Then do other things...
blob = bucket.get_blob('big_file.bin')

# Test with multiple-sized files
blob.update_storage_class('NEARLINE')

Create a 5GB file big_file.bin in the test_pr6524 bucket -

dd if=/dev/urandom of=big_file.bin bs=1k count=5000000
gsutil cp big_file.bin gs://test_cloud-python_6524/big_file.bin

Test using google-cloud-storage==1.13.0 -

$ virtualenv 6524-old
...
$ ./6524-old/bin/pip install -r requirements.txt
$ gsutil ls -L gs://test_cloud-python_6524/big_file.bin | grep 'Storage class:'
    Storage class:          MULTI_REGIONAL
$ ./6524-old/bin/python test_6524.py
Traceback (most recent call last):
  File "test_6524.py", line 10, in <module>
    blob.update_storage_class('NEARLINE')
  File "/usr/local/google/home/erikwebb/.local/lib/python2.7/site-packages/google/cloud/storage/blob.py", line 1497, in update_storage_class
    self._set_properties(api_response['resource'])
KeyError: 'resource'
$ gsutil ls -L gs://test_cloud-python_6524/big_file.bin | grep 'Storage class:'
    Storage class:          MULTI_REGIONAL

Reset object storage class to be safe -

gsutil rewrite -s MULTI_REGIONAL gs://test_cloud-python_6524/big_file.bin

Test using PR #6527 -

$ virtualenv 6524-new
...
$ ./6524-new/bin/pip install google-cloud-python/storage/
...
$ gsutil ls -L gs://test_cloud-python_6524/big_file.bin | grep 'Storage class:'
    Storage class:          MULTI_REGIONAL
$ time ./6524-new/bin/python test_6524.py

real    1m32.537s
user    0m0.750s
sys     0m0.194s
$ gsutil ls -L gs://test_cloud-python_6524/big_file.bin | grep 'Storage class:'
    Storage class:          NEARLINE

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Nov 15, 2018
@tseaver tseaver added api: storage Issues related to the Cloud Storage API. performance labels Nov 15, 2018
@erikwebb
Copy link
Contributor Author

Added test plan to original message

@crwilcox
Copy link
Contributor

Thanks for the PR @erikwebb. @frankyn and I will look it over! Also, sorry for the delay getting to this.

@frankyn
Copy link
Member

frankyn commented Nov 28, 2018

@erikwebb this could potentially be baked into the request and handled for the developer. It will reduce complexity of surface design. WDYT?

@erikwebb
Copy link
Contributor Author

@frankyn IIUC this can't actually be fixed by the developer without using rewrite() instead and handling tokens from there. As is, update_storage_class() does not easily return the token that will allow the continuation to be done.

@frankyn
Copy link
Member

frankyn commented Nov 28, 2018

Apologies, what I meant is handling the token management within the method when a developer calls it instead of exposing the token back to the developer.

The client library method would instead take care of handling the rewrite token and use it to continue the rewrite operation.

@erikwebb
Copy link
Contributor Author

@frankyn Oh okay, I see what you mean now. That's definitely another way to solve this. I have 2 different thoughts on that -

  1. This is definitely more straightforward to the user. update_storage_class() is clearly a helper function, rather than an API call, so it makes sense to make it as helpful as possible.
  2. This changes the semantics of the rewrite API call and could also potentially make this function appear to "hang," because the chunked operation is no longer exposed to the user.

I'm not sure I have a strong opinion. If we choose this path, then we need to call out the potential long execution of the function in docs or maybe adding an additional helper function that allows users to choose their behavior.

@crwilcox
Copy link
Contributor

I vote for internalizing the token handling. I think a method taking a while for a large file is expected enough for a user. The greatest benefit of the approach @frankyn is suggesting is we can address the issue while keeping the public interface in tact.

@erikwebb
Copy link
Contributor Author

Thanks, @crwilcox. If @frankyn agrees, then I'll basically move the code in the OP into update_storage_class().

@frankyn
Copy link
Member

frankyn commented Nov 29, 2018

Let's move forward with moving in rewrite token handling into the method.

To address the concern of the blocking call, could you add documentation stating it will block until the operation is complete with a short explanation? Additionally, it should have a link to the rewrite method if the user wants more control of the rewrite token.

@erikwebb
Copy link
Contributor Author

erikwebb commented Nov 29, 2018 via email

@crwilcox
Copy link
Contributor

crwilcox commented Dec 1, 2018

@erikwebb FYI: The linting for this recently changed and we formatted all of the code using black. This caused the conflicts you see. They should be straightforward to resolve though. Most changes are just whitespace or quote style.

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@googlebot googlebot added cla: no This human has *not* signed the Contributor License Agreement. and removed cla: yes This human has signed the Contributor License Agreement. labels Dec 3, 2018
@erikwebb erikwebb force-pushed the blob-update_storage_class-fix-6524 branch from 4a7c750 to e6a795a Compare December 3, 2018 19:23
@googlebot
Copy link

CLAs look good, thanks!

@googlebot googlebot added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Dec 3, 2018
@erikwebb
Copy link
Contributor Author

erikwebb commented Dec 3, 2018

Adjusted diff to comments, resuming testing now.

@tseaver tseaver added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 3, 2018
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 3, 2018
@tseaver
Copy link
Contributor

tseaver commented Dec 3, 2018

I tagged this as do not merge and kokoro-force-run because the tests weren't running for whatever reason, and I suspect that we have a coverage issue (if not outright test failures) due to the changed implementation without corresponding unit test changes.

In addition to new / updated unit tests, the PR probably needs a new system test as well (I don't think we exercise Blob.update_storage_class in the system tests yet).

@frankyn frankyn requested a review from tseaver December 3, 2018 20:40
@frankyn
Copy link
Member

frankyn commented Dec 3, 2018

Thanks @tseaver I missed that.

@tseaver
Copy link
Contributor

tseaver commented Dec 3, 2018

Yep, the tests need to be updated to match / exercise the new implementation.

@erikwebb
Copy link
Contributor Author

erikwebb commented Dec 5, 2018

I'll add a new system test (both for a small and large file) while I'm working on this. Thanks for the guidance!

@erikwebb
Copy link
Contributor Author

erikwebb commented Dec 6, 2018

Quick question - I don't see any methods that check the metadata of an uploaded blob. I'd like to reinitialize a Blob object based on the metadata in GCS, but I don't see a way to do that (in the constructor or otherwise). exists() exercises the right call, but doesn't preserve the retrieved data anywhere. Am I missing something?

If not, I'll add it for the benefit of my test in one of two ways - class method that returns a fully populated Blob object or add a parameter to __init__ that loads parameters.

@tseaver
Copy link
Contributor

tseaver commented Dec 6, 2018

@erikwebb I think Blob.reload is the method you need.

@erikwebb
Copy link
Contributor Author

erikwebb commented Dec 6, 2018

@erikwebb I think Blob.reload is the method you need.

Thanks, @tseaver! I didn't realize it was in a mixin.

@erikwebb erikwebb force-pushed the blob-update_storage_class-fix-6524 branch from e6a795a to 7b4fbee Compare December 6, 2018 19:40
@erikwebb erikwebb force-pushed the blob-update_storage_class-fix-6524 branch from 7b4fbee to a3cecc9 Compare December 6, 2018 19:49
@tseaver tseaver added kokoro:force-run Add this label to force Kokoro to re-run the tests. and removed kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Dec 7, 2018
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 7, 2018
@erikwebb erikwebb force-pushed the blob-update_storage_class-fix-6524 branch from a3cecc9 to 915ee59 Compare December 7, 2018 22:09
@tseaver tseaver added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 12, 2018
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 12, 2018
while True:
token, _, _ = self.rewrite(self, token=token)
if token is None:
break

This comment was marked as spam.

This comment was marked as spam.

@erikwebb erikwebb force-pushed the blob-update_storage_class-fix-6524 branch from 915ee59 to 108f733 Compare December 12, 2018 20:07
@tseaver tseaver added kokoro:force-run Add this label to force Kokoro to re-run the tests. and removed do not merge Indicates a pull request not ready for merge, due to either quality or timing. labels Dec 12, 2018
Copy link
Contributor

@tseaver tseaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready to merge once CI is green (I had to kick Kokoro just now).

@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 12, 2018
@tseaver tseaver merged commit b98b7a2 into googleapis:master Dec 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. cla: yes This human has signed the Contributor License Agreement. performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants