Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

public_url returning quoted_name with '/' #3809

Closed
jlvcm opened this issue Aug 14, 2017 · 19 comments
Closed

public_url returning quoted_name with '/' #3809

jlvcm opened this issue Aug 14, 2017 · 19 comments
Assignees
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@jlvcm
Copy link

jlvcm commented Aug 14, 2017

i'm adding my files inside folders, so i give the blob name as "UUID/file.ext", but when i request the public_url() i get the fullname as quoted example: ".../MYBUCKET/UUID%2Ffile.ext"

both "%2F" and "/" in the link downloads the same file, but with different names

@lukesneeringer lukesneeringer added api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Aug 14, 2017
@lukesneeringer
Copy link
Contributor

Hi @jlvcm,
Thanks for reporting. We will look into this.

@yusefnapora
Copy link

yusefnapora commented Sep 6, 2017

Just to add, this also breaks generate_signed_url for objects with / characters. The signed string contains %2F instead of /, so the signature doesn't match the one generated by the server.

The problem seems to be this line: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/storage/google/cloud/storage/blob.py#L1579

The safe='' overrides the default behavior of urllib.parse.quote, which is to leave / characters alone. If you remove that param it should fix the issue, although you might need to confirm that other callers aren't expecting slashes to be escaped.

@pior
Copy link

pior commented Oct 18, 2017

Hitting this bug as well.
Is there an ETA for the fix?

@tseaver tseaver self-assigned this Jan 8, 2018
@tseaver tseaver added the status: investigating The issue is under investigation, which is determined to be non-trivial. label Jan 8, 2018
@tseaver
Copy link
Contributor

tseaver commented Jan 8, 2018

Investigating: when I remove the safe='' from the quoting, I have failing system tests: the back-end does not allow fetching the file with the unquoted / embedded.

@frankyn Can you please clarify whether the client libraries should be quoting the / when it is part of an object name?

@tseaver
Copy link
Contributor

tseaver commented Jan 8, 2018

@frankyn
Copy link
Member

frankyn commented Jan 10, 2018

Apologies for the delay @tseaver. Clarifying question, does quoting in Python represent an HTML URL encoder?

@tseaver
Copy link
Contributor

tseaver commented Jan 10, 2018

@frankyn The quoting we are using is "URL" quoting (python2, python3): the / character gets converted to a %2F.

@frankyn
Copy link
Member

frankyn commented Jan 10, 2018

Thanks for clarifying, I generated a public URL using the Cloud Console Storage browser and quoting doesn't occur there. I'll forward the question to GCS team just to make sure, but given the behavior mentioned in comment#3 it sounds reasonable that quote shouldn't be quoting / in the resource path.

Thank you for your patience!

@frankyn
Copy link
Member

frankyn commented Jan 17, 2018

Response from GCS team is the URL shouldn't be "quoted" unless there's a specific reason. I don't have enough context to understand why a test breaks when quoting is removed.

Taking a look at the Ruby storage client library. When I generate a public_url, the result isn't quoted.

require "google/cloud/storage"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "bucket-name"
file = bucket.file "bucket-name"

puts file.public_url

=> "https://storage.googleapis.com/www.coderfrank.com/osdf/oijsd/sdfsdf/test.txt"

@tseaver
Copy link
Contributor

tseaver commented Jan 18, 2018

Hmm, I guess we need to figure out where the test failures in #4716 are coming from, then.

@tseaver
Copy link
Contributor

tseaver commented Feb 22, 2018

@frankyn Some evidence that the back-end actually wants the / separator escaped: the JSON returned from objects.insert for an object whose name is parent/child/filename.txt looks like so:

{'bucket': 'gcp-3809-1519337419615373',
 'contentType': 'text/plain',
 'crc32c': 'U35c9A==',
 'etag': 'COy2x+fFutkCEAE=',
 'generation': '1519337650379628',
 'id': 'gcp-3809-1519337419615373/parent/child/filename.txt/1519337650379628',
 'kind': 'storage#object',
 'md5Hash': 'ogBPN3MLlEVnCnOPoPye5Q==',
 'mediaLink': 'https://www.googleapis.com/download/storage/v1/b/gcp-3809-1519337419615373/o/parent%2Fchild%2Ffilename.txt?generation=1519337650379628&alt=media',
 'metageneration': '1',
 'name': 'parent/child/filename.txt',
 'selfLink': 'https://www.googleapis.com/storage/v1/b/gcp-3809-1519337419615373/o/parent%2Fchild%2Ffilename.txt',
 'size': '19',
 'storageClass': 'STANDARD',
 'timeCreated': '2018-02-22T22:14:10.217Z',
 'timeStorageClassUpdated': '2018-02-22T22:14:10.217Z',
 'updated': '2018-02-22T22:14:10.217Z'}

If I dont use the escaped form when making API calls against the object (e.g., GET or DELETE) then it 404s.

The fact that the / doesn't seem escaped in the console is not relevant here: that is a Jedi mind trick. :)

@tseaver
Copy link
Contributor

tseaver commented Feb 22, 2018

So, it looks to me as though the correct resolution is to update Blob.public_url (and maybe also Blob.generate_signed_url?) to avoid quoting embedded / characters, but continue quoting them for all API requests. I have updated #4716 accordingly.

@theacodes
Copy link
Contributor

theacodes commented Feb 22, 2018 via email

@frankyn
Copy link
Member

frankyn commented Feb 22, 2018

Oh gosh, that's confusing. So, it fails when you don't quote the object name during other operations. IIUC, then it would make sense to not quote "/" in the returned public_url and generated_signed_url instead.

Thanks @tseaver! Apologies, that I wasn't much help here.

@frankyn
Copy link
Member

frankyn commented Feb 24, 2018

@tseaver, I was able to find a better answer for this case.

The existing client library uses the XML API endpoints defined by _API_ACCESS_ENDPOINT which is used for both public_urls and signed_urls.
For the JSON API, percent-encoding is required and will fail as you observed in your prior comment.
For the XML API, percent-encoding is not required but does require UTF-8 encoding. That's why at first it looks like a Jedi mind trick with the public url generated by for example the Cloud Console Storage Browser as it uses the XML API endpoint.

JSON API endpoint - https://www.googleapis.com/storage/v1{path}
XML API endpoint - https://storage.googleapis.com

As for this PR, for the recommendation is to not quote '/'. Rationale is based more on the ease of parsing '/' versus '%2F'.

@tseaver tseaver removed the status: investigating The issue is under investigation, which is determined to be non-trivial. label Feb 26, 2018
@tseaver
Copy link
Contributor

tseaver commented Feb 26, 2018

#4716 will close this issue when it merges.

@frankyn
Copy link
Member

frankyn commented Feb 28, 2018

Thanks @tseaver ! I was OOO yesterday. Cheers!

@atteneder
Copy link

Maybe it's a regression, but I still get encoded slashes (%2F) in Blob.public_url in gcloud 0.18.3.

I un-encode them manually for the moment, but inconsistency (for example with the console's public URL) is not desirable imho.

Should this be re-opened? Or a new Issue?

@frankyn
Copy link
Member

frankyn commented Apr 15, 2020

Hi @atteneder,

Openning a new issue, thank you for raising this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

8 participants