Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"%" is not escaped in path, user and password in URL.build() #502

Closed
serhiy-storchaka opened this issue Sep 18, 2020 · 7 comments · Fixed by #503
Closed

"%" is not escaped in path, user and password in URL.build() #502

serhiy-storchaka opened this issue Sep 18, 2020 · 7 comments · Fixed by #503

Comments

@serhiy-storchaka
Copy link
Contributor

serhiy-storchaka commented Sep 18, 2020

>>> from yarl import URL
>>> URL.build(scheme="http", host="example.com", path="/%25")
URL('http://example.com/%25')
>>> URL.build(scheme="http", host="example.com", path="/%25").path
'/%'
>>> URL.build(scheme="http", host="example.com", user="%25")
URL('http://%25@example.com')
>>> URL.build(scheme="http", host="example.com", user="%25").user
'%'
>>> URL.build(scheme="http", host="example.com", password="%25")
URL('http://:%25@example.com')
>>> URL.build(scheme="http", host="example.com", password="%25").password
'%'

All other parts are properly escaped. For example:

>>> URL.build(scheme="http", host="example.com", fragment="%25")
URL('http://example.com/#%25')
>>> URL.build(scheme="http", host="example.com", fragment="%25").fragment
'%'
>>> URL.build(scheme="http", host="example.com", query={"%25": "%25"})
URL('http://example.com/?%2525=%2525')
>>> URL.build(scheme="http", host="example.com", query={"%25": "%25"}).query
<MultiDictProxy('%25': '%25')>

yarl 1.5.1

@webknjaz
Copy link
Member

Is it a bug report or a feature request? AFAIU it's not explicitly documented (nor tested) how this API should behave. I think we need to do this first in order to decide how to move forward.

cc @asvetlov

@serhiy-storchaka
Copy link
Contributor Author

It looks like a bug. The path is interpreted as (partially) encoded while encoded=False (by default). How otherwise I can create a URL with path containing literal %25?

@webknjaz
Copy link
Member

Fair enough. This does seem inconsistent. FWIW the note @ https://yarl.readthedocs.io/en/latest/api.html#yarl.URL says:

Sometimes encoding performed by yarl is not acceptable for certain WEB server.

Passing encoded=True parameter prevents URL auto-encoding, user is responsible about URL correctness.

Don’t use this option unless there is no other way for keeping URL attributes not touched.

Any URL manipulations don’t guarantee correct encoding, URL parts could be re-quoted even if encoded parameter was explicitly set.

Even though it's written in one place, it seems to be referring to any uses of the encoded throughout the lib.

From what I can gather, you're supposed to take care of doing encoding by yourself if you use encoded=True but if encoded=False is used, YARL attempts to guess whether to autoencode the input and when you pass %25 it treats that as already encoded string trying to emulate a typical web-browser behavior.

This is why I originally stated that the corner cases are poorly documented.

@webknjaz
Copy link
Member

P.S. It's worth nothing that yarl relies on stdlib urllib.parse and I have a hunch that it may affect some of the behaviors. It definitely needs more explicit test cases.

@serhiy-storchaka
Copy link
Contributor Author

Seems the problem is due to using the same quoters for URL() and URL.build(). In URL() components can be partially encoded. They can contain %-escaped parts and characters which need to be encoded. But URL.build() is expected to get non-encoded values, so path "/%25" means "/%25", not "/%". Quoters in URL.build() should use requote=False.

@serhiy-storchaka
Copy link
Contributor Author

There is the same issue with with_xxx() methods:

>>> URL('http://example.org').with_path('/%2d').path
'/-'
>>> URL('http://example.org').with_name('%2d').path
'/-'
>>> URL('http://example.org').with_fragment('%2d').fragment
'-'
>>> URL('http://example.org').with_user('%2d').user
'-'
>>> URL('http://example.org').with_password('%2d').password
'-'

@besfahbod
Copy link
Contributor

Thanks for this fix! It addresses a major issue with getting Unicode values in and out intact.

Although, I should note that, the change, although fixing a bug, was a large change in behavior for the main API, and is probably breaking many application code, such as aiohttp. I, personally, ended up here when trying to understand why a minor-version upgrade resulted in so many test failures in our application code.

My suggestion would have been to roll this out as a major-version update, considering it a API-breaking change. I guess it's a bit too late for that, though. But, worth considering if something like this happen again, or someone thinks it's really worth making that happen now.

netbsd-srcmastr referenced this issue in NetBSD/pkgsrc Oct 2, 2020
Fix dependencies.

1.6.0 (2020-09-23)
==================

Features
--------

- Allow for int and float subclasses in query, while still denying bool.
  `#492 <https://github.com/aio-libs/yarl/issues/492>`_


Bugfixes
--------

- Do not requote arguments in ``URL.build()``, ``with_xxx()`` and in ``/`` operator.
  `#502 <https://github.com/aio-libs/yarl/issues/502>`_
- Keep IPv6 brackets in ``origin()``.
  `#504 <https://github.com/aio-libs/yarl/issues/504>`_
bmwiedemann referenced this issue in bmwiedemann/openSUSE Oct 29, 2020
https://build.opensuse.org/request/show/838272
by user dirkmueller + dimstar_suse
- update to 1.6.0:
  - Allow for int and float subclasses in query, while still denying bool.
    `#492 <https://github.com/aio-libs/yarl/issues/492>`_
  - Do not requote arguments in ``URL.build()``, ``with_xxx()`` and in ``/`` operator.
    `#502 <https://github.com/aio-libs/yarl/issues/502>`_
  - Keep IPv6 brackets in ``origin()``.
    `#504 <https://github.com/aio-libs/yarl/issues/504>`_
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants