Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL Params are decoded too often #1474

Closed
0xb35c opened this issue Dec 12, 2016 · 11 comments
Closed

URL Params are decoded too often #1474

0xb35c opened this issue Dec 12, 2016 · 11 comments
Labels
Milestone

Comments

@0xb35c
Copy link

0xb35c commented Dec 12, 2016

Long story short

Passing a=%25s3 as a URL parameter causes an exception, saying "ValueError: Unallowed PCT %s3"
a=%25s3 is at some point decoded to a=%s3 and tried to parse as URL, which is of course invalid.

Expected behaviour

a=%25s3 should not be decoded to a=%s3

Actual behaviour

a=%25s3 is at some point decoded to a=%s3 and tried to parse as URL

Steps to reproduce

I debugged the whole issue. It seems the problem is in the file client_reqrep.py:77-81, where (if i understand correctly) the passed params (as dict) are merged with the params already in the URL.
Here is a code snippet showing the issue.

In [19]: URL('http://foo.bar').with_query(URL('http://bar.foo').with_query('a=%25').query)
Out[19]: URL('http://foo.bar/?a=')

In [20]: URL('http://bar.foo').with_query('a=%25')
Out[20]: URL('http://bar.foo/?a=%25')

In [21]: URL('http://bar.foo').with_query('a=%25').query
Out[21]: <MultiDictProxy('a': '%')>

Also here is some testcode:

import aiohttp
import asyncio

async def main(loop):
    params = {'key1': '%25s3', 'key2': 'value2'}
    with aiohttp.ClientSession() as session:
        async with session.get('http://127.0.0.1', params=params) as resp:
            print(resp.status)
            print(await resp.text())

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))
    loop.close()

Your environment

  • Linux-Arch 4.8.11-1-ARCH
  • Python 3.5.2
  • aiohttp 1.1.4
@asvetlov
Copy link
Member

Thank you for report

@0xb35c
Copy link
Author

0xb35c commented Dec 12, 2016

No problem. Also in that context: I'd like to have a flag to mark URLs, data and params so that they won't be escaped or quoted. For testing web applications it might make sense to actually send a path or param like foo.bar/%s3?a=%s3

@wenxin-wang
Copy link

wenxin-wang commented Feb 7, 2017

Confirmed with

  • yarl==0.8.1
  • aiohttp==1.2.0

For example:

from yarl import URL
str(URL("'"))
str(URL("%27"))
# Both evaluates to "'"

This is causing unexpected redirect loop in aiohttp.ClientSession.get when the server return redirection to the quoted URLs on seeing unquoted ones, in my case for example, http://starbounder.org/Bolt_O's

Right now URL's quoting could be bypassed by using urllib.parse.quote and the encoded keyword parameter in URL.__init__:

from yarl import URL
from urllib.parse import quote
str(URL(quote("'"), encoded=True))
# Evaluates to "%27"

@fafhrd91
Copy link
Member

fafhrd91 commented Feb 7, 2017

i think this is fixed with yarl 0.9.1

@fafhrd91
Copy link
Member

fafhrd91 commented Feb 7, 2017

@0xb35c @wenxin-wang could you check aiohttp master and yarl 0.9.1?

@wenxin-wang
Copy link

@fafhrd91 I removed the original venv and used

pip install git+https://github.com/KeepSafe/aiohttp.git
# aiohttp==1.3.0a0
# yarl==0.9.1

I still got the problem above.

from yarl import URL
str(URL("'"))
str(URL("%27"))
# Both evaluates to "'"

To be more specific, try

import asyncio as aio
import aiohttp
async def get_content(session, url):
    async with session.get(url) as res:
        await res.release()
        return res.history

async def main(loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        return await get_content(session, "http://starbounder.org/Bolt_O's")

loop = aio.get_event_loop()
print(loop.run_until_complete(main(loop)))

You will get 10 redirections where the server sent redirect to http://starbounder.org/Bolt_O%27s but the client still visit the unquoted URL.

@fafhrd91 fafhrd91 added this to the 1.4 milestone Feb 8, 2017
@fafhrd91 fafhrd91 modified the milestones: 2.1, 2.0 Mar 15, 2017
@lukkm
Copy link

lukkm commented Mar 27, 2017

Are there any updates on this thread? Also, is there any way to work around this for now? Like, send a request to http://bar.foo/?a=%25 without the client crashing.

@fafhrd91
Copy link
Member

@lukkm you can use yarl.URL('http://bar.foo/?a=%25', encoded=True) as url

@vladarts
Copy link

vladarts commented Mar 28, 2017

Hello! I have same problem.

During redirect, Location header is
https://yandex.ru/showcaptcha?cc=1&retpath=https%3A//yandex.ru/search%3Ftext%3D%25D1%258F%2B%25D0%25B2%2B%25D1%2588%25D0%25BE%25D0%25BA%25D0%25B5_ae7b4fe85db2da139661590c878cf956&t=0/1490699030/593528fc0cbb55f78c768d1184d492c8&s=fbbbb021daa71c81162711feefe39e22

But really request goes to url
https://yandex.ru/showcaptcha?cc=1&retpath=https://yandex.ru/search?text%3D%25D1%258F%2B%25D0%25B2%2B%25D1%2588%25D0%25BE%25D0%25BA%25D0%25B5_ae7b4fe85db2da139661590c878cf956&t=0/1490699030/593528fc0cbb55f78c768d1184d492c8&s=fbbbb021daa71c81162711feefe39e22

on yandex.ru it is a problem

if patch aiohttp.client.ClientSession#_request method (use url = URL(url, encoded=True) instead of url = URL(url) ) - everything works OK

UPDATE

This code is working

async with self.session.get(url, allow_redirects=False) as _response:
    response = Selector(await _response.text())

if _response.status == 302:
    redirect_url = URL(_response.headers['Location'], encoded=True)
    async with self.session.get(redirect_url) as _response:
        response = Selector(await _response.text())

@fafhrd91
Copy link
Member

added requote_redirect_url session attribute.

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there's important points made in this discussion,
please include those exceprts into that new issue.

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

6 participants