aiohttp client throws http errors for the following redirect #2624
Comments
Thanks for the report |
I am also getting this error: "aiohttp.client_exceptions.ClientResponseError: 400, message='unexpected content-length header'"
|
Most likely the server responds with at least two |
Thanks to Postel's Law, many web servers emit invalid HTTP. This is similar to many web pages being invalid HTML, yet browsers display these pages. The HTML5 standard now specifies how everyone is supposed to treat broken HTML; no such standard exists for broken HTTP. I'd like to know whether (1) you are interested in fixing this to work like browsers, (2) you will take patches that fix it to work like browsers, or (3) aiohttp is a thing of beauty which perfectly implements the standard :-) For (1) I can provide a large number of test cases, and help triage them. For (2) I can write patches for the things which are most common in my web crawls. For (3) I will admire your idealism. |
I definitely prefer option (2), but let's discuss fixes case by case. |
|
@iho just install aiohttp 3.0 |
Example of code
import aiohttp
import asyncio


async def main():
    url = 'https://flyp.me/api/v1/order/create'
    data = {
        "order": {
            "from_currency": "LTC",
            "to_currency": "ZEC",
            "ordered_amount": "0.01",
            "destination": "t1SBTywpsDMKndjogkXhZZSKdVbhadt3rVt"
        }
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data) as response:
            print(await response.text())

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Traceback
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/flyp/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 678, in start
    (message, payload) = await self._protocol.read()
  File "/home/user/.pyenv/versions/flyp/lib/python3.6/site-packages/aiohttp/streams.py", line 533, in read
    await self._waiter
  File "/home/user/.pyenv/versions/flyp/lib/python3.6/site-packages/aiohttp/client_proto.py", line 161, in data_received
    messages, upgraded, tail = self._parser.feed_data(data)
  File "aiohttp\_http_parser.pyx", line 295, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in header'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "example.py", line 20, in <module>
    loop.run_until_complete(main())
  File "/home/user/.pyenv/versions/3.6.4/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "example.py", line 16, in main
    async with session.post(url, json=data) as response:
  File "/home/user/.pyenv/versions/flyp/lib/python3.6/site-packages/aiohttp/client.py", line 779, in __aenter__
    self._resp = await self._coro
  File "/home/user/.pyenv/versions/flyp/lib/python3.6/site-packages/aiohttp/client.py", line 331, in _request
    await resp.start(conn, read_until_eof)
  File "/home/user/.pyenv/versions/flyp/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 683, in start
    message=exc.message, headers=exc.headers) from exc
aiohttp.client_exceptions.ClientResponseError: 400, message='invalid character in header' |
The problem is in how the upstream Node.js HTTP parser parses the response. |
worth upgrading vendored lib |
@asvetlov thank you! |
@webknjaz upstream didn't fix the problem; it only added support for the SOURCE HTTP verb. |
Another example of getting
gives:
response that gives the error (
and actual response that I'd like to parse (can't provide a public repro for it, but it gives the same error) is
|
I found that aiohttp behaved much better with Crawlera (at least with some sites) if I avoided the proxy_auth argument and explicitly entered my API key in the url. For example:
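A minimal sketch of that approach. The original snippet is not available here, so the proxy host, port, and API key below are placeholders, and `proxy_url_with_key` and `fetch_via_proxy` are hypothetical helper names. It assumes that aiohttp picks up credentials embedded in the proxy URL when `proxy_auth` is omitted, and that the service expects the API key as the user name with an empty password:

```python
import asyncio
from urllib.parse import quote


def proxy_url_with_key(api_key: str, host: str, port: int) -> str:
    """Build a proxy URL carrying the API key as user name, with an empty password."""
    # Percent-encode the key so characters such as '@' or '/' cannot break the URL.
    return f"http://{quote(api_key, safe='')}:@{host}:{port}"


async def fetch_via_proxy(url: str, proxy_url: str) -> str:
    # Third-party import kept local so the URL helper works without aiohttp installed.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        # Credentials travel inside the proxy URL; proxy_auth is deliberately not passed.
        async with session.get(url, proxy=proxy_url) as response:
            return await response.text()


if __name__ == "__main__":
    # Placeholder values; substitute the real proxy endpoint and key.
    proxy = proxy_url_with_key("YOUR_API_KEY", "proxy.example.com", 8010)
    print(asyncio.run(fetch_via_proxy("http://example.com/", proxy)))
```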
|
Hi all, I'm using python 3.7, on aiohttp-3.4.4. Cheers |
@unl1k3ly but |
@asvetlov thanks for the prompt reply, mate. I'm not sure what you mean. The request works with curl and the Python requests module. With aiohttp I get that error as output. In fact, my endpoint returns that HTTP header... is there a way to bypass this exception and still print its content? Cheers |
All I'm getting now is Cheers |
So, more updates on this... I've just tested with requests-futures and grequests, and both seem to return the right content rather than raising an exception on a response header. Thank you for all the support. |
If you want to modify the parser code to recover after an invalid header string, a PR is welcome. |
So this makes it impossible for an aiohttp server to process requests with HTTP signatures |
I'm stuck on aiohttp 3.6.3 because with aiohttp 3.7 and 3.8 I get an invalid character in header exception. The AIOHTTP_NO_EXTENSIONS workaround did not solve the issue for me. I would really appreciate a way to recover from the error and still receive the response body. |
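For readers unfamiliar with the workaround mentioned above, here is a minimal sketch of how it is usually applied. It assumes that aiohttp reads `AIOHTTP_NO_EXTENSIONS` once at import time and then falls back to its pure-Python parser. `active_response_parser` is a hypothetical helper added only to show which module supplies the parser:

```python
import os

# Assumption: aiohttp checks this variable when it is first imported,
# so it has to be set before any `import aiohttp` in the process.
os.environ["AIOHTTP_NO_EXTENSIONS"] = "1"

try:
    from aiohttp import http_parser
except ImportError:  # aiohttp is not installed in this environment
    http_parser = None


def active_response_parser() -> str:
    """Name the module that provides the response parser currently in use."""
    if http_parser is None:
        return "aiohttp not installed"
    return http_parser.HttpResponseParser.__module__


print(active_response_parser())
```

The same setting can be applied from a shell by prefixing the command, for example `AIOHTTP_NO_EXTENSIONS=1 python bug.py`. As the comment above reports, switching parsers does not necessarily make a malformed header acceptable.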
This is invalid, because a response should be finished after receiving a 0 length chunk: i.e. The sample in the original issue seems to be fine now. |
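To illustrate the chunked-encoding rule referred to above, here is a minimal sketch of a decoder that treats the zero-length chunk as the end of the body. `decode_chunked` is a hypothetical helper written for this illustration, not aiohttp's parser, and it assumes the whole message body is already in memory:

```python
from typing import Tuple


def decode_chunked(raw: bytes) -> Tuple[bytes, bytes]:
    """Decode a chunked body; return (body, bytes left over after the final chunk)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(raw[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            # Zero-length chunk: skip optional trailer fields up to the blank line.
            while True:
                line_end = raw.index(b"\r\n", pos)
                is_blank = line_end == pos
                pos = line_end + 2
                if is_blank:
                    # The body is complete; whatever follows is not part of it.
                    return bytes(body), raw[pos:]
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


body, leftover = decode_chunked(b"5\r\nhello\r\n0\r\n\r\nextra")
print(body, leftover)
```

Any bytes returned as leftover arrived after the terminating chunk, which is the situation the comment describes as invalid.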
Long story short
I have been fetching the front pages of millions of websites using aiohttp, and collected a large number of cases where aiohttp client's http parser throws errors for stuff that browsers appear to think is fine. Some of these are real bugs in aiohttp's parser, others might be places where browsers do not obey the standard, and aiohttp might want to be more forgiving.
Here's an initial bug to see if you'd like me to do more triage on these.
Here is a 302 redirect that seems to work fine in curl and Firefox but aiohttp's http parser pukes on it.
Expected behaviour
Note Content-Length: 0.
If I tell curl to follow the redirect:
that works and I see the actual https robots.txt file. My browsers also follow this redirect.
Actual behaviour
bug.py throws:
Steps to reproduce
$ python bug.py
More examples
Since I'm crawling a lot of terrible websites, I have an easy ability to find more examples.
Other messages
Your environment
aiohttp 2.3.6 CLIENT
Python 3.6.4
Linux (CentOS 7.4.1708)